[ironic][xena] problems updating redfish_password for existing node
Wade Albright
the.wade.albright at gmail.com
Wed Aug 24 00:02:46 UTC 2022
I tested out the latest change posted here:
https://review.opendev.org/c/openstack/sushy/+/853209
This solved the issue; things work fine now with session auth. Thanks for
that! Much appreciated.
Wade
On Tue, Aug 16, 2022 at 9:22 AM Wade Albright <the.wade.albright at gmail.com>
wrote:
> Thanks Julia! When I get a chance I will test that out and report back. I
> may not be able to get to it right away as the system where I could
> reproduce this issue reliably is in use by another team now.
>
>
>
> On Mon, Aug 15, 2022 at 5:27 PM Julia Kreger <juliaashleykreger at gmail.com>
> wrote:
>
>> Well, that is weird. If I grok it as-is, it almost looks like the BMC
>> returned an empty response.... It is not a failure mode we've seen or
>> had reported (afaik) up to this point.
>>
>> That being said, we should be able to invalidate the session and
>> launch a new client...
>>
>> I suspect https://review.opendev.org/c/openstack/sushy/+/853209
>> should fix things up. I suspect we will look at just invalidating the
>> session upon any error.
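>>
>> Roughly the shape of the idea, as a sketch only (made-up names, not the
>> actual sushy code): keep one authenticated session per BMC, and if any
>> request on it blows up, throw the session away and build a fresh one
>> before retrying.
>>
>> import requests
>>
>> class BmcClient:
>>     def __init__(self, base_url, username, password, verify=False):
>>         self.base_url = base_url
>>         self.auth = (username, password)
>>         self.verify = verify
>>         self.session = self._new_session()
>>
>>     def _new_session(self):
>>         # Fresh keep-alive session, authenticated with current credentials.
>>         s = requests.Session()
>>         s.auth = self.auth
>>         s.verify = self.verify
>>         return s
>>
>>     def get(self, path):
>>         try:
>>             return self.session.get(self.base_url + path, timeout=60)
>>         except requests.exceptions.RequestException:
>>             # Invalidate on *any* error and retry once on a new session,
>>             # instead of reusing a possibly-dead kept-alive socket.
>>             self.session.close()
>>             self.session = self._new_session()
>>             return self.session.get(self.base_url + path, timeout=60)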
>>
>> On Mon, Aug 15, 2022 at 11:47 AM Wade Albright
>> <the.wade.albright at gmail.com> wrote:
>> >
>> > 1) There are sockets open briefly when the conductor is trying to
>> connect. After three tries the node is set to maintenance mode and there
>> are no more sockets open.
>> > 2) My (extremely simple) code was not using connection: close. I was
>> just running "requests.get("
>> https://10.12.104.174/redfish/v1/Systems/System.Embedded.1",
>> verify=False, auth=('xxxx', 'xxxx'))" in a loop. I just tried it with
>> headers={'Connection':'close'} and it doesn't seem to make any difference.
>> Works fine either way.
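>>
>> For reference, the loop was roughly the following (IP and credentials are
>> placeholders for my environment, and verify=False because the BMC has a
>> self-signed cert):
>>
>> import time
>> import requests
>> import urllib3
>>
>> urllib3.disable_warnings()
>>
>> while True:
>>     r = requests.get(
>>         "https://10.12.104.174/redfish/v1/Systems/System.Embedded.1",
>>         verify=False,
>>         auth=('xxxx', 'xxxx'),
>>         headers={'Connection': 'close'},  # same result with or without this
>>     )
>>     print(r.status_code)
>>     time.sleep(1)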
>> >
>> > I was able to confirm that the problem only happens when using session
>> auth. With basic auth it doesn't happen.
>> >
>> > Versions I'm using here are ironic 18.2.1 and sushy 3.12.2.
>> >
>> > Here are some fresh logs from the node having the problem:
>> >
>> > 2022-08-15 10:34:21.726 208875 INFO ironic.conductor.task_manager
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying"
>> from state "active"; target provision state is "active"
>> > 2022-08-15 10:34:22.553 208875 INFO ironic.conductor.utils
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 current power state is 'power on',
>> requested state is 'power off'.
>> > 2022-08-15 10:34:35.185 208875 INFO ironic.conductor.utils
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power off by power off.
>> > 2022-08-15 10:34:35.200 208875 WARNING ironic.common.pxe_utils
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] IPv6 is enabled and the
>> DHCP driver appears set to a plugin aside from "neutron". Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 may not receive proper DHCPv6 provided
>> boot parameters.
>> > 2022-08-15 10:34:38.246 208875 INFO ironic.conductor.deployments
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step':
>> 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'},
>> {'step': 'deploy', 'priority': 100, 'argsinfo': None, 'interface':
>> 'deploy'}, {'step': 'write_image', 'priority': 80, 'argsinfo': None,
>> 'interface': 'deploy'}, {'step': 'tear_down_agent', 'priority': 40,
>> 'argsinfo': None, 'interface': 'deploy'}, {'step':
>> 'switch_to_tenant_network', 'priority': 30, 'argsinfo': None, 'interface':
>> 'deploy'}, {'step': 'boot_instance', 'priority': 20, 'argsinfo': None,
>> 'interface': 'deploy'}]
>> > 2022-08-15 10:34:38.255 208875 INFO ironic.conductor.deployments
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step':
>> 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'} on
>> node 0c304cea-8ae2-4a12-b658-dec05c190f88
>> > 2022-08-15 10:35:27.158 208875 INFO
>> ironic.drivers.modules.ansible.deploy
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Ansible pre-deploy step
>> complete on node 0c304cea-8ae2-4a12-b658-dec05c190f88
>> > 2022-08-15 10:35:27.159 208875 INFO ironic.conductor.deployments
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 finished deploy step {'step':
>> 'pre_deploy', 'priority': 200, 'argsinfo': None, 'interface': 'deploy'}
>> > 2022-08-15 10:35:27.160 208875 INFO ironic.conductor.deployments
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Deploying on node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step': 'deploy',
>> 'priority': 100, 'argsinfo': None, 'interface': 'deploy'}, {'step':
>> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'},
>> {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface':
>> 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo':
>> None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20,
>> 'argsinfo': None, 'interface': 'deploy'}]
>> > 2022-08-15 10:35:27.176 208875 INFO ironic.conductor.deployments
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Executing {'step':
>> 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88
>> > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.utils
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Successfully set node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 power state to power on by rebooting.
>> > 2022-08-15 10:35:32.037 208875 INFO ironic.conductor.deployments
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Deploy step {'step':
>> 'deploy', 'priority': 100, 'argsinfo': None, 'interface': 'deploy'} on node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 being executed asynchronously, waiting
>> for driver.
>> > 2022-08-15 10:35:32.051 208875 INFO ironic.conductor.task_manager
>> [req-a66441d0-45a6-4e27-96d9-f26bf9c83725 fcba8c03bfd649cfbb39a907426b3338
>> b679510ddb6540ca9454e26841f65c89 - default default] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "wait
>> call-back" from state "deploying"; target provision state is "active"
>> > 2022-08-15 10:39:54.726 208875 INFO ironic.conductor.task_manager
>> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploying"
>> from state "wait call-back"; target provision state is "active"
>> > 2022-08-15 10:39:54.741 208875 INFO ironic.conductor.deployments
>> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Deploying on node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, remaining steps: [{'step':
>> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'},
>> {'step': 'tear_down_agent', 'priority': 40, 'argsinfo': None, 'interface':
>> 'deploy'}, {'step': 'switch_to_tenant_network', 'priority': 30, 'argsinfo':
>> None, 'interface': 'deploy'}, {'step': 'boot_instance', 'priority': 20,
>> 'argsinfo': None, 'interface': 'deploy'}]
>> > 2022-08-15 10:39:54.748 208875 INFO ironic.conductor.deployments
>> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Executing {'step':
>> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'} on
>> node 0c304cea-8ae2-4a12-b658-dec05c190f88
>> > 2022-08-15 10:42:24.738 208875 WARNING
>> ironic.drivers.modules.agent_base [-] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping
>> heartbeat processing (will retry on the next heartbeat):
>> ironic.common.exception.NodeLocked: Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host
>> sjc06-c01-irn01.ops.ringcentral.com, please retry after the current
>> operation is completed.
>> > 2022-08-15 10:44:29.788 208875 WARNING
>> ironic.drivers.modules.agent_base [-] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping
>> heartbeat processing (will retry on the next heartbeat):
>> ironic.common.exception.NodeLocked: Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host
>> sjc06-c01-irn01.ops.ringcentral.com, please retry after the current
>> operation is completed.
>> > 2022-08-15 10:47:24.830 208875 WARNING
>> ironic.drivers.modules.agent_base [-] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 is currently locked, skipping
>> heartbeat processing (will retry on the next heartbeat):
>> ironic.common.exception.NodeLocked: Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 is locked by host
>> sjc06-c01-irn01.ops.ringcentral.com, please retry after the current
>> operation is completed.
>> > 2022-08-15 11:05:59.544 208875 INFO
>> ironic.drivers.modules.ansible.deploy
>> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Ansible complete
>> deploy on node 0c304cea-8ae2-4a12-b658-dec05c190f88
>> > 2022-08-15 11:06:00.141 208875 ERROR ironic.conductor.utils
>> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 failed deploy step {'step':
>> 'write_image', 'priority': 80, 'argsinfo': None, 'interface': 'deploy'}
>> with unexpected error: ("Connection broken: InvalidChunkLength(got length
>> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)):
>> requests.exceptions.ChunkedEncodingError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> > 2022-08-15 11:06:00.218 208875 ERROR ironic.conductor.task_manager
>> [req-fdd7319b-85b5-4adb-9773-ff64fd1423b2 - - - - -] Node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88 moved to provision state "deploy
>> failed" from state "deploying"; target provision state is "active":
>> requests.exceptions.ChunkedEncodingError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> > 2022-08-15 11:06:28.774 208875 WARNING ironic.conductor.manager
>> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During
>> sync_power_state, could not get power state for node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 1 of 3. Error: ("Connection
>> broken: InvalidChunkLength(got length b'', 0 bytes read)",
>> InvalidChunkLength(got length b'', 0 bytes read)).:
>> requests.exceptions.ChunkedEncodingError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> > 2022-08-15 11:06:53.710 208875 WARNING ironic.conductor.manager
>> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During
>> sync_power_state, could not get power state for node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 2 of 3. Error: ("Connection
>> broken: InvalidChunkLength(got length b'', 0 bytes read)",
>> InvalidChunkLength(got length b'', 0 bytes read)).:
>> requests.exceptions.ChunkedEncodingError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> > 2022-08-15 11:07:53.727 208875 WARNING ironic.conductor.manager
>> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During
>> sync_power_state, could not get power state for node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, attempt 3 of 3. Error: ("Connection
>> broken: InvalidChunkLength(got length b'', 0 bytes read)",
>> InvalidChunkLength(got length b'', 0 bytes read)).:
>> requests.exceptions.ChunkedEncodingError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> > 2022-08-15 11:08:53.704 208875 ERROR ironic.conductor.manager
>> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During
>> sync_power_state, max retries exceeded for node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match
>> expected state 'power on'. Updating DB state to 'None' Switching node to
>> maintenance mode. Error: ("Connection broken: InvalidChunkLength(got length
>> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)):
>> requests.exceptions.ChunkedEncodingError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> > 2022-08-15 11:13:53.750 208875 ERROR ironic.conductor.manager
>> [req-dd0abde9-3364-4828-8907-b42fe4cc9c66 - - - - -] During
>> sync_power_state, max retries exceeded for node
>> 0c304cea-8ae2-4a12-b658-dec05c190f88, node state None does not match
>> expected state 'None'. Updating DB state to 'None' Switching node to
>> maintenance mode. Error: ("Connection broken: InvalidChunkLength(got length
>> b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read)):
>> requests.exceptions.ChunkedEncodingError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> >
>> >
>> > On Fri, Aug 12, 2022 at 9:06 PM Julia Kreger <
>> juliaashleykreger at gmail.com> wrote:
>> >>
>> >> Two questions:
>> >>
>> >> 1) do you see open sockets to the BMCs in netstat output?
>> >> 2) is your code using ‘connection: close’? Or are you using sushy?
>> >>
>> >> Honestly, this seems *really* weird with current sushy versions, and
>> is kind of reminiscent of a cached session that is using kept-alive
>> sockets.
>> >>
>> >> If you could grep out req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 to see
>> what the prior couple of conductor actions were, that would give us better
>> context as to what is going on.
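>>
>> Something along these lines should pull them out (the conductor log path
>> depends on your install, so adjust accordingly):
>> grep req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 /var/log/ironic/ironic-conductor.log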
>> >>
>> >> -Julia
>> >>
>> >> On Fri, Aug 12, 2022 at 3:11 PM Wade Albright <
>> the.wade.albright at gmail.com> wrote:
>> >>>
>> >>> Sorry for the spam. The openssl issue may have been a red herring. I
>> am not able to reproduce the issue directly with my own python code. I was
>> trying to fetch something that required authentication. After I added the
>> correct auth info it works fine. I am not able to cause the same error as
>> is happening in the Ironic logs.
>> >>>
>> >>> Anyway I'll do some more testing and report back.
>> >>>
>> >>> On Fri, Aug 12, 2022 at 2:14 PM Wade Albright <
>> the.wade.albright at gmail.com> wrote:
>> >>>>
>> >>>> I'm not sure why this problem only now started showing up, but it
>> appears to be unrelated to Ironic. I was able to reproduce it directly
>> outside of Ironic with a simple Python program that uses urllib to get
>> URLs from the BMC/redfish interface. It seems to be some combination of a
>> buggy server-side SSL implementation and the newer OpenSSL 1.1.1.
>> Apparently it doesn't happen with OpenSSL 1.0.
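>>
>> The standalone probe was roughly of this shape (urllib only, no Ironic or
>> sushy involved; the IP is a placeholder for my environment, and cert
>> checking is disabled because the BMC cert is self-signed):
>>
>> import ssl
>> import urllib.request
>>
>> ctx = ssl.create_default_context()
>> ctx.check_hostname = False
>> ctx.verify_mode = ssl.CERT_NONE
>>
>> url = "https://10.12.104.174/redfish/v1/Systems/System.Embedded.1"
>> with urllib.request.urlopen(url, context=ctx, timeout=30) as resp:
>>     print(resp.status, len(resp.read()))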
>> >>>>
>> >>>> I've found some information about possible workarounds but haven't
>> figured it out yet. If I do I'll update this thread just in case anyone
>> else runs into it.
>> >>>>
>> >>>> On Fri, Aug 12, 2022 at 8:13 AM Wade Albright <
>> the.wade.albright at gmail.com> wrote:
>> >>>>>
>> >>>>> So I seem to have run into a new issue after upgrading to the newer
>> versions to fix the password change issue.
>> >>>>>
>> >>>>> Now I am randomly getting errors like the below. Once I hit this
>> error for a given node, no operations work on the node. I thought maybe it
>> was an issue with the node itself, but it doesn't seem like it. The BMC
>> seems to be working fine.
>> >>>>>
>> >>>>> After a conductor restart, things start working again. Has anyone
>> seen something like this?
>> >>>>>
>> >>>>> Log example:
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils [- - -
>> - -] Node ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step':
>> 'write_image', 'priority':
>> >>>>> 80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error:
>> ("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)",
>> InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.
>> >>>>> ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got
>> length b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes
>> read))
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> Traceback (most recent call last):
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in
>> _update_chunk_length
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> self.chunk_left = int(line, 16)
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> ValueError: invalid literal for int() with base 16: b''
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During
>> handling of the above exception, another exception occurred:
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> Traceback (most recent call last):
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in
>> _error_catcher
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> yield
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in
>> read_chunked
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> self._update_chunk_length()
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in
>> _update_chunk_length
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> raise InvalidChunkLength(self, line)
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0
>> bytes read)
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During
>> handling of the above exception, another exception occurred:
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> Traceback (most recent call last):
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in
>> generate
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> for chunk in self.raw.stream(chunk_size, decode_content=True):
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in
>> stream
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> for line in self.read_chunked(amt, decode_content=decode_content):
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in
>> read_chunked
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> self._original_response.close()
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> self.gen.throw(type, value, traceback)
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils File
>> "/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in
>> _error_catcher
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> raise ProtocolError("Connection broken: %r" % e, e)
>> >>>>> 2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
>> urllib3.exceptions.ProtocolError: ("Connection broken:
>> InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
>> length b'', 0 bytes read))
>> >>>>>
>> >>>>> On Wed, Jul 20, 2022 at 2:04 PM Wade Albright <
>> the.wade.albright at gmail.com> wrote:
>> >>>>>>
>> >>>>>> I forgot to mention, that using session auth solved the problem
>> after upgrading to the newer versions that include the two mentioned
>> patches.
>> >>>>>>
>> >>>>>> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright <
>> the.wade.albright at gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Switching to session auth solved the problem, and it seems like
>> the better way to go anyway for equipment that supports it. Thanks again
>> for all your help!
>> >>>>>>>
>> >>>>>>> Wade
>> >>>>>>>
>> >>>>>>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger <
>> juliaashleykreger at gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>> Just to provide a brief update for the mailing list. It looks
>> >>>>>>>> like this is a case of using Basic Auth with the BMC, where we
>> >>>>>>>> were not catching the error properly... and thus not reporting
>> >>>>>>>> the authentication failure to ironic so that it would catch it
>> >>>>>>>> and initiate a new client with the most up-to-date password. The
>> >>>>>>>> default, typically used path is session-based authentication, as
>> >>>>>>>> BMCs generally handle internal session/user login tracking in a
>> >>>>>>>> far better fashion. But not every BMC supports sessions.
>> >>>>>>>>
>> >>>>>>>> Fix in review[0] :)
>> >>>>>>>>
>> >>>>>>>> -Julia
>> >>>>>>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425
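>>
>> To sketch the failure mode in made-up code (not the actual ironic/sushy
>> path; the cache and helper names are invented): the conductor keeps a
>> cached client per node, the cached client still holds the old
>> credentials, and the fix is to treat the auth failure as a signal to
>> rebuild the client from the node's current driver_info.
>>
>> import requests
>>
>> _CLIENT_CACHE = {}
>>
>> def _new_client(driver_info):
>>     # Build a client from the node's *current* driver_info.
>>     s = requests.Session()
>>     s.auth = (driver_info['redfish_username'],
>>               driver_info['redfish_password'])
>>     s.verify = False
>>     return s
>>
>> def call_bmc(node_uuid, driver_info, url):
>>     client = _CLIENT_CACHE.get(node_uuid)
>>     if client is None:
>>         client = _CLIENT_CACHE[node_uuid] = _new_client(driver_info)
>>     resp = client.get(url, timeout=60)
>>     if resp.status_code == 401:
>>         # This is the part that was effectively missing for basic auth:
>>         # surface the auth failure, drop the stale cached client, and
>>         # retry once with a client built from the freshly-read password.
>>         client = _CLIENT_CACHE[node_uuid] = _new_client(driver_info)
>>         resp = client.get(url, timeout=60)
>>     return resp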
>> >>>>>>>>
>> >>>>>>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger
>> >>>>>>>> <juliaashleykreger at gmail.com> wrote:
>> >>>>>>>> >
>> >>>>>>>> > Excellent, hopefully I'll be able to figure out why Sushy is
>> not doing
>> >>>>>>>> > the needful... Or if it is and Ironic is not picking up on it.
>> >>>>>>>> >
>> >>>>>>>> > Anyway, I've posted
>> >>>>>>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which
>> might
>> >>>>>>>> > handle this issue. Obviously a work in progress, but it
>> represents
>> >>>>>>>> > what I think is happening inside of ironic itself leading into
>> sushy
>> >>>>>>>> > when cache access occurs.
>> >>>>>>>> >
>> >>>>>>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright
>> >>>>>>>> > <the.wade.albright at gmail.com> wrote:
>> >>>>>>>> > >
>> >>>>>>>> > > Sounds good, I will do that tomorrow. Thanks Julia.
>> >>>>>>>> > >
>> >>>>>>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger <
>> juliaashleykreger at gmail.com> wrote:
>> >>>>>>>> > >>
>> >>>>>>>> > >> Debug would be best. I think I have an idea what is going
>> on, and this
>> >>>>>>>> > >> is a similar variation. If you want, you can email them
>> directly to
>> >>>>>>>> > >> me. Specifically, I only need entries reported by the sushy
>> library and
>> >>>>>>>> > >> ironic.drivers.modules.redfish.utils.
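>>
>> (To turn that on, setting debug = True in the [DEFAULT] section of
>> ironic.conf and restarting the conductor should be enough.)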
>> >>>>>>>> > >>
>> >>>>>>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright
>> >>>>>>>> > >> <the.wade.albright at gmail.com> wrote:
>> >>>>>>>> > >> >
>> >>>>>>>> > >> > I'm happy to supply some logs. What verbosity level
>> should I use? And should I just embed the logs in an email to the list or
>> upload them somewhere?
>> >>>>>>>> > >> >
>> >>>>>>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger <
>> juliaashleykreger at gmail.com> wrote:
>> >>>>>>>> > >> >>
>> >>>>>>>> > >> >> If you could supply some conductor logs, that would be
>> helpful. It
>> >>>>>>>> > >> >> should be re-authenticating, but obviously we have a
>> larger bug there
>> >>>>>>>> > >> >> we need to find the root issue behind.
>> >>>>>>>> > >> >>
>> >>>>>>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright
>> >>>>>>>> > >> >> <the.wade.albright at gmail.com> wrote:
>> >>>>>>>> > >> >> >
>> >>>>>>>> > >> >> > I was able to use the patches to update the code, but
>> unfortunately the problem is still there for me.
>> >>>>>>>> > >> >> >
>> >>>>>>>> > >> >> > I also tried an RPM upgrade to the versions Julia
>> mentioned had the fixes, namely Sushy 3.12.1 (released May 2022) and
>> Ironic 18.2.1 (released January 2022), but that did not fix the problem.
>> >>>>>>>> > >> >> >
>> >>>>>>>> > >> >> > I am able to consistently reproduce the error.
>> >>>>>>>> > >> >> > - step 1: change BMC password directly on the node
>> itself
>> >>>>>>>> > >> >> > - step 2: update BMC password (redfish_password) in
>> ironic with 'openstack baremetal node set <nodename> --driver-info
>> redfish_password='newpass'
>> >>>>>>>> > >> >> >
>> >>>>>>>> > >> >> > After step 1 there are log entries like "Session
>> authentication appears to have been lost at some point in time", and
>> eventually it puts the node into maintenance mode and marks the power
>> state as "none."
>> >>>>>>>> > >> >> > After step 2 and taking the host back out of
>> maintenance mode, it goes through a similar set of log entries and puts
>> the node into maintenance mode again.
>> >>>>>>>> > >> >> >
>> >>>>>>>> > >> >> > After the above steps, a conductor restart fixes the
>> problem and operations work normally again. Given this, it seems like there
>> is still some kind of caching issue.
>> >>>>>>>> > >> >> >
>> >>>>>>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright <
>> the.wade.albright at gmail.com> wrote:
>> >>>>>>>> > >> >> >>
>> >>>>>>>> > >> >> >> Hi Julia,
>> >>>>>>>> > >> >> >>
>> >>>>>>>> > >> >> >> Thank you so much for the reply! Hopefully this is
>> the issue. I'll try out the patches next week and report back. I'll also
>> email you on Monday about the versions, that would be very helpful to know.
>> >>>>>>>> > >> >> >>
>> >>>>>>>> > >> >> >> Thanks again, really appreciate it.
>> >>>>>>>> > >> >> >>
>> >>>>>>>> > >> >> >> Wade
>> >>>>>>>> > >> >> >>
>> >>>>>>>> > >> >> >>
>> >>>>>>>> > >> >> >>
>> >>>>>>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger <
>> juliaashleykreger at gmail.com> wrote:
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> Greetings!
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> I believe you need two patches, one in ironic and
>> one in sushy.
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> Sushy:
>> >>>>>>>> > >> >> >>>
>> https://review.opendev.org/c/openstack/sushy/+/832860
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> Ironic:
>> >>>>>>>> > >> >> >>>
>> https://review.opendev.org/c/openstack/ironic/+/820588
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> I think it is a variation, and the comment about
>> things working after you restart the conductor is the big signal to me. I'm
>> on a phone on a bad data connection; if you email me on Monday I can see
>> what versions the fixes would be in.
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> For the record, it is a session cache issue, the bug
>> was that the service didn’t quite know what to do when auth fails.
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> -Julia
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>>
>> >>>>>>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright <
>> the.wade.albright at gmail.com> wrote:
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> Hi,
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> I'm hitting a problem when trying to update the
>> redfish_password for an existing node. I'm curious to know if anyone else
>> has encountered this problem. I'm not sure if I'm just doing something
>> wrong, if there is a bug, or if the problem is unique to my setup.
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> I have a node already added into ironic with all
>> the driver details set, and things are working fine. I am able to run
>> deployments.
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> Now I need to change the redfish password on the
>> host. So I update the password for redfish access on the host, then use an
>> 'openstack baremetal node set <node> --driver-info
>> redfish_password=<newpass>' command to set the new redfish_password.
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> Once this has been done, deployment no longer
>> works. I see redfish authentication errors in the logs and the operation
>> fails. I waited a bit to see if there might just be a delay in updating the
>> password, but after a while it still didn't work.
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> I restarted the conductor, and after that things
>> work fine again. So it seems like the password is cached or something. Is
>> there a way to force the password to update? I even tried removing the
>> redfish credentials and re-adding them, but that didn't work either. Only a
>> conductor restart seems to make the new password work.
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> We are running Xena, using an RPM installation on
>> Oracle Linux 8.5.
>> >>>>>>>> > >> >> >>>>
>> >>>>>>>> > >> >> >>>> Thanks in advance for any help with this issue.
>>
>