[ironic][xena] problems updating redfish_password for existing node

Wade Albright the.wade.albright at gmail.com
Fri Aug 12 15:13:49 UTC 2022


So I seem to have run into a new issue after upgrading to the newer
versions to fix the password change issue.

Now I am randomly getting errors like the below. Once I hit this error for
a given node, no operations work on the node. I thought maybe it was an
issue with the node itself, but it doesn't seem like it. The BMC seems to
be working fine.

After a conductor restart, things start working again. Has anyone seen
something like this?

Log example:
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
[req-b6dd74da-1cc7-4c63-b58e-b7ded37007e9 - - - - -] Node
ef5a2502-680b-4933-a0ee-6737e57ce1c5 failed deploy step {'step':
'write_image', 'priority':
80, 'argsinfo': None, 'interface': 'deploy'} with unexpected error:
("Connection broken: InvalidChunkLength(got length b'', 0 bytes read)",
InvalidChunkLength(got length b'', 0 bytes read)): requests.exceptions.
ChunkedEncodingError: ("Connection broken: InvalidChunkLength(got length
b'', 0 bytes read)", InvalidChunkLength(got length b'', 0 bytes read))
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback
(most recent call last):
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 697, in
_update_chunk_length
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
self.chunk_left = int(line, 16)
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils ValueError:
invalid literal for int() with base 16: b''
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During
handling of the above exception, another exception occurred:
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback
(most recent call last):
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 438, in
_error_catcher
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils     yield
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 764, in
read_chunked
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
self._update_chunk_length()
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 701, in
_update_chunk_length
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils     raise
InvalidChunkLength(self, line)
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0
bytes read)
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils During
handling of the above exception, another exception occurred:
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils Traceback
(most recent call last):
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/requests/models.py", line 760, in
generate
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils     for chunk
in self.raw.stream(chunk_size, decode_content=True):
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 572, in
stream
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils     for line
in self.read_chunked(amt, decode_content=decode_content):
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 793, in
read_chunked
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
self._original_response.close()
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/lib64/python3.6/contextlib.py", line 99, in __exit__
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
self.gen.throw(type, value, traceback)
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils   File
"/usr/local/lib/python3.6/site-packages/urllib3/response.py", line 455, in
_error_catcher
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils     raise
ProtocolError("Connection broken: %r" % e, e)
2022-08-12 07:45:33.227 1563371 ERROR ironic.conductor.utils
urllib3.exceptions.ProtocolError: ("Connection broken:
InvalidChunkLength(got length b'', 0 bytes read)", InvalidChunkLength(got
length b'', 0 bytes r
ead))

On Wed, Jul 20, 2022 at 2:04 PM Wade Albright <the.wade.albright at gmail.com>
wrote:

> I forgot to mention, that using session auth solved the problem after
> upgrading to the newer versions that include the two mentioned patches.
>
> On Wed, Jul 20, 2022 at 7:36 AM Wade Albright <the.wade.albright at gmail.com>
> wrote:
>
>> Switching to session auth solved the problem, and it seems like the
>> better way to go anyway for equipment that supports it. Thanks again for
>> all your help!
>>
>> Wade
>>
>> On Tue, Jul 19, 2022 at 5:37 PM Julia Kreger <juliaashleykreger at gmail.com>
>> wrote:
>>
>>> Just to provide a brief update for the mailing list. It looks like
>>> this is a case of use of Basic Auth with the BMC, where we were not
>>> catching the error properly... and thus not reporting the
>>> authentication failure to ironic so it would catch, and initiate a new
>>> client with the most up to date password. The default, typically used
>>> path is Session based authentication as BMCs generally handle internal
>>> session/user login tracking in a far better fashion. But not every BMC
>>> supports sessions.
>>>
>>> Fix in review[0] :)
>>>
>>> -Julia
>>> [0] https://review.opendev.org/c/openstack/sushy/+/850425
>>>
>>> On Mon, Jul 18, 2022 at 4:15 PM Julia Kreger
>>> <juliaashleykreger at gmail.com> wrote:
>>> >
>>> > Excellent, hopefully I'll be able to figure out why Sushy is not doing
>>> > the needful... Or if it is and Ironic is not picking up on it.
>>> >
>>> > Anyway, I've posted
>>> > https://review.opendev.org/c/openstack/ironic/+/850259 which might
>>> > handle this issue. Obviously a work in progress, but it represents
>>> > what I think is happening inside of ironic itself leading into sushy
>>> > when cache access occurs.
>>> >
>>> > On Mon, Jul 18, 2022 at 4:04 PM Wade Albright
>>> > <the.wade.albright at gmail.com> wrote:
>>> > >
>>> > > Sounds good, I will do that tomorrow. Thanks Julia.
>>> > >
>>> > > On Mon, Jul 18, 2022 at 3:27 PM Julia Kreger <
>>> juliaashleykreger at gmail.com> wrote:
>>> > >>
>>> > >> Debug would be best. I think I have an idea what is going on, and
>>> this
>>> > >> is a similar variation. If you want, you can email them directly to
>>> > >> me. Specifically only need entries reported by the sushy library and
>>> > >> ironic.drivers.modules.redfish.utils.
>>> > >>
>>> > >> On Mon, Jul 18, 2022 at 3:20 PM Wade Albright
>>> > >> <the.wade.albright at gmail.com> wrote:
>>> > >> >
>>> > >> > I'm happy to supply some logs, what verbosity level should i use?
>>> And should I just embed the logs in email to the list or upload somewhere?
>>> > >> >
>>> > >> > On Mon, Jul 18, 2022 at 3:14 PM Julia Kreger <
>>> juliaashleykreger at gmail.com> wrote:
>>> > >> >>
>>> > >> >> If you could supply some conductor logs, that would be helpful.
>>> It
>>> > >> >> should be re-authenticating, but obviously we have a larger bug
>>> there
>>> > >> >> we need to find the root issue behind.
>>> > >> >>
>>> > >> >> On Mon, Jul 18, 2022 at 3:06 PM Wade Albright
>>> > >> >> <the.wade.albright at gmail.com> wrote:
>>> > >> >> >
>>> > >> >> > I was able to use the patches to update the code, but
>>> unfortunately the problem is still there for me.
>>> > >> >> >
>>> > >> >> > I also tried an RPM upgrade to the versions Julia mentioned
>>> had the fixes, namely Sushy 3.12.1 - Released May 2022 and Ironic 18.2.1 -
>>> Released in January 2022. But it did not fix the problem.
>>> > >> >> >
>>> > >> >> > I am able to consistently reproduce the error.
>>> > >> >> >  - step 1: change BMC password directly on the node itself
>>> > >> >> >  - step 2: update BMC password (redfish_password) in ironic
>>> with 'openstack baremetal node set <nodename> --driver-info
>>> redfish_password='newpass'
>>> > >> >> >
>>> > >> >> > After step 1 there are errors in the logs entries like
>>> "Session authentication appears to have been lost at some point in time"
>>> and eventually it puts the node into maintenance mode and marks the power
>>> state as "none."
>>> > >> >> > After step 2 and taking the host back out of maintenance mode,
>>> it goes through a similar set of log entries puts the node into MM again.
>>> > >> >> >
>>> > >> >> > After the above steps, a conductor restart fixes the problem
>>> and operations work normally again. Given this it seems like there is still
>>> some kind of caching issue.
>>> > >> >> >
>>> > >> >> > On Sat, Jul 16, 2022 at 6:01 PM Wade Albright <
>>> the.wade.albright at gmail.com> wrote:
>>> > >> >> >>
>>> > >> >> >> Hi Julia,
>>> > >> >> >>
>>> > >> >> >> Thank you so much for the reply! Hopefully this is the issue.
>>> I'll try out the patches next week and report back. I'll also email you on
>>> Monday about the versions, that would be very helpful to know.
>>> > >> >> >>
>>> > >> >> >> Thanks again, really appreciate it.
>>> > >> >> >>
>>> > >> >> >> Wade
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >>
>>> > >> >> >> On Sat, Jul 16, 2022 at 4:36 PM Julia Kreger <
>>> juliaashleykreger at gmail.com> wrote:
>>> > >> >> >>>
>>> > >> >> >>> Greetings!
>>> > >> >> >>>
>>> > >> >> >>> I believe you need two patches, one in ironic and one in
>>> sushy.
>>> > >> >> >>>
>>> > >> >> >>> Sushy:
>>> > >> >> >>> https://review.opendev.org/c/openstack/sushy/+/832860
>>> > >> >> >>>
>>> > >> >> >>> Ironic:
>>> > >> >> >>> https://review.opendev.org/c/openstack/ironic/+/820588
>>> > >> >> >>>
>>> > >> >> >>> I think it is variation, and the comment about working after
>>> you restart the conductor is the big signal to me. I’m on a phone on a bad
>>> data connection, if you email me on Monday I can see what versions the
>>> fixes would be in.
>>> > >> >> >>>
>>> > >> >> >>> For the record, it is a session cache issue, the bug was
>>> that the service didn’t quite know what to do when auth fails.
>>> > >> >> >>>
>>> > >> >> >>> -Julia
>>> > >> >> >>>
>>> > >> >> >>>
>>> > >> >> >>> On Fri, Jul 15, 2022 at 2:55 PM Wade Albright <
>>> the.wade.albright at gmail.com> wrote:
>>> > >> >> >>>>
>>> > >> >> >>>> Hi,
>>> > >> >> >>>>
>>> > >> >> >>>> I'm hitting a problem when trying to update the
>>> redfish_password for an existing node. I'm curious to know if anyone else
>>> has encountered this problem. I'm not sure if I'm just doing something
>>> wrong or if there is a bug. Or if the problem is unique to my setup.
>>> > >> >> >>>>
>>> > >> >> >>>> I have a node already added into ironic with all the driver
>>> details set, and things are working fine. I am able to run deployments.
>>> > >> >> >>>>
>>> > >> >> >>>> Now I need to change the redfish password on the host. So I
>>> update the password for redfish access on the host, then use an 'openstack
>>> baremetal node set <node> --driver-info redfish_password=<newpass>' command
>>> to set the new redfish_password.
>>> > >> >> >>>>
>>> > >> >> >>>> Once this has been done, deployment no longer works. I see
>>> redfish authentication errors in the logs and the operation fails. I waited
>>> a bit to see if there might just be a delay in updating the password, but
>>> after awhile it still didn't work.
>>> > >> >> >>>>
>>> > >> >> >>>> I restarted the conductor, and after that things work fine
>>> again. So it seems like the password is cached or something. Is there a way
>>> to force the password to update? I even tried removing the redfish
>>> credentials and re-adding them, but that didn't work either. Only a
>>> conductor restart seems to make the new password work.
>>> > >> >> >>>>
>>> > >> >> >>>> We are running Xena, using rpm installation on Oracle Linux
>>> 8.5.
>>> > >> >> >>>>
>>> > >> >> >>>> Thanks in advance for any help with this issue.
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <https://lists.openstack.org/pipermail/openstack-discuss/attachments/20220812/ba483eee/attachment-0001.htm>


More information about the openstack-discuss mailing list