[E] [ironic] How to move nodes from a 'clean failed' state into 'Available'

Julia Kreger juliaashleykreger at gmail.com
Wed Mar 31 19:49:38 UTC 2021


In that case, file a case with Red Hat support and provide them an
sosreport. You shouldn't have to reboot or restart dnsmasq to get
things to wake up. It is not about the version of ironic but about
the version of dnsmasq; if there is an issue, their support org needs
that visibility so we can track it and get it remedied, because in
that case it is likely a downstream issue rather than an upstream one.

On Wed, Mar 31, 2021 at 12:24 PM Igal Katzir <ikatzir at infinidat.com> wrote:
>
> Hi Julia,
> How can I easily tell the ironic version?
> This is an RHOSP 16.1 installation, so it's pretty much new.
> Igal
>
> On Wed, Mar 31, 2021, 21:25 Julia Kreger <juliaashleykreger at gmail.com> wrote:
>>
>> Out of curiosity, is this a very new version of dnsmasq? or an older
>> version? I ask because there have been some fixes and regressions
>> related to dnsmasq updating its configuration and responding to
>> machines appropriately. A version might be helpful, just to enable
>> those of us who are curious to go double check things at a minimum.
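To answer the version question on a TripleO/RHOSP 16.1 undercloud, where both services run in containers, something like the following should work. This is a hedged dry-run sketch that only prints the commands; the container names (`ironic_inspector`, `ironic_conductor`) are assumptions and may differ per deployment.

```shell
# Dry-run helper: prints the commands to run on the undercloud host.
show_version_cmds() {
  # dnsmasq version inside the introspection container (assumed name):
  echo "sudo podman exec ironic_inspector dnsmasq --version"
  # ironic package version inside the conductor container (assumed name):
  echo "sudo podman exec ironic_conductor rpm -q openstack-ironic-common"
}
show_version_cmds
```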
>>
>> On Wed, Mar 31, 2021 at 1:28 AM Igal Katzir <ikatzir at infinidat.com> wrote:
>> >
>> > Hello Forum,
>> > Just for the record, the problem was resolved by restarting all the ironic containers; I believe that restarting the undercloud node entirely would also have fixed it.
>> > So after the ironic containers started fresh, the PXE worked well, and after running 'openstack overcloud node introspect --all-manageable --provide' it shows:
>> > +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>> > | UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
>> > +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>> > | 588bc3f6-dc14-4a07-8e38-202540d046f8 | interop025 | None          | power off   | available          | False       |
>> > | dceab84b-1d99-49b5-8f79-c589c0884269 | interop026 | None          | power off   | available          | False       |
>> > +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>> >
>> > I am now ready for deployment of the overcloud.
>> > thanks,
>> > Igal
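The restart that resolved this can be sketched roughly as below, assuming a podman-based RHOSP 16.1 undercloud. This is a dry-run helper that only prints the commands; the `name=ironic` filter is an assumption about how the containers are named.

```shell
# Dry-run helper: prints the commands that restart the ironic containers.
restart_ironic_cmds() {
  # List the ironic-related containers first, then restart them:
  echo "sudo podman ps --format '{{.Names}}' --filter name=ironic"
  echo "sudo podman restart \$(sudo podman ps -q --filter name=ironic)"
}
restart_ironic_cmds
```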
>> >
>> > On Thu, Mar 25, 2021 at 12:48 AM Igal Katzir <ikatzir at infinidat.com> wrote:
>> >>
>> >> Thanks Jay,
>> >> It gets into 'clean failed' state because it fails to boot into PXE mode.
>> >> I don't understand why the DHCP server does not respond to the client's request; it's as if it remembers that the same client already received an IP in the past.
>> >> Is there a way to clear the dnsmasq database of reservations?
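One way to clear stale dnsmasq lease state on a TripleO undercloud might look like the following. This is a hedged dry-run sketch that only prints the commands; both the container name (`ironic_inspector_dnsmasq`) and the lease-file path are assumptions, not confirmed for this setup.

```shell
# Assumed lease-file path on the undercloud host (verify before use):
LEASE_FILE=/var/lib/ironic-inspector/dnsmasq/dnsmasq.leases

# Dry-run helper: prints the commands to stop dnsmasq, empty its lease
# database, and start it again so clients get fresh DHCP offers.
clear_lease_cmds() {
  echo "sudo podman stop ironic_inspector_dnsmasq"
  echo "sudo truncate -s 0 $LEASE_FILE"
  echo "sudo podman start ironic_inspector_dnsmasq"
}
clear_lease_cmds
```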
>> >> Igal
>> >>
>> >> On Wed, Mar 24, 2021 at 5:26 PM Jay Faulkner <jay.faulkner at verizonmedia.com> wrote:
>> >>>
>> >>> A node in CLEAN FAILED must be moved to MANAGEABLE state before it can be told to "provide" (which eventually puts it back in AVAILABLE).
>> >>>
>> >>> Try this:
>> >>> `openstack baremetal node manage UUID`, then run the command with "provide" as you did before.
>> >>>
>> >>> The available states and their transitions are documented here: https://docs.openstack.org/ironic/latest/contributor/states.html
>> >>>
>> >>> I'll note that if cleaning failed, it's possible the node is misconfigured in such a way that will cause all deployments and cleanings to fail (e.g., if you're using Ironic with Nova and you attempt to provision a machine that errors during deploy, Nova will by default attempt to clean that node, which may be why you see it end up in CLEAN FAILED). So I strongly suggest you look at the last_error field on the node and attempt to determine why the failure happened before retrying.
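The recovery sequence described above can be sketched as a dry-run helper over the two node UUIDs from the listing later in this thread. One assumption worth flagging: that listing shows Maintenance=True on both nodes, so maintenance likely needs to be unset before "provide" will succeed. The helper only prints the commands.

```shell
# Dry-run helper: prints the inspect/recover sequence for one node UUID.
recover_node_cmds() {
  # Check why cleaning failed before retrying anything:
  echo "openstack baremetal node show $1 -f value -c last_error"
  # Clear maintenance, move CLEAN FAILED -> MANAGEABLE -> AVAILABLE:
  echo "openstack baremetal node maintenance unset $1"
  echo "openstack baremetal node manage $1"
  echo "openstack baremetal node provide $1"
}
recover_node_cmds 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
recover_node_cmds 4b02703a-f765-4ebb-85ed-75e88b4cbea5
```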
>> >>>
>> >>> Good luck!
>> >>>
>> >>> -Jay Faulkner
>> >>>
>> >>> On Wed, Mar 24, 2021 at 8:20 AM Igal Katzir <ikatzir at infinidat.com> wrote:
>> >>>>
>> >>>> Hello Team,
>> >>>>
>> >>>> I had a situation where my undercloud node had a problem with its disk and became disconnected from the overcloud.
>> >>>> I couldn't restore the undercloud controller and ended up re-installing it (running 'openstack undercloud install').
>> >>>> The installation ended successfully, but now I'm in a situation where cleanup of the overcloud-deployed nodes fails:
>> >>>>
>> >>>> (undercloud) [stack at interop010 ~]$ openstack baremetal node list
>> >>>> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>> >>>> | UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
>> >>>> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>> >>>> | 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 | interop025 | None          | power on    | clean failed       | True        |
>> >>>> | 4b02703a-f765-4ebb-85ed-75e88b4cbea5 | interop026 | None          | power on    | clean failed       | True        |
>> >>>> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>> >>>>
>> >>>> I've tried to move the nodes to the available state but cannot:
>> >>>> (undercloud) [stack at interop010 ~]$ openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
>> >>>> The requested action "provide" can not be performed on node "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6" while it is in state "clean failed". (HTTP 400)
>> >>>>
>> >>>> My question is:
>> >>>> How do I make the nodes available again?
>> >>>> as the deployment of overcloud fails with:
>> >>>> ERROR due to "Message: No valid host was found. , Code: 500”
>> >>>>
>> >>>> Thanks,
>> >>>> Igal
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Igal Katzir
>> >> Cell +972-54-5597086
>> >> Interoperability Team
>> >> INFINIDAT



More information about the openstack-discuss mailing list