[E] [ironic] How to move nodes from a 'clean failed' state into 'Available'
Julia Kreger
juliaashleykreger at gmail.com
Wed Mar 31 18:25:25 UTC 2021
Out of curiosity, is this a very new version of dnsmasq or an older
one? I ask because there have been both fixes and regressions in how
dnsmasq reloads its configuration and responds to machines. The exact
version would be helpful, if only so that those of us who are curious
can go double-check things.
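
In case it helps with gathering that, here is a minimal sketch of a helper that prints the dnsmasq version line. It assumes dnsmasq is on the PATH; on a containerized undercloud the binary lives inside a container instead, so the optional argument names one. Both `dnsmasq_version` and the container name in the note below are illustrative assumptions on my part, not something taken from this deployment.

```shell
# Sketch only: print the first line of "dnsmasq --version".
# With no argument, the dnsmasq found on PATH is queried.
# With one argument, it is treated as a podman container name
# (assumption: the undercloud uses podman) and queried from inside it.
dnsmasq_version() {
    container="${1:-}"
    if [ -n "$container" ]; then
        sudo podman exec "$container" dnsmasq --version | head -n 1
    else
        dnsmasq --version | head -n 1
    fi
}
```

Call it as `dnsmasq_version` on a plain host, or as `dnsmasq_version ironic_inspector_dnsmasq` if that happens to be what the container is called on your undercloud.
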
On Wed, Mar 31, 2021 at 1:28 AM Igal Katzir <ikatzir at infinidat.com> wrote:
>
> Hello Forum,
> Just for the record, the problem was resolved by restarting all the ironic containers. I believe that restarting the UC node entirely would also have fixed it.
> Once the ironic containers had started fresh, PXE worked well, and after running 'openstack overcloud node introspect --all-manageable --provide' the nodes show up as available:
> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
> | UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
> | 588bc3f6-dc14-4a07-8e38-202540d046f8 | interop025 | None          | power off   | available          | False       |
> | dceab84b-1d99-49b5-8f79-c589c0884269 | interop026 | None          | power off   | available          | False       |
> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>
> I am now ready to deploy the overcloud.
> Thanks,
> Igal
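
For anyone landing here with the same symptom, the restart described above could be scripted roughly as follows. This is only a sketch: it assumes the undercloud runs its services under podman and that the relevant containers all have "ironic" in their names, and `restart_ironic_containers` is a made-up helper name.

```shell
# Sketch only: restart every running container whose name contains "ironic".
# Assumption: the undercloud services run under podman.
restart_ironic_containers() {
    sudo podman ps --filter name=ironic --format '{{.Names}}' |
    while read -r name; do
        # Redirect stdin so the restart cannot swallow the remaining names.
        sudo podman restart "$name" </dev/null
    done
}
```

Running `sudo podman ps --filter name=ironic` on its own first is a cheap way to see what the function would touch.
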
>
> On Thu, Mar 25, 2021 at 12:48 AM Igal Katzir <ikatzir at infinidat.com> wrote:
>>
>> Thanks Jay,
>> It gets into the 'clean failed' state because it fails to boot into PXE mode.
>> I don't understand why the DHCP server does not respond to the client's request; it is as if it remembers that the same client already received an IP address in the past.
>> Is there a way to clear the dnsmasq database of reservations?
>> Igal
>>
>> On Wed, Mar 24, 2021 at 5:26 PM Jay Faulkner <jay.faulkner at verizonmedia.com> wrote:
>>>
>>> A node in CLEAN FAILED must be moved to MANAGEABLE state before it can be told to "provide" (which eventually puts it back in AVAILABLE).
>>>
>>> Try this:
>>> `openstack baremetal node manage UUID`, then run the command with "provide" as you did before.
>>>
>>> The available states and their transitions are documented here: https://docs.openstack.org/ironic/latest/contributor/states.html
>>>
>>> I'll note that if cleaning failed, it's possible the node is misconfigured in a way that will cause all deployments and cleanings to fail. For example, if you're using Ironic with Nova and a machine errors during deploy, Nova will by default attempt to clean that node, which may be why it ends up in clean failed. So I strongly suggest you look at the last_error field on the node and try to determine why the failure happened before retrying.
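
Put together, the sequence Jay describes might look like the sketch below. The function names are hypothetical. The maintenance step is my own assumption rather than part of Jay's advice: the node list further down shows Maintenance set to True, and as far as I know at least some releases refuse to clean a node while it is in maintenance. Drop that line if it does not apply to you.

```shell
# Sketch only: inspect, then re-provide, a node stuck in "clean failed".

# Print why the last operation on the node failed. Read this first.
show_last_error() {
    openstack baremetal node show "$1" -f value -c last_error
}

# Walk the node back to "available"; stop at the first failing step.
reprovide_node() {
    node="$1"
    # Assumption: clear the maintenance flag so cleaning is allowed to run.
    openstack baremetal node maintenance unset "$node" || return 1
    # clean failed -> manageable
    openstack baremetal node manage "$node" || return 1
    # manageable -> cleaning -> available (asynchronous; poll the node list)
    openstack baremetal node provide "$node" || return 1
}
```

The two steps are kept separate on purpose: run `show_last_error UUID`, fix whatever it reports, and only then run `reprovide_node UUID`.
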
>>>
>>> Good luck!
>>>
>>> -Jay Faulkner
>>>
>>> On Wed, Mar 24, 2021 at 8:20 AM Igal Katzir <ikatzir at infinidat.com> wrote:
>>>>
>>>> Hello Team,
>>>>
>>>> I had a situation where my undercloud node had a problem with its disk and became disconnected from the overcloud.
>>>> I couldn’t restore the undercloud controller and ended up re-installing it (running 'openstack undercloud install').
>>>> The installation completed successfully, but now cleaning of the previously deployed overcloud nodes fails:
>>>>
>>>> (undercloud) [stack@interop010 ~]$ openstack baremetal node list
>>>> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>>>> | UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
>>>> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>>>> | 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 | interop025 | None          | power on    | clean failed       | True        |
>>>> | 4b02703a-f765-4ebb-85ed-75e88b4cbea5 | interop026 | None          | power on    | clean failed       | True        |
>>>> +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
>>>>
>>>> I’ve tried to move a node to the available state, but cannot:
>>>> (undercloud) [stack@interop010 ~]$ openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
>>>> The requested action "provide" can not be performed on node "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6" while it is in state "clean failed". (HTTP 400)
>>>>
>>>> My question is:
>>>> How do I make the nodes available again?
>>>> As it stands, the overcloud deployment fails with:
>>>> ERROR due to "Message: No valid host was found. , Code: 500"
>>>>
>>>> Thanks,
>>>> Igal
>>
>>
>>
>> --
>> Regards,
>> Igal Katzir
>> Cell +972-54-5597086
>> Interoperability Team
>> INFINIDAT