Re: [E] [ironic] How to move nodes from a 'clean failed' state into 'Available'

1 Apr 2021


      In that case, file a case with Red Hat support and provide them with
an sosreport. You shouldn't have to reboot or restart dnsmasq to get
things to wake up. This is less about the version of ironic and more
about the version of dnsmasq. If there is an issue, their support org
needs that visibility so we can track it and get it remedied, because
in that case it is likely a downstream issue rather than an upstream
one.

On Wed, Mar 31, 2021 at 12:24 PM Igal Katzir <ikatzir@infinidat.com> wrote:
...
Hi Julia,
How can I easily tell the ironic version?
This is an RHOSP 16.1 installation, so it's pretty much new.
Igal
On Wed, Mar 31, 2021 at 21:25, Julia Kreger <juliaashleykreger@gmail.com> wrote:
...
Out of curiosity, is this a very new version of dnsmasq? or an older
version? I ask because there have been some fixes and regressions
related to dnsmasq updating its configuration and responding to
machines appropriately. A version might be helpful, just to enable
those of us who are curious to go double check things at a minimum.
On Wed, Mar 31, 2021 at 1:28 AM Igal Katzir <ikatzir@infinidat.com> wrote:
...
Hello Forum,
Just for the record, the problem was resolved by restarting all the ironic containers. I believe that restarting the undercloud (UC) node entirely would also have fixed it.
After the ironic containers started fresh, PXE worked well, and after running 'openstack overcloud node introspect --all-manageable --provide' the node list shows:
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| 588bc3f6-dc14-4a07-8e38-202540d046f8 | interop025 | None          | power off   | available          | False       |
| dceab84b-1d99-49b5-8f79-c589c0884269 | interop026 | None          | power off   | available          | False       |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
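The message above does not say how the ironic containers were restarted. As a rough sketch only, assuming a containerized undercloud where each ironic container is wrapped in a systemd service whose name starts with tripleo_ironic (an assumption about the deployment, not something stated in this thread), the restart could look like the following. restart_ironic_units is a hypothetical helper name.

```shell
# Sketch only: restart_ironic_units is a hypothetical helper, not a shipped tool.
restart_ironic_units() {
    # Assumption: the ironic services are systemd units matching 'tripleo_ironic*'.
    # Adjust the pattern if the unit names differ on your undercloud.
    units=$(systemctl list-units --type=service --all --plain --no-legend 'tripleo_ironic*' \
        | awk '{print $1}')
    for unit in $units; do
        echo "restarting $unit"
        sudo systemctl restart "$unit"
    done
}
```

Calling restart_ironic_units on the undercloud restarts every matching unit in turn. Listing the units first, rather than hard-coding names, avoids guessing which ironic services a given installation runs.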
I am now ready for deployment of the overcloud.
Thanks,
Igal
On Thu, Mar 25, 2021 at 12:48 AM Igal Katzir <ikatzir@infinidat.com> wrote:
...
Thanks Jay,
It gets into 'clean failed' state because it fails to boot into PXE mode.
I don't understand why the DHCP server does not respond to the client's request. It's as if it remembers that the same client already received an IP in the past.
Is there a way to clear the dnsmasq database of reservations?
Igal
On Wed, Mar 24, 2021 at 5:26 PM Jay Faulkner <jay.faulkner@verizonmedia.com> wrote:
...
A node in CLEAN FAILED must be moved to MANAGEABLE state before it can be told to "provide" (which eventually puts it back in AVAILABLE).
Try this:
`openstack baremetal node manage UUID`, then run the command with "provide" as you did before.
The available states and their transitions are documented here: https://docs.openstack.org/ironic/latest/contributor/states.html
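The two steps above can be sketched as a small shell function. This is a minimal sketch, assuming the undercloud credentials are already loaded in the shell; recover_node is a hypothetical helper name, and only the two openstack commands come from this thread.

```shell
# Sketch only: recover_node is a hypothetical helper name, not an openstack command.
recover_node() {
    node="$1"
    # 'clean failed' -> 'manageable'; stop here if the transition is refused.
    openstack baremetal node manage "$node" || return 1
    # 'manageable' -> cleaning -> 'available' (only if cleaning succeeds this time).
    openstack baremetal node provide "$node"
}
```

For example, recover_node 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 would run both transitions for the first node in the listing. If cleaning fails again, the node returns to 'clean failed' and the underlying cause still has to be fixed.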
I'll note that if cleaning failed, it's possible the node is misconfigured in a way that will cause all deployments and cleanings to fail (e.g., if you're using Ironic with Nova and you attempt to provision a machine and it errors during deploy, Nova will by default attempt to clean that node, which may be why you see it end up in clean failed). So I strongly suggest you look at the last_error field on the node and attempt to determine why the failure happened before retrying.
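One way to look at last_error before retrying is sketched below. The helper names node_failure_info and clear_maintenance are hypothetical. Whether maintenance mode has to be cleared by hand is an assumption that depends on the ironic release, so check the node's maintenance flag first.

```shell
# Sketch only: both function names are hypothetical helpers.
node_failure_info() {
    # Print the fields that usually explain a failed clean.
    openstack baremetal node show "$1" -f yaml \
        -c provision_state -c last_error -c maintenance -c maintenance_reason
}

clear_maintenance() {
    # Assumption: the node is still flagged Maintenance=True after the failure
    # and the cause recorded in last_error has already been fixed.
    openstack baremetal node maintenance unset "$1"
}
```

Running node_failure_info with a node UUID prints the four fields in YAML form, which keeps each value labeled when last_error is long.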
Good luck!
-Jay Faulkner
On Wed, Mar 24, 2021 at 8:20 AM Igal Katzir <ikatzir@infinidat.com> wrote:
...
Hello Team,
I had a situation where my undercloud node had a problem with its disk and became disconnected from the overcloud.
I couldn't restore the undercloud controller and ended up re-installing it (running 'openstack undercloud install').
The installation ended successfully, but now cleaning of the previously deployed overcloud nodes fails:
(undercloud) [stack@interop010 ~]$ openstack baremetal node list
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 | interop025 | None          | power on    | clean failed       | True        |
| 4b02703a-f765-4ebb-85ed-75e88b4cbea5 | interop026 | None          | power on    | clean failed       | True        |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
I've tried to move a node to the available state but cannot:
(undercloud) [stack@interop010 ~]$ openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
The requested action "provide" can not be performed on node "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6" while it is in state "clean failed". (HTTP 400)
My question is:
How do I make the nodes available again? The deployment of the overcloud currently fails with:
ERROR due to "Message: No valid host was found. , Code: 500"
Thanks,
Igal
--
Regards,
Igal Katzir
Cell +972-54-5597086
Interoperability Team
INFINIDAT