Thanks Jay,
It ends up in 'clean failed' state because it fails to PXE boot.
I don't understand why the DHCP server does not respond to the client's request; it's as if it remembers that the same client already received an IP in the past.
Is there a way to clear the dnsmasq database of reservations?
Igal

On Wed, Mar 24, 2021 at 5:26 PM Jay Faulkner <jay.faulkner@verizonmedia.com> wrote:
A node in CLEAN FAILED must be moved to MANAGEABLE state before it can be told to "provide" (which eventually puts it back in AVAILABLE).

Try this:
`openstack baremetal node manage UUID`, then run the command with "provide" as you did before.
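A minimal sketch of that manage/provide sequence as a shell function. It assumes the undercloud credentials (e.g. `~/stackrc`) are already sourced, and `recover_node` is a hypothetical helper name. The `maintenance unset` step is an assumption based on the `Maintenance = True` column in the node list quoted in the original message. It is not part of the advice above, so look at why the node failed before clearing that flag.

```shell
#!/usr/bin/env bash
# Sketch only: assumes undercloud credentials (e.g. ~/stackrc) are sourced
# and the baremetal CLI plugin is installed.
set -euo pipefail

# recover_node is a hypothetical helper, not part of any OpenStack tool.
recover_node() {
    local node="$1"
    # Assumption: the node was also left in maintenance mode
    # (Maintenance=True in the node list), so clear that flag first.
    openstack baremetal node maintenance unset "$node"
    # clean failed -> manageable
    openstack baremetal node manage "$node"
    # manageable -> cleaning -> available (only if cleaning succeeds this time)
    openstack baremetal node provide "$node"
}
```

Usage: `recover_node 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6`. If the underlying PXE problem is still there, the node will likely end up in clean failed again.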

The available states and their transitions are documented here: https://docs.openstack.org/ironic/latest/contributor/states.html

I'll note that if cleaning failed, the node may be misconfigured in a way that will cause all deployments and cleanings to fail. For example, if you're using Ironic with Nova and a provisioning attempt errors during deploy, Nova will by default attempt to clean that node, which may be why you see it end up in clean failed. So I strongly suggest you look at the last_error field on the node and try to determine why the failure happened before retrying.
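To check last_error on every node currently stuck in clean failed at once, something like the following should work. It assumes sourced undercloud credentials, and `show_clean_failures` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print last_error for every node in the "clean failed" state.
# Assumes undercloud credentials (e.g. ~/stackrc) are sourced.
set -euo pipefail

# show_clean_failures is a hypothetical helper name.
show_clean_failures() {
    local node
    # "-f value -c UUID" prints one bare UUID per line.
    for node in $(openstack baremetal node list --provision-state "clean failed" -f value -c UUID); do
        echo "== $node"
        openstack baremetal node show "$node" -f value -c last_error
    done
}
```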

Good luck!

-Jay Faulkner

On Wed, Mar 24, 2021 at 8:20 AM Igal Katzir <ikatzir@infinidat.com> wrote:
Hello Team,

I had a situation where my undercloud node had a problem with its disk and got disconnected from the overcloud.
I couldn’t restore the undercloud controller and ended up re-installing it (running 'openstack undercloud install').
The installation finished successfully, but now cleaning of the deployed overcloud nodes fails:

(undercloud) [stack@interop010 ~]$ openstack baremetal node list
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 | interop025 | None          | power on    | clean failed       | True        |
| 4b02703a-f765-4ebb-85ed-75e88b4cbea5 | interop026 | None          | power on    | clean failed       | True        |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+

I’ve tried to move the nodes to the available state but cannot:
(undercloud) [stack@interop010 ~]$ openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
The requested action "provide" can not be performed on node "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6" while it is in state "clean failed". (HTTP 400)

My question is:
How do I make the nodes available again?
The overcloud deployment currently fails with:
ERROR due to "Message: No valid host was found. , Code: 500"

Thanks,
Igal


--
Regards,
Igal Katzir
Cell +972-54-5597086
Interoperability Team
INFINIDAT