Out of curiosity, is this a very new version of dnsmasq, or an older one? I ask because there have been some fixes and regressions related to dnsmasq updating its configuration and responding to machines appropriately. Knowing the version would be helpful, if only so those of us who are curious can go double-check things.

On Wed, Mar 31, 2021 at 1:28 AM Igal Katzir <ikatzir@infinidat.com> wrote:
Hello Forum,
Just for the record, the problem was resolved by restarting all of the ironic containers; I believe restarting the undercloud node entirely would have fixed it as well. Once the ironic containers started fresh, PXE worked fine, and after running 'openstack overcloud node introspect --all-manageable --provide' the node list shows:

+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| 588bc3f6-dc14-4a07-8e38-202540d046f8 | interop025 | None          | power off   | available          | False       |
| dceab84b-1d99-49b5-8f79-c589c0884269 | interop026 | None          | power off   | available          | False       |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
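For anyone hitting the same thing, the restart itself can be done roughly like this on a containerized undercloud (exact container names, and whether they are managed through systemd, vary by release, so treat this as a sketch):

  # list the ironic-related containers on the undercloud
  sudo podman ps --format '{{.Names}}' | grep ironic
  # restart them all (blunt, but a full undercloud reboot would achieve the same)
  sudo podman restart $(sudo podman ps -q --filter name=ironic)
  # then re-run introspection
  openstack overcloud node introspect --all-manageable --provide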
I am now ready to deploy the overcloud.
Thanks, Igal
On Thu, Mar 25, 2021 at 12:48 AM Igal Katzir <ikatzir@infinidat.com> wrote:
Thanks Jay,
It gets into the 'clean failed' state because it fails to boot into PXE mode. I don't understand why the DHCP server does not respond to the clients' requests; it's as if it remembers that the same client already received an IP in the past. Is there a way to clear the dnsmasq database of reservations?
Igal
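P.S. I was thinking of something along these lines, but I don't know whether it is safe here, or where the containerized dnsmasq actually keeps its leases (the container name and lease path below are only guesses based on the usual dnsmasq defaults):

  # stop the introspection dnsmasq, drop its lease database, start it again
  # (container name is a guess; 'sudo podman ps' lists the real one)
  sudo podman stop ironic_inspector_dnsmasq
  # lease path is the common dnsmasq default and may differ here
  sudo rm /var/lib/dnsmasq/dnsmasq.leases
  sudo podman start ironic_inspector_dnsmasq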
On Wed, Mar 24, 2021 at 5:26 PM Jay Faulkner <jay.faulkner@verizonmedia.com> wrote:
A node in CLEAN FAILED must be moved to MANAGEABLE state before it can be told to "provide" (which eventually puts it back in AVAILABLE).
Try this: `openstack baremetal node manage UUID`, then run the command with "provide" as you did before.
The available states and their transitions are documented here: https://docs.openstack.org/ironic/latest/contributor/states.html
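Concretely, for one of the nodes from your listing that would look something like:

  # move the node from "clean failed" back to manageable
  openstack baremetal node manage 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
  # hand it back to the pool; this re-runs cleaning and, if it succeeds, ends in "available"
  openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6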
I'll note that if cleaning failed, it's possible the node is misconfigured in such a way that all deployments and cleanings will fail (e.g., if you're using Ironic with Nova and you attempt to provision a machine and it errors during deploy, Nova will by default attempt to clean that node, which may be why you see it end up in clean failed). So I strongly suggest you look at the last_error field on the node and try to determine why the failure happened before retrying.
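For example, something like:

  # show the reason the last operation on the node failed
  openstack baremetal node show 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 -f value -c last_error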
Good luck!
-Jay Faulkner
On Wed, Mar 24, 2021 at 8:20 AM Igal Katzir <ikatzir@infinidat.com> wrote:
Hello Team,
I had a situation where my undercloud node had a problem with its disk and became disconnected from the overcloud. I couldn't restore the undercloud controller and ended up re-installing it (running 'openstack undercloud install'). The installation finished successfully, but I'm now in a situation where cleaning of the deployed overcloud nodes fails:
(undercloud) [stack@interop010 ~]$ openstack baremetal node list
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 | interop025 | None          | power on    | clean failed       | True        |
| 4b02703a-f765-4ebb-85ed-75e88b4cbea5 | interop026 | None          | power on    | clean failed       | True        |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
I've tried to move the nodes to the available state but cannot:

(undercloud) [stack@interop010 ~]$ openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
The requested action "provide" can not be performed on node "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6" while it is in state "clean failed". (HTTP 400)
My question is: how do I make the nodes available again? The overcloud deployment fails with: ERROR due to "Message: No valid host was found. , Code: 500"
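I also see that both nodes are flagged with Maintenance = True; I am not sure whether that matters here, but I assume it could be cleared with something like:

  # take the nodes out of maintenance mode (guessing this is needed before they can be scheduled)
  openstack baremetal node maintenance unset 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
  openstack baremetal node maintenance unset 4b02703a-f765-4ebb-85ed-75e88b4cbea5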
Thanks, Igal
--
Regards,
Igal Katzir
Cell +972-54-5597086
Interoperability Team
INFINIDAT