Hello Julia,

Thanks for your response.
I am using Red Hat OpenStack Platform 16.1, running on RHEL 8.2.
All are physical servers:
- One undercloud director.
- The overcloud consists of two nodes. (This is for certification purposes.)

A MAC address mismatch is unlikely (I wish...), since I have already deployed these nodes several times using the same nodes.json.
Just for reference, here is the output:

(undercloud) [stack@interop010 ~]$ openstack baremetal port list
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| 2d404695-f236-4d32-8b65-5ca1fa6b756a | a0:36:9f:95:dd:e2 |
| 32669178-0408-4ff1-b4b4-df65fc7643c9 | 6c:ae:8b:69:ee:80 |
+--------------------------------------+-------------------+

Everything was working well until I 'lost' the undercloud node, while the overcloud kept working.
I might need to delete these nodes and run introspection again.

Igal

On Wed, Mar 24, 2021 at 7:31 PM Julia Kreger <juliaashleykreger@gmail.com> wrote:
So versions and overall configuration might help, *but* often these issues are just a typo in a MAC address or the wrong port. Can you verify that the MAC address you're seeing DHCP requests for matches what is recorded for the node in the `openstack baremetal port list` output?
On Wed, Mar 24, 2021 at 8:18 AM Igal Katzir <ikatzir@infinidat.com> wrote:
Hello all,
While troubleshooting this, I also noticed that when I run provide on the node:
'openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6'
the cleaning process starts and the node boots into PXE, but the undercloud ignores it. When I tap the port, I see that the requests reach its interface:
(undercloud) [stack@interop010 ~]$ sudo tcpdump -i br-ctlplane
10:43:10.600421 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:95:dd:e2 (oui Unknown), length 548
But at the same time, dnsmasq ignores it:
(undercloud) [stack@interop010 ~]$ sudo tail -f /var/log/containers/ironic-inspector/dnsmasq.log
Mar 24 10:39:43 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
Mar 24 10:40:36 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
Mar 24 10:40:39 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
Mar 24 10:40:48 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
Mar 24 10:41:52 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
Mar 24 10:42:57 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
Mar 24 10:43:06 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
Mar 24 10:43:10 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
Mar 24 10:43:14 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
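One way to cross-check this log against the registered ports is to pull the distinct MAC addresses out of the "ignored" lines. The following is only a minimal sketch, assuming the log line layout shown in the excerpt (MAC address as the second-to-last field); the function name `ignored_macs` is hypothetical and not part of any OpenStack tool.

```shell
#!/bin/sh
# Hypothetical helper: read dnsmasq log lines on stdin and print each
# distinct MAC address from DHCPDISCOVER lines whose last field is "ignored".
# Assumes the line layout from the excerpt: "... DHCPDISCOVER(iface) <MAC> ignored".
ignored_macs() {
  awk '/DHCPDISCOVER/ && $NF == "ignored" { print $(NF-1) }' | sort -u
}

# Example with two lines copied from the log excerpt:
printf '%s\n' \
  'Mar 24 10:39:43 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored' \
  'Mar 24 10:40:36 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored' \
  | ignored_macs
```

On the undercloud, the same function could be fed the real log (`sudo cat /var/log/containers/ironic-inspector/dnsmasq.log | ignored_macs`) and the result compared by eye with `openstack baremetal port list -f value -c Address`. In this thread both ignored addresses appear in the port list, which supports the point that a MAC mismatch is not the cause.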
Why is that? What is needed for the cleanup to start?
Thanks, Igal
On 24 Mar 2021, at 0:09, Igal Katzir <ikatzir@infinidat.com> wrote:
Hello Team,
I had a situation where my undercloud node had a problem with its disk and became disconnected from the overcloud. I couldn't restore the undercloud controller and ended up reinstalling it (running 'openstack undercloud install'). The installation completed successfully, but now cleaning of the deployed overcloud nodes fails:
(undercloud) [stack@interop010 ~]$ openstack baremetal node list
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
| 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 | interop025 | None          | power on    | clean failed       | True        |
| 4b02703a-f765-4ebb-85ed-75e88b4cbea5 | interop026 | None          | power on    | clean failed       | True        |
+--------------------------------------+------------+---------------+-------------+--------------------+-------------+
I've tried to move the node to the available state but cannot:
(undercloud) [stack@interop010 ~]$ openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
The requested action "provide" can not be performed on node "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6" while it is in state "clean failed". (HTTP 400)
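For a node stuck in "clean failed" with Maintenance set to True, the usual route in the Ironic state machine is to clear maintenance, move the node to manageable with `manage`, and only then run `provide`. The sketch below assumes that standard state machine applies to this deployment, which the thread does not confirm. The helper `recover_node_cmds` is hypothetical and only prints the commands so they can be reviewed before being run by hand.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print (do not execute) the commands that
# would normally take a node from "clean failed" back toward "available".
# Assumption: standard Ironic transitions, clean failed -> manageable -> available.
recover_node_cmds() {
  node="$1"
  # Cleaning failures usually leave the node in maintenance mode.
  echo "openstack baremetal node maintenance unset $node"
  # "manage" is the verb accepted in the "clean failed" state.
  echo "openstack baremetal node manage $node"
  # "provide" then triggers cleaning and, on success, the "available" state.
  echo "openstack baremetal node provide $node"
}

# Example for the first node in the listing:
recover_node_cmds 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
```

If cleaning fails again after `provide`, the node returns to "clean failed", so this sequence only helps once the underlying PXE/DHCP problem is resolved.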
My question is: how do I make the nodes available again? The overcloud deployment currently fails with:
ERROR due to "Message: No valid host was found. , Code: 500"
Thanks, Igal
--
Regards,
*Igal Katzir*
Cell +972-54-5597086
Interoperability Team
*INFINIDAT*