[ironic] Cannot move nodes from state 'clean failed' into provisioning state 'Available'

Igal Katzir ikatzir at infinidat.com
Wed Mar 24 22:39:36 UTC 2021


Hello Julia, Thanks for your response.
I am using Red Hat OpenStack Platform 16.1, running on RHEL 8.2.
All are physical servers:
- One undercloud director.
- The overcloud consists of two nodes (this is for certification purposes).
It is unlikely to be a MAC address mismatch (I wish...), since I have
already deployed these nodes several times using the same nodes.json.
Just for reference, here is the output:
(undercloud) [stack@interop010 ~]$ openstack baremetal port list
+--------------------------------------+-------------------+
| UUID                                 | Address           |
+--------------------------------------+-------------------+
| 2d404695-f236-4d32-8b65-5ca1fa6b756a | a0:36:9f:95:dd:e2 |
| 32669178-0408-4ff1-b4b4-df65fc7643c9 | 6c:ae:8b:69:ee:80 |
+--------------------------------------+-------------------+
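
To make the comparison less error-prone than reading it by eye, a small script along these lines could extract the MAC addresses that dnsmasq reports as ignored and check them against the registered ports. This is only a sketch: the function names are made up for illustration, the registered MACs are copied from the port list above, and the sample log lines are copied from the dnsmasq log quoted further down in this thread.

```python
import re

# MAC addresses registered in Ironic, copied from the port list above.
REGISTERED_MACS = {"a0:36:9f:95:dd:e2", "6c:ae:8b:69:ee:80"}

# Two sample lines copied from the dnsmasq log quoted later in this thread.
SAMPLE_LOG = """\
Mar 24 10:39:43 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
Mar 24 10:40:36 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
"""

# Matches "DHCPDISCOVER(<interface>) <mac> ignored" and captures the MAC.
MAC_RE = re.compile(
    r"DHCPDISCOVER\(\S+\)\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\s+ignored",
    re.IGNORECASE,
)


def ignored_macs(log_text):
    """Return the set of MACs whose DHCPDISCOVER was logged as ignored."""
    return {m.group(1).lower() for m in MAC_RE.finditer(log_text)}


def unknown_macs(log_text, registered):
    """Return ignored MACs that do not match any registered port."""
    return ignored_macs(log_text) - {mac.lower() for mac in registered}


if __name__ == "__main__":
    print("ignored:", sorted(ignored_macs(SAMPLE_LOG)))
    print("not registered:", sorted(unknown_macs(SAMPLE_LOG, REGISTERED_MACS)))
```

With the sample lines above, the "not registered" set comes out empty: both ignored MACs belong to registered ports, which is consistent with this not being a MAC mismatch.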

Deployment was working well until I 'lost' the undercloud node; the
overcloud itself stayed up.
I might need to delete these nodes and run introspection again.
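
Before deleting the nodes, one thing that may be worth trying is the usual route out of 'clean failed'. The sketch below assumes the standard Ironic state machine, in which 'manage' is the action accepted from 'clean failed' and 'provide' is only accepted from 'manageable', and it assumes the Maintenance flag has to be cleared first. The helper names are hypothetical, and by default the script only prints the commands. It does not address the ignored DHCP requests, so cleaning may well fail again until that is resolved.

```python
import subprocess


def recovery_commands(node_id):
    """Return, in order, the CLI calls for one node stuck in 'clean failed'."""
    base = ["openstack", "baremetal", "node"]
    return [
        base + ["maintenance", "unset", node_id],  # clear the Maintenance flag
        base + ["manage", node_id],                # clean failed -> manageable
        base + ["provide", node_id],               # manageable -> cleaning -> available
    ]


def recover(node_id, dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first error."""
    for cmd in recovery_commands(node_id):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Node UUIDs taken from the 'openstack baremetal node list' output in this thread.
    for node in (
        "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6",
        "4b02703a-f765-4ebb-85ed-75e88b4cbea5",
    ):
        recover(node)  # dry run: prints the commands without executing them
```

Calling recover(node, dry_run=False) would actually run the commands, so the dry-run output is worth reviewing first.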

Igal

On Wed, Mar 24, 2021 at 7:31 PM Julia Kreger <juliaashleykreger at gmail.com>
wrote:

> So versions and overall configuration might help, *but* often these
> issues are just a typo with a MAC address or the wrong port. Can you
> verify that the MAC address you're seeing DHCP requests for matches what
> is recorded for the node in the `openstack baremetal port list`
> output?
>
> On Wed, Mar 24, 2021 at 8:18 AM Igal Katzir <ikatzir at infinidat.com> wrote:
> >
> > Hello all,
> >
> > While troubleshooting this, I noticed something else. When I put the
> > node into the provide state:
> > 'openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6'
> > it starts the cleaning process, then the node boots into PXE, but the
> > undercloud ignores it.
> > When I tap the port I see that the requests reach its interface:
> >
> > (undercloud) [stack@interop010 ~]$ sudo tcpdump -i br-ctlplane
> > 10:43:10.600421 IP 0.0.0.0.bootpc > 255.255.255.255.bootps: BOOTP/DHCP, Request from a0:36:9f:95:dd:e2 (oui Unknown), length 548
> >
> > But at the same time dnsmasq ignores it:
> > (undercloud) [stack@interop010 ~]$ sudo tail -f /var/log/containers/ironic-inspector/dnsmasq.log
> > Mar 24 10:39:43 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
> > Mar 24 10:40:36 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
> > Mar 24 10:40:39 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
> > Mar 24 10:40:48 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
> > Mar 24 10:41:52 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
> > Mar 24 10:42:57 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) 6c:ae:8b:69:ee:80 ignored
> > Mar 24 10:43:06 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
> > Mar 24 10:43:10 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
> > Mar 24 10:43:14 dnsmasq-dhcp[7]: DHCPDISCOVER(br-ctlplane) a0:36:9f:95:dd:e2 ignored
> >
> > Why is that?
> > What is needed for the cleanup to start?
> >
> > Thanks,
> > Igal
> >
> > On 24 Mar 2021, at 0:09, Igal Katzir <ikatzir at infinidat.com> wrote:
> >
> > Hello Team,
> >
> > I had a situation where my undercloud node had a problem with its disk
> > and became disconnected from the overcloud.
> > I couldn't restore the undercloud controller and ended up re-installing
> > it (running 'openstack undercloud install').
> > The installation completed successfully, but now cleaning of the
> > deployed overcloud nodes fails:
> >
> > (undercloud) [stack@interop010 ~]$ openstack baremetal node list
> > +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
> > | UUID                                 | Name       | Instance UUID | Power State | Provisioning State | Maintenance |
> > +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
> > | 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6 | interop025 | None          | power on    | clean failed       | True        |
> > | 4b02703a-f765-4ebb-85ed-75e88b4cbea5 | interop026 | None          | power on    | clean failed       | True        |
> > +--------------------------------------+------------+---------------+-------------+--------------------+-------------+
> >
> > I've tried to move a node to the available state but cannot:
> > (undercloud) [stack@interop010 ~]$ openstack baremetal node provide 97b9a603-f64f-47c1-9fb4-6c68a5b38ff6
> > The requested action "provide" can not be performed on node "97b9a603-f64f-47c1-9fb4-6c68a5b38ff6" while it is in state "clean failed". (HTTP 400)
> >
> > My question is:
> > How do I make the nodes available again?
> > The overcloud deployment currently fails with:
> > ERROR due to "Message: No valid host was found. , Code: 500"
> >
> > Thanks,
> > Igal
> >
> >
>


-- 
Regards,

*Igal Katzir*
Cell +972-54-5597086
Interoperability Team
*INFINIDAT*


More information about the openstack-discuss mailing list