[openstack-dev] [devstack] [ironic] [nova] Trying again on wait_for_compute in devstack

Brian Haley haleyb.dev at gmail.com
Wed Aug 2 20:55:51 UTC 2017


On 08/02/2017 07:17 AM, Sean Dague wrote:
> The 3 node scenarios in Neutron (which are still experimental and
> non-voting) are typically failing to bring online the 3rd compute. In
> cells v2 you have to explicitly add nodes to the cells. There is a
> nova-manage command "discover_hosts" that takes all the compute nodes
> which have checked in, but aren't yet assigned to a cell, and puts them
> into a cell of your choosing. We do this in devstack-gate in the gate.
> 
> However... subnodes don't take very long to set up (so few services). And
> the nova-compute process takes about 30s before it's done all its
> initialization and actually checks in to the cluster. It's a real
> possibility that discover_hosts will run before subnode 3 checks in. We
> see it in logs. This also really could come and bite us on any multinode
> job, and I'm a bit concerned some of the multinode jobs aren't running
> multinode sometimes because of it.
> 
> One way to fix this, without putting more logic in devstack-gate, is to
> ensure that by the time stack.sh finishes, the compute node is up. This
> was tried previously, but it turned out that we totally missed that it
> broke Ironic (the check happened too early, ironic was not yet running,
> so we always failed), Cells v1 (munges hostnames :(  ), and PowerVM
> (their nova-compute was never starting correctly, and they were working
> around it with a restart later).
> 
> This patch https://review.openstack.org/#/c/488381/ tries again. The
> check is moved very late, and Ironic seems to be running fine with it.
> Cells v1 is just skipped; that's deprecated in Nova now, and we're not
> going to use it in multinode scenarios. The PowerVM team fixed their
> nova-compute start issues, so they should be good to go as well.
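
For anyone skimming the thread, the idea of "don't finish until the
compute node has checked in" can be sketched roughly as below. This is
only an illustration, not the code in the patch: wait_for_compute here is
a hypothetical Python helper, and list_checked_in_hosts is a placeholder
callable standing in for however a deployment lists the nova-compute
services that have checked in.

```python
import time
from typing import Callable, Iterable


def wait_for_compute(
    hostname: str,
    list_checked_in_hosts: Callable[[], Iterable[str]],
    timeout: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until `hostname` shows up among the checked-in compute hosts.

    Returns True once the host is seen and False if `timeout` seconds
    pass first. `list_checked_in_hosts` is a hypothetical hook, not a
    real nova API; `clock` and `sleep` are injectable for testing.
    """
    deadline = clock() + timeout
    while True:
        # Ask the (assumed) lister which compute hosts have checked in.
        if hostname in set(list_checked_in_hosts()):
            return True
        # Give up once the deadline has passed.
        if clock() >= deadline:
            return False
        sleep(interval)
```

Host discovery would then only run after this returns True for every
expected compute host, which closes the window where discover_hosts runs
before subnode 3 has checked in.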

I had also filed https://bugs.launchpad.net/neutron/+bug/1707003 for
this since it was mainly just affecting that one 3-node neutron job.
Glad I hadn't started working on a patch; I'll just take a look at yours.

Thanks for working on it!

-Brian

> This is an FYI that we're going to land this again soon. If you think
> this impacts your CI / jobs, please speak up. The CI runs on both the
> main and experimental queues on devstack for this change look pretty
> good, so I think we're safe to move forward this time. But we also
> thought that the last time, and were wrong.
> 
> 	-Sean
> 



