[openstack-dev] [nova] Bug 1781710 killing the check queue
Matt Riedemann
mriedemos at gmail.com
Wed Jul 18 16:14:26 UTC 2018
As can be seen from logstash [1] this bug is hurting us pretty badly in
the check queue.
I thought I originally had this fixed with [2] but that turned out to
only be part of the issue.
I think I've identified the problem but I have failed to write a
recreate regression test [3] because (I think) it's due to random
ordering of which request spec we select to send to the scheduler during
a multi-create request (and I tried making that predictable by sorting
the instances by uuid in both conductor and the scheduler but that
didn't make a difference in my test).
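To illustrate what "sorting the instances by uuid" means above, here is a minimal sketch. It is not nova code: FakeRequestSpec and pick_spec_for_scheduler are hypothetical names made up for this example, and the only assumption is that each request spec carries an instance UUID that can be used as a sort key.

```python
import random
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRequestSpec:
    # Hypothetical stand-in for a per-instance request spec;
    # this is NOT nova's RequestSpec object.
    instance_uuid: str
    num_instances: int


def pick_spec_for_scheduler(specs):
    """Pick one spec deterministically by sorting on instance UUID.

    Whatever order the specs arrive in, the one with the
    lexicographically smallest UUID is returned.
    """
    return sorted(specs, key=lambda spec: spec.instance_uuid)[0]


# Simulate a multi-create request for three instances.
specs = [FakeRequestSpec(str(uuid.uuid4()), num_instances=3) for _ in range(3)]

# Shuffle to mimic arbitrary ordering; the selection must not change.
shuffled = random.sample(specs, k=len(specs))
assert pick_spec_for_scheduler(specs) == pick_spec_for_scheduler(shuffled)
```

A sort like this only makes the selection repeatable; as noted above, it did not change the outcome of the regression test.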
I started with one fix yesterday [4] but that would regress an earlier
fix for resizing servers in an anti-affinity group to the same host.
If we go that route, it will involve changes to how we handle
RequestSpec.num_instances (either not persisting it, or resetting it
during move operations).
After talking with Sean Mooney, we have another fix which is
contained entirely within the scheduler [5] so we wouldn't need to make
any changes to the RequestSpec handling in conductor. It's admittedly a
bit hairy, so I'm asking for some eyes on it since, whichever way we go,
we should get moving soon before we hit the FF and RC1 rush which
*always* kills the gate.
[1] http://status.openstack.org/elastic-recheck/index.html#1781710
[2] https://review.openstack.org/#/c/582976/
[3] https://review.openstack.org/#/c/583339
[4] https://review.openstack.org/#/c/583351
[5] https://review.openstack.org/#/c/583347
--
Thanks,
Matt