[nova] New gate bug 1844929, timed out waiting for response from cell during scheduling
I noticed this while looking at a grenade failure on an unrelated patch:

https://bugs.launchpad.net/nova/+bug/1844929

The details are in the bug, but it looks like this showed up around Sept 17 and hits mostly on FortNebula nodes but also OVH nodes. It's restricted to grenade jobs, and while I don't see anything obvious in the rabbitmq logs (the only errors are about uwsgi [api] heartbeat issues), it's possible that these are slower infra nodes and we're just not waiting for something properly during the grenade upgrade. We also don't seem to have the mysql logs published during the grenade jobs, which we need to fix (and recently did fix for devstack jobs [1], but grenade jobs are still using devstack-gate so log collection happens there).

I didn't see any changes in nova, grenade or devstack since Sept 16 that look like they would be related to this, so my guess right now is that it's a combination of performance on certain (slower?) infra nodes and something in grenade/nova not restarting properly or not waiting long enough for the upgrade to complete.

[1] https://github.com/openstack/devstack/commit/f92c346131db2c89b930b1a23f84894...

--
Thanks,
Matt
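For context on the error in the subject line: during scheduling, nova fans queries out to every cell database and treats any cell that does not answer within a fixed timeout as having not responded, logging that it timed out waiting for a response from that cell. A rough, self-contained sketch of that scatter/gather-with-timeout pattern, purely for illustration (the function name, timeout value and sentinel below are made up, not nova's actual code):

# Illustration only -- not nova's implementation; names and values are made up.
import concurrent.futures

CELL_TIMEOUT = 60  # seconds, illustrative

def scatter_gather(cells, query_fn, timeout=CELL_TIMEOUT):
    """Run query_fn against every cell, tolerating slow or unreachable cells."""
    results = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=len(cells)) as pool:
        futures = {pool.submit(query_fn, cell): cell for cell in cells}
        done, not_done = concurrent.futures.wait(futures, timeout=timeout)
        for fut in done:
            results[futures[fut]] = fut.result()
        for fut in not_done:
            # A cell whose conductor/database is still coming back up after
            # the upgrade lands here and is reported as having timed out.
            results[futures[fut]] = 'did-not-respond'
    return results

If a cell's services haven't finished restarting after the grenade upgrade on a slow node, its query would land in the "did not respond" bucket, which is consistent with the symptom in the bug.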
On Sun, 22 Sep 2019, 16:39 Matt Riedemann, <mriedemos@gmail.com> wrote:
I noticed this while looking at a grenade failure on an unrelated patch:
https://bugs.launchpad.net/nova/+bug/1844929
Julia recently fixed an issue in ironic caused by a low MTU on fortnebula. May or may not be related.
It looks to me like there are specific jobs on specific providers that are not functioning correctly. I will pick on Fort Nebula for a minute: tacker-functional-devstack-multinode just doesn't seem to work, but most of the other jobs that do something similar work OK.

You can see the load on Fort Nebula here, and looking at the data I don't see any issues with it being overloaded or oversubscribed:

https://grafana.fortnebula.com/d/9MMqh8HWk/openstack-utilization?orgId=2&refresh=30s&from=now-12h&to=now

Also, most jobs are IO/memory bound and Fort Nebula uses local NVMe for all of the OpenStack jobs, so there isn't a reasonable way to make it any faster.

With that said, I would like to get to the bottom of it. It surely doesn't help anyone to have jobs failing for non-code-related reasons.

~/D
It would also be helpful to give the project a way to prefer certain infra providers for certain jobs. For the most part Fort Nebula is terrible at CPU-bound, long-running jobs... I wish I could make it better, but I cannot.

Is there a method we could come up with that would allow us to exploit certain traits of a certain provider? Maybe some additional metadata that says what each provider is best at doing? For example, highly IO-bound jobs work like gangbusters on FN because the underlying storage is very fast, but CPU-bound jobs do the direct opposite.

Thoughts?

~/DonnyD
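To make the metadata idea above concrete, here is a purely hypothetical sketch; the provider names, trait names and the preferred_providers helper are all invented for illustration and are not existing nodepool or Zuul features:

# Hypothetical sketch of the "provider traits" idea; nothing here exists in
# nodepool or Zuul -- the names and traits are made up for illustration.
PROVIDER_TRAITS = {
    'fortnebula': {'fast-io'},       # local NVMe storage, weaker at CPU-bound work
    'other-provider': {'fast-cpu'},  # placeholder entry
}

def preferred_providers(job_traits, providers=PROVIDER_TRAITS):
    """Return providers whose advertised traits cover everything the job needs."""
    return sorted(name for name, traits in providers.items()
                  if job_traits <= traits)

# An IO-bound job would be steered toward providers advertising 'fast-io':
print(preferred_providers({'fast-io'}))   # ['fortnebula']

Something along these lines, expressed as provider metadata that jobs could opt into, is the kind of hint the message above is asking about.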
On 9/22/2019 11:55 AM, Mark Goddard wrote:
Julia recently fixed an issue in ironic caused by a low MTU on fortnebula. May or may not be related.
Thanks, but it looks like that was specific to ironic jobs, and looking at logstash it's fixed:

http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22dropped%20over-mtu%20packet%5C%22%20AND%20tags%3A%5C%22syslog.txt%5C%22&from=7d

--
Thanks,
Matt
On 9/22/2019 10:37 AM, Matt Riedemann wrote:
We also don't seem to have the mysql logs published during the grenade jobs which we need to fix (and recently did fix for devstack jobs [1] but grenade jobs are still using devstack-gate so log collection happens there).
The fix for mysql log collection in grenade jobs is here:

https://review.opendev.org/#/c/684042/

I'm just waiting on results to make sure that works before removing the -W.

--
Thanks,
Matt
participants (3)
- Donny Davis
- Mark Goddard
- Matt Riedemann