[nova][gate] Thoughts on working around bug 1853453?

21 Nov 2019

I've been noticing shelve/unshelve tests failing quite a bit recently
because ssh to the guest fails due to DHCP lease issues, so I wrote a
bug and an elastic-recheck (e-r) query for it this morning:

http://status.openstack.org/elastic-recheck/#1853453

The problem seems to occur when these shelve tests run in multinode
jobs and we shelve on one host and unshelve on another.

I have a patch up to nova to force config drive in the nova-next job 
where this hits the most:

https://review.opendev.org/#/c/695431

But that's just kind of a stab in the dark to take the metadata API out 
of the picture for cloud-init.
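
For anyone not familiar with it, forcing config drive on a compute node
boils down to something like the fragment below. This is only a sketch
of the nova.conf option; the patch itself may set it another way in the
job definition (for example through a devstack variable), so don't read
this as what the review actually contains.

```ini
# nova.conf on the compute nodes (sketch, not copied from the patch)
[DEFAULT]
# Always attach a config drive so cloud-init in the guest can read its
# metadata from the drive instead of calling the metadata API.
force_config_drive = True
```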

If that doesn't help, and we don't know what is causing this or have
ideas to debug it, we might need to consider changing the
shelve/unshelve testing in tempest so that we try to unshelve on the
original host. I realize that is unfortunate, since the whole point
of shelve offloading and unshelving is that you can land on another host
and things are good, but if these tests continue to fail at a high
rate in multinode jobs, we probably need to consider workarounds if no
one is going to dig into the failures and figure out what is going wrong.

Thoughts?

-- 

Thanks,

Matt

Matt Riedemann