[nova][gate] Thoughts on working around bug 1853453?

Clark Boylan cboylan at sapwetik.org
Thu Nov 21 17:14:05 UTC 2019


On Thu, Nov 21, 2019, at 9:08 AM, Matt Riedemann wrote:
> I've been noticing these shelve/unshelve guest ssh fail due to dhcp 
> lease issues quite a bit recently and wrote a bug and e-r query for it 
> this morning:
> 
> http://status.openstack.org/elastic-recheck/#1853453
> 
> The problem seems to stem from when these shelve tests run on multinode 
> jobs and we shelve on one host and unshelve on another.
> 
> I have a patch up to nova to force config drive in the nova-next job 
> where this hits the most:
> 
> https://review.opendev.org/#/c/695431
> 
> But that's just kind of a stab in the dark to take the metadata API out 
> of the picture for cloud-init.
> 
> If that doesn't help, and we don't know what is causing this or have 
> ideas to debug it, we might need to consider making a change to 
> shelve/unshelve testing in tempest such that we try to unshelve on the 
> original host. Now I realize that is unfortunate since the whole point 
> of shelve offloading and unshelving is that you can land on another host 
> and things are good, but if these tests continue to be a high failure 
> rate in multinode jobs we probably need to consider workarounds if no 
> one is going to dig into the failures and figure out what is going wrong.
> 
> Thoughts?

I have no evidence for this, but is it possible that the DHCP anti-spoofing rules that neutron installs on the firewall prevent the DHCP packets from flowing (like spice!) on the new compute node? I want to say there is a tcpdump "service" we can enable in devstack jobs, with a ruleset, that we could use to examine this. Basically, dump the DHCP traffic and see if it goes through.
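
To make the "dump the DHCP traffic" idea concrete, here is a rough sketch of the capture I have in mind, not a tested devstack configuration. The helper names, the `interface` and `pcap_path` parameters, and the default interface are all made up for illustration; only the tcpdump flags (`-n`, `-e`, `-vv`, `-i`, `-w`) and the port 67/68 filter are standard. How this would be wired into the devstack tcpdump service is not shown.

```python
import shlex

# BPF filter for DHCPv4: servers listen on UDP 67, clients on UDP 68.
DHCP_FILTER = "udp port 67 or udp port 68"


def dhcp_tcpdump_command(interface="any", pcap_path=None):
    """Build a tcpdump argv that captures only DHCP traffic.

    `interface` and `pcap_path` are illustrative parameters,
    not devstack settings.
    """
    # -n: no name resolution, -e: show MAC addresses, -vv: decode DHCP options.
    cmd = ["tcpdump", "-n", "-e", "-vv", "-i", interface]
    if pcap_path is not None:
        # -w writes raw packets to a file for later inspection.
        cmd += ["-w", pcap_path]
    cmd.append(DHCP_FILTER)
    return cmd


def missing_dhcp_steps(seen_types):
    """Return the DHCP handshake steps that were not observed, in order."""
    handshake = ["DISCOVER", "OFFER", "REQUEST", "ACK"]
    seen = {t.upper() for t in seen_types}
    return [step for step in handshake if step not in seen]


if __name__ == "__main__":
    # Print a shell-safe version of the command for copy and paste.
    print(shlex.join(dhcp_tcpdump_command()))
```

If the capture on the new compute node shows the guest's DISCOVER but no OFFER coming back, that would point at something dropping DHCP on the way to or from the guest; if all four steps show up, the firewall theory is probably wrong.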

Clark


