On 4/3/19 1:20 PM, Matt Riedemann wrote:
On 3/28/2019 7:42 PM, Mohammed Naser wrote:
Looks like some progress has been made, and we're increasingly confident that this is an oslo.service bug:
Matt & Dan have both left ideas around this, with possible solutions on how to make a change like this backportable.
Another update on this: I was trying to recreate the original issue reported in the nova bug:
https://bugs.launchpad.net/nova/+bug/1715374
I didn't even get to the point of the libvirt driver waiting for the network-vif-plugged event, because privsep blows up much earlier during server create after SIGHUP'ing the service. Details start at comment 34 in that bug, but the tl;dr is that the privsep-helper child processes are gone after the SIGHUP, so anything that relies on privsep (which I think is now anything using root in the libvirt driver and os-vif utils code) won't work until you restart the service.
I don't yet know if this is a regression in Stein, but I'm going to create a stable/rocky devstack and try to find out.
With that oslo.service patch [1] in place, I recreated Matt's result as described above. Then I hacked on oslo.privsep a bit [2] and was able to resolve the issue (create instances smoothly after SIGHUPping n-cpu.service). That fix is going to need UT, but also more thread- and socket- and security-savvy eyeballs to make sure it has legs. But hopefully we can finally put this one to bed.

efried

[1] https://review.opendev.org/#/c/641907/
[2] https://review.opendev.org/#/c/678323/
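
For anyone who wants a feel for the failure mode without a devstack, here is a minimal Python sketch of the general shape: a parent that depends on a helper child process, the helper going away, and a check-and-respawn step before the next use. This is not the oslo.privsep change in [2], and it uses no oslo.privsep or oslo.service APIs. `HelperChannel` and all of its methods are hypothetical names, and the "helper" is just a child Python process blocking on stdin.

```python
import subprocess
import sys


class HelperChannel:
    """Toy stand-in for a daemon's link to a helper child process.

    Hypothetical illustration only; not how oslo.privsep is implemented.
    """

    def __init__(self):
        self._proc = None

    def _reap(self):
        # Kill (if still running) and wait for the old child so no zombie
        # or open pipe is left behind.
        if self._proc is None:
            return
        if self._proc.poll() is None:
            self._proc.kill()
        self._proc.wait()
        if self._proc.stdin is not None:
            self._proc.stdin.close()
        self._proc = None

    def start(self):
        # Replace any previous helper with a fresh child that blocks on stdin.
        self._reap()
        self._proc = subprocess.Popen(
            [sys.executable, "-c", "import sys; sys.stdin.read()"],
            stdin=subprocess.PIPE,
        )

    def is_alive(self):
        # poll() returns None while the child is still running.
        return self._proc is not None and self._proc.poll() is None

    def kill_helper(self):
        # Simulate the helper disappearing out from under the parent.
        if self._proc is not None:
            self._proc.kill()
            self._proc.wait()

    def ensure_alive(self):
        """Respawn the helper if it is gone. Returns True if it restarted."""
        if self.is_alive():
            return False
        self.start()
        return True

    def stop(self):
        self._reap()


if __name__ == "__main__":
    channel = HelperChannel()
    channel.start()
    channel.kill_helper()  # helper is gone, as after the SIGHUP
    print("alive after loss:", channel.is_alive())
    print("restarted:", channel.ensure_alive())
    print("alive after ensure:", channel.is_alive())
    channel.stop()
```

Without the `ensure_alive()` call, every later use of the helper would fail until the whole parent is restarted, which matches the symptom described in the bug. Whether check-on-use is the right shape for the real fix is a separate question for the reviewers of [2].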