I think it's worth noting that this has actually demonstrated a rather significant issue with threaded privsep, which is that forking from a Python thread is really not a safe thing to do.[1][2]
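A minimal sketch of one way this goes wrong, assuming a POSIX system where `os.fork` is available. It is not taken from privsep or pyroute2; the function name `child_sees_lock_held` is made up for illustration. A worker thread holds a lock at the moment of the fork. The child process gets a copy of the locked lock but not the thread that would release it.

```python
import os
import threading


def child_sees_lock_held():
    """Fork while another thread holds a lock; report what the child sees.

    Hypothetical helper for illustration only. Returns True if the forked
    child could not acquire the lock (it inherited it in the locked state).
    """
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def worker():
        with lock:
            held.set()       # tell the main thread the lock is now held
            release.wait()   # keep holding it until told to let go

    t = threading.Thread(target=worker)
    t.start()
    held.wait()

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, so nothing will ever
        # release the lock. Use a timeout instead of blocking forever.
        got_it = lock.acquire(timeout=0.5)
        os._exit(0 if got_it else 1)

    # Parent: collect the child's exit status, then clean up the worker.
    _, status = os.waitpid(pid, 0)
    release.set()
    t.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("child found lock still held:", child_sees_lock_held())
```

Without the timeout, the child would simply hang. Any lock taken inside a library at fork time can behave the same way, which is why the caller may not even know it is at risk.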
Sure, we could just say "don't fork in privileged code", but in this case the fork wasn't even in our code, it was in a library we were using. There are a few options, none of which I'm crazy about at this point:
* Provide a way for callers to specify that a call needs to run in-process rather than in the thread-pool. Two problems with this: 1) It requires the callers to know that forking is happening and 2) I'm not sure it actually fixes all of the potential problems. You might need to have a completely separate privsep daemon to avoid the potential bad fork/thread interactions.
* Switch to multiprocessing so calls execute in their own process. I may be wrong, but I think this requires all of the parameters passed in to be pickleable, which I bet is not remotely the case right now.
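To get a rough idea of how big the pickling problem would be, something like the following could be used to audit the arguments of a call. This is only a sketch under the assumption that arguments would be serialized with the standard `pickle` module; `unpicklable_args` is a hypothetical helper, not part of oslo.privsep.

```python
import pickle


def unpicklable_args(*args, **kwargs):
    """Return (name, error type) pairs for every argument pickle rejects.

    Hypothetical audit helper: an empty list means all arguments could be
    sent to another process via pickle.
    """
    named = [(f"args[{i}]", value) for i, value in enumerate(args)]
    named += [(f"kwargs[{key!r}]", value) for key, value in kwargs.items()]

    problems = []
    for name, value in named:
        try:
            pickle.dumps(value)
        except Exception as exc:  # pickle raises several exception types
            problems.append((name, type(exc).__name__))
    return problems


if __name__ == "__main__":
    import threading

    # Plain strings and integers pickle fine.
    print(unpicklable_args("eth0", mtu=1500))
    # Locks and lambdas do not.
    print(unpicklable_args("eth0", threading.Lock(), callback=lambda: None))
```

Running a check like this over the existing privileged entry points would show which calls pass sockets, locks, callbacks, or other objects that cannot cross a process boundary this way.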
I'm open to suggestions that are better than playing whack-a-mole with these bugs using a threaded and un-threaded daemon.
-Ben
1: https://rachelbythebay.com/w/2011/06/07/forked/
2: https://rachelbythebay.com/w/2014/08/16/forkenv/
On 1/17/19 2:12 PM, Slawomir Kaplonski wrote:
Hi,
Recently we had one more issue related to oslo.privsep and pyroute2. It caused many failures in Neutron CI. See [1] for details. A fix (more of a workaround) for this issue has now been merged [2]. So if you saw failing tempest/scenario jobs on your patch, and the failed tests had problems with SSH to an instance through a floating IP, please rebase your patch now. It should be better :)
[1] https://bugs.launchpad.net/neutron/+bug/1811515
[2] https://review.openstack.org/#/c/631275/
--
Slawek Kaplonski
Senior software engineer
Red Hat