Hi Ben,

On 1/7/19 1:11 PM, Ben Nemec wrote:
Renamed the thread to be more descriptive.
Just to update the list on this, it looks like the problem is a segfault when the netlink_lib module makes a C call. Digging into that code a bit, it appears there is a callback being used[1]. I've seen some comments that when you use a callback with a Python thread, the thread needs to be registered somehow, but this is all uncharted territory for me. Suggestions gratefully accepted. :-)
1: https://github.com/openstack/neutron/blob/master/neutron/privileged/agent/li...
Maybe it's what is mentioned at the end of this section?

https://docs.python.org/2/library/ctypes.html#callback-functions

"Note Make sure you keep references to CFUNCTYPE() objects as long as they are used from C code. ctypes doesn’t, and if you don’t, they may be garbage collected, crashing your program when a callback is made.

Also, note that if the callback function is called in a thread created outside of Python’s control (e.g. by the foreign code that calls the callback), ctypes creates a new dummy Python thread on every invocation. This behavior is correct for most purposes, but it means that values stored with threading.local will not survive across different callbacks, even when those calls are made from the same C thread."

I can try keeping a reference to the callback function and see if it makes any difference, but I'm assuming it's not that easy.

-Brian
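For anyone following along, here is a minimal sketch of what "keeping a reference" to a ctypes callback can look like. It is not the Neutron netlink_lib code: the `Sorter` class and `py_cmp` function are hypothetical names made up for illustration, and it assumes a POSIX system where `ctypes.CDLL(None)` exposes libc's `qsort`. Note that `qsort` calls back synchronously, so it would work even without the stored reference; the pattern matters when a C library stores the function pointer and invokes it later.

```python
import ctypes

# Assumption: POSIX platform where CDLL(None) exposes libc symbols such as qsort.
libc = ctypes.CDLL(None)

# C signature of the comparison callback: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_int),
)

# Declare qsort's signature so the size arguments are passed as size_t.
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def py_cmp(a, b):
    """Compare two C ints; returns -1, 0 or 1 (hypothetical helper)."""
    return (a[0] > b[0]) - (a[0] < b[0])


class Sorter:
    """Hypothetical holder that keeps the callback object alive."""

    def __init__(self):
        # Store the CFUNCTYPE object on the instance. ctypes keeps no
        # reference of its own, so without this the object could be garbage
        # collected while C code still holds the function pointer.
        self._cmp = CMPFUNC(py_cmp)

    def sort(self, values):
        # Copy the Python list into a C int array, sort in place, copy back.
        arr = (ctypes.c_int * len(values))(*values)
        libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), self._cmp)
        return list(arr)
```

Calling `Sorter().sort([5, 1, 7, 33, 99])` goes through the stored callback. The risky variant is building `CMPFUNC(py_cmp)` inline as a temporary and handing it to a C API that remembers the pointer for later use.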
On 1/4/19 7:28 AM, Slawomir Kaplonski wrote:
Hi,
I just found that functional tests in Neutron have been failing since today or maybe yesterday; see [1]. I was able to reproduce it locally, and it looks like it happens with oslo.privsep==1.31. With oslo.privsep==1.30.1 the tests are fine.
[1] https://bugs.launchpad.net/neutron/+bug/1810518
--
Slawek Kaplonski
Senior software engineer
Red Hat
Message written by Ben Nemec <openstack@nemebean.com> on 02.01.2019, at 19:17:
Yay alliteration! :-)
I wanted to draw attention to this release[1] in particular because it includes the parallel privsep change[2]. While it shouldn't have any effect on the public API of the library, it does significantly affect how privsep will process calls on the back end. Specifically, multiple calls can now be processed at the same time, so if any privileged code is not reentrant it's possible that new race bugs could pop up.
While this sounds scary, it's a necessary change to allow use of privsep in situations where a privileged call may take a non-trivial amount of time. Cinder in particular has some privileged calls that are long-running and can't afford to block all other privileged calls on them.
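To make the reentrancy concern concrete, here is a rough sketch of the kind of privileged code that could start racing once calls are processed in parallel, together with the simplest guard. It uses only the standard library and does not use any oslo.privsep API: `privileged_increment`, `run_parallel` and the shared `_counter` are hypothetical stand-ins for a privileged function that touches shared state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shared state touched by a "privileged" function.
_counter = {"value": 0}
_state_lock = threading.Lock()


def privileged_increment():
    """Read-modify-write on shared state, serialized by a lock.

    Without the lock, two concurrent calls could read the same value and
    one of the updates would be lost.
    """
    with _state_lock:
        current = _counter["value"]
        _counter["value"] = current + 1


def run_parallel(n_calls, workers=8):
    """Issue n_calls concurrently and return the final counter value."""
    _counter["value"] = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(privileged_increment) for _ in range(n_calls)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return _counter["value"]
```

With the lock in place, `run_parallel(1000)` should account for every call. A single coarse lock gives up the parallelism for that function, so it is best reserved for code that genuinely shares state.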
So if you're a consumer of oslo.privsep please keep your eyes open for issues related to this new release and contact the Oslo team if you find any. Thanks.
-Ben
1: https://review.openstack.org/628019
2: https://review.openstack.org/#/c/593556/