[oslo][neutron] Neutron Functional Test Failures with oslo.privsep 1.31.0

Brian Haley haleyb.dev at gmail.com
Mon Jan 7 20:05:06 UTC 2019


Hi Ben,

On 1/7/19 1:11 PM, Ben Nemec wrote:
> Renamed the thread to be more descriptive.
> 
> Just to update the list on this, it looks like the problem is a segfault 
> when the netlink_lib module makes a C call. Digging into that code a 
> bit, it appears there is a callback being used[1]. I've seen some 
> comments that when you use a callback with a Python thread, the thread 
> needs to be registered somehow, but this is all uncharted territory for 
> me. Suggestions gratefully accepted. :-)
> 
> 1: 
> https://github.com/openstack/neutron/blob/master/neutron/privileged/agent/linux/netlink_lib.py#L136

Maybe it's something like what is mentioned at the end of this section?

https://docs.python.org/2/library/ctypes.html#callback-functions

"Note

Make sure you keep references to CFUNCTYPE() objects as long as they are 
used from C code. ctypes doesn’t, and if you don’t, they may be garbage 
collected, crashing your program when a callback is made.

Also, note that if the callback function is called in a thread created 
outside of Python’s control (e.g. by the foreign code that calls the 
callback), ctypes creates a new dummy Python thread on every invocation. 
This behavior is correct for most purposes, but it means that values 
stored with threading.local will not survive across different callbacks, 
even when those calls are made from the same C thread."

I can try keeping a reference to the callback function and see if it 
makes any difference, but I'm assuming it's not that easy.
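
For reference, the pattern that note describes looks roughly like the sketch 
below. This is not the neutron code: it uses libc's qsort() as a stand-in C 
function, and the names Sorter, py_cmp and CMPFUNC are made up for 
illustration. It also assumes a platform where find_library("c") resolves 
(e.g. Linux). The point is only that the CFUNCTYPE object is stored on a 
long-lived Python object instead of being created inline as a temporary.

```python
import ctypes
import ctypes.util

# Load the C library. Assumption: find_library("c") resolves on this platform.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Callback prototype: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(
    ctypes.c_int,
    ctypes.POINTER(ctypes.c_int),
    ctypes.POINTER(ctypes.c_int),
)

# Declare qsort's signature so the size_t arguments are passed correctly.
libc.qsort.argtypes = [
    ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def py_cmp(a, b):
    # a and b are pointers to c_int; compare the values they point at.
    return a[0] - b[0]


class Sorter(object):
    """Hypothetical wrapper that owns the callback object."""

    def __init__(self):
        # Keep the CFUNCTYPE object on the instance so it stays referenced
        # for as long as C code might call it. ctypes does not hold its own
        # reference once the foreign call has returned.
        self._cmp = CMPFUNC(py_cmp)

    def sort(self, values):
        arr = (ctypes.c_int * len(values))(*values)
        libc.qsort(arr, len(arr), ctypes.sizeof(ctypes.c_int), self._cmp)
        return list(arr)
```

qsort() only calls the callback before it returns, so a temporary would 
happen to survive here. The reference matters when a library stores the 
function pointer and calls it later, which may or may not be what is 
happening in our case.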

-Brian


> On 1/4/19 7:28 AM, Slawomir Kaplonski wrote:
>> Hi,
>>
>> I just found that functional tests in Neutron have been failing since 
>> today or maybe yesterday. See [1].
>> I was able to reproduce it locally, and it looks like it happens with 
>> oslo.privsep==1.31. With oslo.privsep==1.30.1 the tests are fine.
>>
>> [1] https://bugs.launchpad.net/neutron/+bug/1810518
>>
>> Slawek Kaplonski
>> Senior software engineer
>> Red Hat
>>
>>> Message written by Ben Nemec <openstack at nemebean.com> on 
>>> 02.01.2019, at 19:17:
>>>
>>> Yay alliteration! :-)
>>>
>>> I wanted to draw attention to this release[1] in particular because 
>>> it includes the parallel privsep change[2]. While it shouldn't have 
>>> any effect on the public API of the library, it does significantly 
>>> affect how privsep will process calls on the back end. Specifically, 
>>> multiple calls can now be processed at the same time, so if any 
>>> privileged code is not reentrant it's possible that new race bugs 
>>> could pop up.
>>>
>>> While this sounds scary, it's a necessary change to allow use of 
>>> privsep in situations where a privileged call may take a non-trivial 
>>> amount of time.  Cinder in particular has some privileged calls that 
>>> are long-running and can't afford to block all other privileged calls 
>>> on them.
>>>
>>> So if you're a consumer of oslo.privsep please keep your eyes open 
>>> for issues related to this new release and contact the Oslo team if 
>>> you find any. Thanks.
>>>
>>> -Ben
>>>
>>> 1: https://review.openstack.org/628019
>>> 2: https://review.openstack.org/#/c/593556/
>>>
>>
> 


