[oslo][neutron] Neutron Functional Test Failures with oslo.privsep 1.31.0

Ben Nemec openstack at nemebean.com
Tue Jan 8 19:04:58 UTC 2019


Further update: I dusted off my gdb skills and attached it to the 
privsep process to try to get more details about exactly what is 
crashing. It looks like the segfault happens on this line:

https://git.netfilter.org/libnetfilter_conntrack/tree/src/conntrack/api.c#n239

which is

h->cb = cb;

h being the conntrack handle and cb being the callback function.

This makes me think the problem isn't the callback itself (even if we 
assigned a bogus pointer, which we didn't, it shouldn't cause a segfault 
unless you try to dereference it) but in the handle we pass in. Trying 
to look at h->cb results in:

(gdb) print h->cb
Cannot access memory at address 0x800f228

Interestingly, h itself prints fine, so it is a non-NULL pointer, but 
the memory it points to is not accessible:

(gdb) print h
$3 = (struct nfct_handle *) 0x800f1e0
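
As a quick sanity check on those two addresses, here is a minimal sketch 
(plain Python arithmetic, using only the values printed by gdb above) of 
how far the unreadable address is from the handle pointer. The variable 
names are mine, chosen for illustration.

```python
# Addresses copied from the gdb session above.
handle_addr = 0x800f1e0    # value of h
cb_field_addr = 0x800f228  # address gdb could not read for h->cb

# Distance from the start of the handle to the field being written.
cb_offset = cb_field_addr - handle_addr
print(f"cb sits {cb_offset:#x} ({cb_offset}) bytes into the handle")
```

So the faulting address is just h plus a small field offset, which is 
consistent with the handle pointer itself pointing at memory that is not 
mapped, rather than with anything being wrong with cb.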

It doesn't _look_ to me like the handle should be crossing any thread 
boundaries or anything, so I'm not sure why it would be a problem. It 
gets created in the same privileged function that ultimately registers 
the callback: 
https://github.com/openstack/neutron/blob/aa8a6ea848aae6882abb631b7089836dee8f4008/neutron/privileged/agent/linux/netlink_lib.py#L246
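
For anyone who wants to poke at the handle/callback pattern outside of 
neutron, here is a rough sketch using ctypes against libc instead of 
libnetfilter_conntrack (malloc stands in for a function that returns a 
handle, qsort for a function that takes a callback). One thing it 
illustrates: ctypes assumes a C int return type unless restype is set, 
so a pointer-returning function needs restype declared explicitly. 
Whether that has anything to do with this segfault is purely an 
assumption on my part, I have not checked how netlink_lib declares its 
functions. The 64-bit address in the example is made up.

```python
import ctypes
import ctypes.util

# Load the C library (falls back to the current process if not found).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# A pointer-returning function: declare restype, otherwise ctypes
# treats the return value as a C int.
libc.malloc.restype = ctypes.c_void_p
libc.malloc.argtypes = [ctypes.c_size_t]
libc.free.restype = None
libc.free.argtypes = [ctypes.c_void_p]

buf = libc.malloc(64)   # full-width pointer, returned as a Python int
libc.free(buf)

# What squeezing a pointer through a C int would do. The address is
# hypothetical (NOT from the gdb session), only the mechanism is real.
made_up_pointer = 0x7F3A0800F1E0
truncated = ctypes.c_int(made_up_pointer).value  # keeps the low 32 bits

# A callback passed into C: keep a reference to the wrapper object for
# as long as C code may call it.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))


def py_cmp(a, b):
    # Compare the two ints that the C pointers refer to.
    return a[0] - b[0]


cmp_cb = CMPFUNC(py_cmp)
libc.qsort.restype = None
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]

values = (ctypes.c_int * 5)(5, 1, 7, 33, 99)
libc.qsort(values, len(values), ctypes.sizeof(ctypes.c_int), cmp_cb)
sorted_values = list(values)
print(sorted_values, hex(truncated))
```

If the real code stores a handle it got back from C, comparing the value 
Python holds with the pointer the C library actually returned (for 
example by breaking on the open call in gdb) would show whether the two 
agree.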

So still not sure what's going on, but I thought I'd share what I've 
found before I stop to eat something.

-Ben

On 1/7/19 12:11 PM, Ben Nemec wrote:
> Renamed the thread to be more descriptive.
> 
> Just to update the list on this, it looks like the problem is a segfault 
> when the netlink_lib module makes a C call. Digging into that code a 
> bit, it appears there is a callback being used[1]. I've seen some 
> comments that when you use a callback with a Python thread, the thread 
> needs to be registered somehow, but this is all uncharted territory for 
> me. Suggestions gratefully accepted. :-)
> 
> 1: 
> https://github.com/openstack/neutron/blob/master/neutron/privileged/agent/linux/netlink_lib.py#L136 
> 
> 
> On 1/4/19 7:28 AM, Slawomir Kaplonski wrote:
>> Hi,
>>
>> I just found that functional tests in Neutron have been failing since
>> today or maybe yesterday. See [1].
>> I was able to reproduce it locally, and it looks like it happens with
>> oslo.privsep==1.31. With oslo.privsep==1.30.1 the tests are fine.
>>
>> [1] https://bugs.launchpad.net/neutron/+bug/1810518
>>
>> Slawek Kaplonski
>> Senior software engineer
>> Red Hat
>>