[oslo][neutron] Neutron Functional Test Failures with oslo.privsep 1.31.0
Ben Nemec
openstack at nemebean.com
Tue Jan 8 19:04:58 UTC 2019
Further update: I dusted off my gdb skills and attached it to the
privsep process to try to get more details about exactly what is
crashing. It looks like the segfault happens on this line:
https://git.netfilter.org/libnetfilter_conntrack/tree/src/conntrack/api.c#n239
which is
h->cb = cb;
h being the conntrack handle and cb being the callback function.
This makes me think the problem isn't the callback itself (even if we
assigned a bogus pointer, which we didn't, it shouldn't cause a segfault
unless you try to dereference it) but in the handle we pass in. Trying
to look at h->cb results in:
(gdb) print h->cb
Cannot access memory at address 0x800f228
Interestingly, h itself is fine:
(gdb) print h
$3 = (struct nfct_handle *) 0x800f1e0
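As a quick sanity check, the two gdb values above are consistent with each other: the unreadable address is the handle pointer plus a fixed member offset. A small sketch of that arithmetic, using only the addresses printed above (the helper name is made up for illustration):

```python
def member_offset(struct_addr, member_addr):
    """Byte offset of a member from the start of its struct."""
    return member_addr - struct_addr

h = 0x800f1e0      # pointer value gdb printed for h
h_cb = 0x800f228   # address gdb could not read for h->cb

offset = member_offset(h, h_cb)
print(hex(offset), offset)  # prints: 0x48 72
```

So gdb is computing the address of h->cb as h + 0x48 and failing on the read itself. Note that `print h` only displays the pointer value and never touches the memory behind it, so a readable value for h does not show that the memory it points to is mapped.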
It doesn't _look_ to me like the handle should be crossing any thread
boundaries or anything, so I'm not sure why it would be a problem. It
gets created in the same privileged function that ultimately registers
the callback:
https://github.com/openstack/neutron/blob/aa8a6ea848aae6882abb631b7089836dee8f4008/neutron/privileged/agent/linux/netlink_lib.py#L246
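One possibility worth ruling out, and nothing in this thread confirms it: the handle goes from C to Python and back to C, and ctypes assumes a C int return type for any foreign function whose `restype` is left unset. On a 64-bit system that would keep only the low 32 bits of a pointer. The sketch below simulates that truncation without calling libnetfilter_conntrack; the 64-bit address and the helper name are hypothetical, chosen purely for illustration.

```python
import ctypes

def as_default_restype(address):
    """Simulate what ctypes keeps of a return value when restype is
    left at its default (C int): only the bits that fit in an int."""
    return ctypes.c_int(address).value

# Hypothetical 64-bit heap address, NOT taken from the real crash.
real_handle = 0x7F5C0800F1E0

seen_by_python = as_default_restype(real_handle)
print(hex(seen_by_python))

# A pointer-sized return type preserves the full value
# (assuming a 64-bit Python build).
kept = ctypes.c_void_p(real_handle).value
print(hex(kept))
```

If something like this were happening, the value handed back to the C library would look like a plausible small pointer but refer to unmapped memory. Checking whether `restype` and `argtypes` are declared on the bound functions that create and consume the handle would confirm or eliminate it.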
So still not sure what's going on, but I thought I'd share what I've
found before I stop to eat something.
-Ben
On 1/7/19 12:11 PM, Ben Nemec wrote:
> Renamed the thread to be more descriptive.
>
> Just to update the list on this, it looks like the problem is a segfault
> when the netlink_lib module makes a C call. Digging into that code a
> bit, it appears there is a callback being used[1]. I've seen some
> comments that when you use a callback with a Python thread, the thread
> needs to be registered somehow, but this is all uncharted territory for
> me. Suggestions gratefully accepted. :-)
>
> 1:
> https://github.com/openstack/neutron/blob/master/neutron/privileged/agent/linux/netlink_lib.py#L136
>
>
> On 1/4/19 7:28 AM, Slawomir Kaplonski wrote:
>> Hi,
>>
>> I just found that functional tests in Neutron have been failing since
>> today or maybe yesterday. See [1].
>> I was able to reproduce it locally, and it looks like it happens with
>> oslo.privsep==1.31. With oslo.privsep==1.30.1 the tests are fine.
>>
>> [1] https://bugs.launchpad.net/neutron/+bug/1810518
>>
>> —
>> Slawek Kaplonski
>> Senior software engineer
>> Red Hat
>>