[oslo][neutron] Neutron Functional Test Failures with oslo.privsep 1.31.0

Ben Nemec openstack at nemebean.com
Mon Jan 14 23:10:22 UTC 2019


I tried to set up a test environment for this, but I'm having some 
issues. My local environment is defaulting to python 3, while the gate 
job appears to have been running under python 2. I'm not sure why it's 
doing that since the tox env definition doesn't specify python 3 (maybe 
something to do with https://review.openstack.org/#/c/622415/ ?), but 
either way I keep running into import issues.
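For what it's worth, tox only runs a specific interpreter when the env pins one; a hypothetical tox.ini fragment (env name assumed, the real definition lives in neutron's tox.ini) that would force python 2 locally:

```ini
# Hypothetical fragment - pin the interpreter so local runs match the
# gate instead of whatever "python" tox discovers first
[testenv:dsvm-functional]
basepython = python2.7
```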

I'll take another look tomorrow, but in the meantime I'm afraid I 
haven't made any meaningful progress. :-(

On 1/14/19 6:36 AM, Akihiro Motoki wrote:
> A similar failure happens in neutron-fwaas. It blocks several 
> patches in neutron-fwaas, including policy-in-code support.
> https://bugs.launchpad.net/neutron/+bug/1811506
> 
> Most failures are fixed by applying Ben's neutron fix 
> (https://review.openstack.org/#/c/629335/) in neutron-fwaas [1],
> but one failure remains, in 
> neutron_fwaas.tests.functional.privileged.test_utils.InNamespaceTest.test_in_namespace 
> [2].
> This failure is caused by oslo.privsep 1.31.0 too. This does not happen 
> with 1.30.1.
> Any help would be appreciated.
> 
> [1] neutron-fwaas change https://review.openstack.org/#/c/630451/
> [2] 
> http://logs.openstack.org/51/630451/2/check/legacy-neutron-fwaas-dsvm-functional/05b9131/logs/testr_results.html.gz
> 
> -- 
> Akihiro Motoki (irc: amotoki)
> 
> 
> On Wed, Jan 9, 2019 at 9:32 AM, Ben Nemec <openstack at nemebean.com 
> <mailto:openstack at nemebean.com>> wrote:
> 
>     I think I've got it. At least in my local tests, the handle pointer
>     being passed from C -> Python -> C was getting truncated at the
>     Python step because we didn't properly define the type. If the
>     assigned address was larger than would fit in a standard int, then
>     we passed what amounted to a bogus pointer back to the C code, which
>     caused the segfault.
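Not the actual neutron patch, but the general ctypes failure mode is easy to reproduce with any pointer-returning C function; a minimal sketch with libc's malloc standing in for nfct_open (the names and numbers here are illustrative only; the real change is in review 629335):

```python
import ctypes
import ctypes.util

# Illustrative sketch: libc's malloc stands in for nfct_open. ctypes
# defaults every return type to a C int, so a 64-bit pointer coming
# back from C can be silently truncated to 32 bits.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

libc.malloc.restype = ctypes.c_int      # the implicit default, spelled out
maybe_bogus = libc.malloc(16)           # only the low 32 bits survive
# (deliberately not freed: the truncated value can't safely go back to C)

# The fix: declare the real types so the full pointer width round-trips.
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
ptr = libc.malloc(16)
libc.free(ptr)
```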
> 
>     I have no idea why privsep threading would have exposed this, other
>     than maybe running in threads affected the address space somehow?
> 
>     In any case, https://review.openstack.org/629335 has got these
>     functional tests working for me locally with oslo.privsep 1.31.0.
>     It would be great if somebody could try them out and verify that I
>     didn't just find a solution that somehow only works on my system. :-)
> 
>     -Ben
> 
>     On 1/8/19 4:30 PM, Ben Nemec wrote:
>      >
>      >
>      > On 1/8/19 2:22 PM, Slawomir Kaplonski wrote:
>      >> Hi Ben,
>      >>
>      >> I was also looking at it today. I'm totally not a C and
>      >> oslo.privsep expert, but I think there is some new process
>      >> spawned here. I put pdb before the line at
>      >> https://github.com/openstack/neutron/blob/master/neutron/privileged/agent/linux/netlink_lib.py#L191
>      >> where this issue happens. Then, with "ps aux" I saw:
>      >>
>      >> vagrant at fullstack-ubuntu ~ $ ps aux | grep privsep
>      >> root     18368  0.1  0.5 185752 33544 pts/1    Sl+  13:24   0:00
>      >> /opt/stack/neutron/.tox/dsvm-functional/bin/python
>      >> /opt/stack/neutron/.tox/dsvm-functional/bin/privsep-helper
>      >> --config-file neutron/tests/etc/neutron.conf --privsep_context
>      >> neutron.privileged.default --privsep_sock_path
>      >> /tmp/tmpG5iqb9/tmp1dMGq0/privsep.sock
>      >> vagrant  18555  0.0  0.0  14512  1092 pts/2    S+   13:25   0:00
>     grep
>      >> --color=auto privsep
>      >>
>      >> But then, when I continued running the test and it segfaulted,
>      >> in the journal log I have:
>      >>
>      >> Jan 08 13:25:29 fullstack-ubuntu kernel: privsep-helper[18369]
>      >> segfault at 140043e8 ip 00007f8e1800ef32 sp 00007f8e18a63320
>     error 4
>      >> in libnetfilter_conntrack.so.3.5.0[7f8e18009000+1a000]
>      >>
>      >> Please check the PIDs of those processes. The first one (when
>      >> the test was "paused" with pdb) is 18368, while the later
>      >> segfault is from 18369.
>      >
>      > privsep-helper does fork, so I _think_ that's normal.
>      >
>      >
>     https://github.com/openstack/oslo.privsep/blob/ecb1870c29b760f09fb933fc8ebb3eac29ffd03e/oslo_privsep/daemon.py#L539
> 
>      >
>      >
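A minimal sketch of that fork pattern (not oslo.privsep's actual code), which would explain why the segfaulting PID is adjacent to the one seen in ps:

```python
import os

# The process visible in `ps` forks; the child, with a new (often
# adjacent) PID, is the one that does the real work - so the PID in a
# later crash message need not match the one observed earlier.
child = os.fork()
if child == 0:
    os._exit(0)               # child: the daemon loop would run here
_, status = os.waitpid(child, 0)
```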
>      >>
>      >> I don't know if you saw my comment today in Launchpad. I was
>      >> trying to change the method used to start PrivsepDaemon from
>      >> Method.ROOTWRAP to Method.FORK (in
>      >> https://github.com/openstack/oslo.privsep/blob/master/oslo_privsep/priv_context.py#L218)
>      >> and run the tests as root; then the tests passed.
>      >
>      > Yeah, I saw that, but I don't understand it. :-/
>      >
>      > The daemon should end up running with the same capabilities in
>      > either case. By the time it starts making the C calls, the
>      > environment should be identical, regardless of which method was
>      > used to start the process.
>      >
>      >>
>      >> —
>      >> Slawek Kaplonski
>      >> Senior software engineer
>      >> Red Hat
>      >>
>      >>> On 08.01.2019 at 20:04, Ben Nemec <openstack at nemebean.com
>      >>> <mailto:openstack at nemebean.com>> wrote:
>      >>>
>      >>> Further update: I dusted off my gdb skills and attached it to the
>      >>> privsep process to try to get more details about exactly what is
>      >>> crashing. It looks like the segfault happens on this line:
>      >>>
>      >>>
>     https://git.netfilter.org/libnetfilter_conntrack/tree/src/conntrack/api.c#n239
> 
>      >>>
>      >>>
>      >>> which is
>      >>>
>      >>> h->cb = cb;
>      >>>
>      >>> h being the conntrack handle and cb being the callback function.
>      >>>
>      >>> This makes me think the problem isn't the callback itself (even
>      >>> if we assigned a bogus pointer, which we didn't, it shouldn't
>      >>> cause a segfault unless you try to dereference it) but the
>      >>> handle we pass in. Trying to look at h->cb results in:
>      >>>
>      >>> (gdb) print h->cb
>      >>> Cannot access memory at address 0x800f228
>      >>>
>      >>> Interestingly, h itself is fine:
>      >>>
>      >>> (gdb) print h
>      >>> $3 = (struct nfct_handle *) 0x800f1e0
>      >>>
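Those two addresses are consistent with 32-bit truncation; a sketch with an invented upper half (only the low 32 bits, matching gdb's 0x800f1e0, come from the session above):

```python
import ctypes

# Hypothetical full 64-bit heap address for the conntrack handle; the
# upper half is invented, the low 32 bits match what gdb printed.
addr = 0x7f8e0800f1e0
# ctypes' default C 'int' return type keeps only the low 32 bits
truncated = ctypes.c_int(addr).value
```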
>      >>> It doesn't _look_ to me like the handle should be crossing any
>      >>> thread boundaries or anything, so I'm not sure why it would be a
>      >>> problem. It gets created in the same privileged function that
>      >>> ultimately registers the callback:
>      >>>
>      >>> https://github.com/openstack/neutron/blob/aa8a6ea848aae6882abb631b7089836dee8f4008/neutron/privileged/agent/linux/netlink_lib.py#L246
> 
>      >>>
>      >>>
>      >>> So I'm still not sure what's going on, but I thought I'd share
>      >>> what I've found before I stop to eat something.
>      >>>
>      >>> -Ben
>      >>>
>      >>> On 1/7/19 12:11 PM, Ben Nemec wrote:
>      >>>> Renamed the thread to be more descriptive.
>      >>>> Just to update the list on this: it looks like the problem is a
>      >>>> segfault when the netlink_lib module makes a C call. Digging
>      >>>> into that code a bit, it appears there is a callback being
>      >>>> used [1]. I've seen some comments that when you use a callback
>      >>>> with a Python thread, the thread needs to be registered somehow,
>      >>>> but this is all uncharted territory for me. Suggestions
>      >>>> gratefully accepted. :-)
>      >>>> 1:
>      >>>>
>     https://github.com/openstack/neutron/blob/master/neutron/privileged/agent/linux/netlink_lib.py#L136
> 
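One ctypes-callback pitfall worth ruling out (not necessarily the bug here): the CFUNCTYPE object must stay referenced for as long as C might call it, or C ends up calling through a dangling pointer. A self-contained sketch using libc's qsort:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))

# The CFUNCTYPE object must be kept alive (e.g. bound to a name) while
# the C side may still invoke it; if it were garbage-collected, C would
# call through a dangling function pointer.
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

def py_cmp(a, b):
    return a[0] - b[0]

cmp_cb = CMPFUNC(py_cmp)            # held in a variable on purpose

# Declaring argtypes/restype up front - the same discipline the
# truncation bug called for.
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                       ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None

arr = (ctypes.c_int * 5)(3, 1, 4, 1, 5)
libc.qsort(ctypes.cast(arr, ctypes.c_void_p), len(arr),
           ctypes.sizeof(ctypes.c_int), cmp_cb)
```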
>      >>>> On 1/4/19 7:28 AM, Slawomir Kaplonski wrote:
>      >>>>> Hi,
>      >>>>>
>      >>>>> I just found that functional tests in Neutron have been
>      >>>>> failing since today or maybe yesterday. See [1].
>      >>>>> I was able to reproduce it locally, and it looks like it
>      >>>>> happens with oslo.privsep==1.31. With oslo.privsep==1.30.1 the
>      >>>>> tests are fine.
>      >>>>>
>      >>>>> [1] https://bugs.launchpad.net/neutron/+bug/1810518
>      >>>>>
>      >>>>> —
>      >>>>> Slawek Kaplonski
>      >>>>> Senior software engineer
>      >>>>> Red Hat
>      >>>>>
>      >>
>      >
> 


