[oslo][neutron] Neutron Functional Test Failures with oslo.privsep 1.31.0

Akihiro Motoki amotoki at gmail.com
Thu Jan 17 21:45:37 UTC 2019


Thanks, Ben, for digging into the details.

I ran some more tests based on your test script.
From my test results, pyroute2 and "ip" command operations against a netns
seem to work fine even when the network namespaces of the process and the
thread are different.
The test script is http://paste.openstack.org/show/742886/ and the result
is http://paste.openstack.org/show/742887/.
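
For reference, here is a minimal sketch of how the process-level and
thread-level namespaces can be compared. This is not the paste above. The
helper names are hypothetical, and it assumes Python 3.8+ on Linux so that
threading.get_native_id() can replace the raw gettid syscall.

```python
import os
import threading


def thread_netns_path(tid=None):
    """Return the /proc path of the netns link for one thread.

    Hypothetical helper: tid can be injected, otherwise the kernel
    thread id of the calling thread is used (Python 3.8+).
    """
    if tid is None:
        tid = threading.get_native_id()
    return '/proc/self/task/%d/ns/net' % tid


def netns_ids(tid=None, readlink=os.readlink):
    """Return (process_netns, thread_netns), e.g. 'net:[4026531992]'.

    /proc/self/ns/net reflects the main thread of the process, while
    the per-task link reflects the given thread. readlink is a
    parameter only so the lookup can be replaced in tests.
    """
    return readlink('/proc/self/ns/net'), readlink(thread_netns_path(tid))
```

Calling netns_ids() from a worker thread after a setns() call should return
two different values, while calling it from the main thread should return the
same value twice.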

> So, to get this test passing I think we need to change [1] so it looks
> for the thread id and uses a replacement for [2] that allows the thread
> id to be injected as above.

I confirmed that network namespace operations work well, so it looks safe.
Considering the situation, I proposed a change to the failing test so that
it checks the list of network devices inside a netns.
https://review.openstack.org/#/c/631654/
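
To illustrate the idea only (this is not the code in the review above), a
check like that could be sketched as follows. The function names are
hypothetical, and the sketch assumes the iproute2 "ip" command is available
and the caller has enough privileges to enter the namespace.

```python
import subprocess


def parse_device_names(ip_output):
    """Extract device names from `ip -o link show` output.

    Each record looks like "2: veth0@if5: <FLAGS> mtu ...", so the
    name is the second ': '-separated field, minus any "@peer" suffix.
    """
    names = []
    for line in ip_output.splitlines():
        fields = line.split(': ', 2)
        if len(fields) < 3:
            continue  # Not a link record, skip it
        names.append(fields[1].split('@', 1)[0])
    return names


def list_netns_devices(namespace):
    """List devices inside a named netns (needs root and iproute2)."""
    out = subprocess.check_output(
        ['ip', 'netns', 'exec', namespace, 'ip', '-o', 'link', 'show'])
    return parse_device_names(out.decode())
```

A test could then assert that an expected device name is in
list_netns_devices('foo') instead of comparing namespace identifiers.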

Thanks,
Akihiro Motoki (irc: amotoki)

On Wed, Jan 16, 2019 at 7:56, Ben Nemec <openstack at nemebean.com> wrote:

> TLDR: We now need to look at the thread namespace instead of the process
> namespace. Many, many details below.
>
> On 1/15/19 11:51 AM, Ben Nemec wrote:
> >
> >
> > On 1/15/19 11:16 AM, Ben Nemec wrote:
> >>
> >>
> >> On 1/15/19 6:49 AM, Doug Hellmann wrote:
> >>> Ben Nemec <openstack at nemebean.com> writes:
> >>>
> >>>> I tried to set up a test environment for this, but I'm having some
> >>>> issues. My local environment is defaulting to python 3, while the gate
> >>>> job appears to have been running under python 2. I'm not sure why it's
> >>>> doing that since the tox env definition doesn't specify python 3 (maybe
> >>>> something to do with https://review.openstack.org/#/c/622415/ ?), but
> >>>> either way I keep running into import issues.
> >>>>
> >>>> I'll take another look tomorrow, but in the meantime I'm afraid I
> >>>> haven't made any meaningful progress. :-(
> >>>
> >>> If no version is specified in the tox.ini then tox defaults to the
> >>> version of python used to install it.
> >>>
> >>
> >> Ah, good to know. I think I installed tox as just "tox" instead of
> >> "python-tox", which means I got the py3 version.
> >>
> >> Unfortunately I'm still having trouble running the failing test (and
> >> not for the expected reason ;-). The daemon is failing to start with:
> >>
> >> ImportError: No module named tests.functional.utils
>
> No idea why, but updating the fwaas capabilities to match core neutron
> by adding c.CAP_DAC_OVERRIDE and c.CAP_DAC_READ_SEARCH made this go
> away. Those are related to file permission checks, but the permissions
> on my source tree are, well, permissive, so I'm not sure why that would
> be a problem.
>
> >>
> >> I'm not seeing any log output from the daemon either for some reason
> >> so it's hard to debug. There must be some difference between this and
> >> the neutron test environment because in neutron I was getting daemon
> >> log output in /opt/stack/logs.
> >
> > Figured this part out. tox.ini wasn't inheriting some values in the same
> > way as neutron. Fix proposed in https://review.openstack.org/#/c/631035/
>
> Actually, I discovered that these logs were happening, they were just in
> /tmp. So that change is probably not necessary, especially since it's
> breaking CI.
>
> >
> > Now hopefully I can make progress on the rest of it.
>
> And sure enough, I did. :-)
>
> In short, we need to look at the thread-specific network namespace in
> this test instead of the process-specific one. When we change the
> namespace it only affects the thread, unless the call is made from the
> process's main thread. Here's a simple(?) example:
>
> #!/usr/bin/env python
>
> import ctypes
> import os
> import threading
>
> from pyroute2 import netns
>
> # The python threading identifier is useless here,
> # we need to make a syscall
> libc = ctypes.CDLL('libc.so.6')
>
> def do_the_thing(ns):
>      # 186 is SYS_gettid on x86_64; the number varies by platform :-/
>      tid = libc.syscall(186)
>      # Check the starting netns
>      print('process %s' % os.readlink('/proc/self/ns/net'))
>      print('thread %s' % os.readlink('/proc/self/task/%s/ns/net' % tid))
>      # Change the netns
>      print('changing to %s' % ns)
>      netns.setns(ns)
>      # Check again. It should be different
>      print('process %s' % os.readlink('/proc/self/ns/net'))
>      print('thread %s\n' % os.readlink('/proc/self/task/%s/ns/net' % tid))
>
> # Run in main thread
> do_the_thing('foo')
> # Run in new thread
> t = threading.Thread(target=do_the_thing, args=('bar',))
> t.start()
> t.join()
> # Run in main thread again to show difference
> do_the_thing('bar')
>
> # Clean up after ourselves
> netns.remove('foo')
> netns.remove('bar')
>
> And here's the output:
>
> process net:[4026531992]
> thread net:[4026531992]
> changing to foo
> process net:[4026532196] <- Running in the main thread changes both
> thread net:[4026532196]
>
> process net:[4026532196]
> thread net:[4026532196]
> changing to bar
> process net:[4026532196] <- Child thread only changes the thread
> thread net:[4026532254]
>
> process net:[4026532196]
> thread net:[4026532196]
> changing to bar
> process net:[4026532254] <- Main thread gets them back in sync
> thread net:[4026532254]
>
> So, to get this test passing I think we need to change [1] so it looks
> for the thread id and uses a replacement for [2] that allows the thread
> id to be injected as above.
>
> And it's the end of my day so I'm going to leave it there. :-)
>
> 1:
>
> https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/privileged/tests/functional/utils.py#L23
> 2:
>
> https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/privileged/utils.py#L25
>
> -Ben
>