Thanks, Ben, for digging into the details.

I ran some more tests based on your test script.
From my results, pyroute2 and "ip" command operations against a netns seem to work fine even when the network namespaces of the process and the thread are different.

> So, to get this test passing I think we need to change [1] so it looks 
> for the thread id and uses a replacement for [2] that allows the thread 
> id to be injected as above.

I confirmed that network namespace operations work correctly in that
situation, so the current behavior looks safe. Given that, I proposed a change
to the failing test so that it checks the list of network devices inside a netns.
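
As a rough illustration of that kind of check (this is not the code in the
proposed change), the devices visible to a thread can be listed with the
standard library. list_netns_devices is a hypothetical name, and the sketch
assumes the calling thread has already entered the target netns, for example
via netns.setns():

```python
import socket


def list_netns_devices():
    """Return the names of the network devices the caller can see."""
    # if_nameindex() returns (index, name) pairs. The assumption here is
    # that it reports the devices of the calling thread's netns.
    return [name for _index, name in socket.if_nameindex()]


if __name__ == '__main__':
    print(list_netns_devices())
```

A test could then assert that an expected device name is in the returned list,
without comparing namespace ids at all.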

Thanks,
Akihiro Motoki (irc: amotoki)

On Wed, Jan 16, 2019 at 7:56, Ben Nemec <openstack@nemebean.com> wrote:
TL;DR: We now need to look at the thread namespace instead of the process
namespace. Many, many details below.

On 1/15/19 11:51 AM, Ben Nemec wrote:
>
>
> On 1/15/19 11:16 AM, Ben Nemec wrote:
>>
>>
>> On 1/15/19 6:49 AM, Doug Hellmann wrote:
>>> Ben Nemec <openstack@nemebean.com> writes:
>>>
>>>> I tried to set up a test environment for this, but I'm having some
>>>> issues. My local environment is defaulting to python 3, while the gate
>>>> job appears to have been running under python 2. I'm not sure why it's
>>>> doing that since the tox env definition doesn't specify python 3 (maybe
>>>> something to do with https://review.openstack.org/#/c/622415/ ?), but
>>>> either way I keep running into import issues.
>>>>
>>>> I'll take another look tomorrow, but in the meantime I'm afraid I
>>>> haven't made any meaningful progress. :-(
>>>
>>> If no version is specified in the tox.ini then tox defaults to the
>>> version of python used to install it.
>>>
>>
>> Ah, good to know. I think I installed tox as just "tox" instead of
>> "python-tox", which means I got the py3 version.
>>
>> Unfortunately I'm still having trouble running the failing test (and
>> not for the expected reason ;-). The daemon is failing to start with:
>>
>> ImportError: No module named tests.functional.utils

No idea why, but updating the fwaas capabilities to match core neutron
by adding c.CAP_DAC_OVERRIDE and c.CAP_DAC_READ_SEARCH made this go
away. Those are related to file permission checks, but the permissions
on my source tree are, well, permissive, so I'm not sure why that would
be a problem.

>>
>> I'm not seeing any log output from the daemon either for some reason
>> so it's hard to debug. There must be some difference between this and
>> the neutron test environment because in neutron I was getting daemon
>> log output in /opt/stack/logs.
>
> Figured this part out. tox.ini wasn't inheriting some values in the same
> way as neutron. Fix proposed in https://review.openstack.org/#/c/631035/

Actually, I discovered that these logs were being written; they were just in
/tmp. So that change is probably not necessary, especially since it's
breaking CI.

>
> Now hopefully I can make progress on the rest of it.

And sure enough, I did. :-)

In short, we need to look at the thread-specific network namespace in
this test instead of the process-specific one. When we change the
namespace it only affects the thread, unless the call is made from the
process's main thread. Here's a simple(?) example:

#!/usr/bin/env python

import ctypes
import os
import threading

from pyroute2 import netns

# Python's threading identifier is not the kernel thread id that /proc
# uses, so we need to make a syscall
libc = ctypes.CDLL('libc.so.6')

def do_the_thing(ns):
    # 186 is SYS_gettid on x86_64; the number varies by architecture :-/
    tid = libc.syscall(186)
    # Check the starting netns
    print('process %s' % os.readlink('/proc/self/ns/net'))
    print('thread %s' % os.readlink('/proc/self/task/%s/ns/net' % tid))
    # Change the netns (needs root, since setns requires CAP_SYS_ADMIN)
    print('changing to %s' % ns)
    netns.setns(ns)
    # Check again. It should be different
    print('process %s' % os.readlink('/proc/self/ns/net'))
    print('thread %s\n' % os.readlink('/proc/self/task/%s/ns/net' % tid))

# Run in main thread
do_the_thing('foo')
# Run in new thread
t = threading.Thread(target=do_the_thing, args=('bar',))
t.start()
t.join()
# Run in main thread again to show difference
do_the_thing('bar')

# Clean up after ourselves
netns.remove('foo')
netns.remove('bar')

And here's the output:

process net:[4026531992]
thread net:[4026531992]
changing to foo
process net:[4026532196] <- Running in the main thread changes both
thread net:[4026532196]

process net:[4026532196]
thread net:[4026532196]
changing to bar
process net:[4026532196] <- Child thread only changes the thread
thread net:[4026532254]

process net:[4026532196]
thread net:[4026532196]
changing to bar
process net:[4026532254] <- Main thread gets them back in sync
thread net:[4026532254]

So, to get this test passing I think we need to change [1] so it looks
for the thread id and uses a replacement for [2] that allows the thread
id to be injected as above.
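
Something like the following might be a starting point for that lookup. It is
only a sketch: thread_netns and process_netns are made-up names, not the
helpers at [1] or [2], and threading.get_native_id() needs Python 3.8 or
newer (older versions would need the gettid syscall as in the script above).

```python
import os
import threading


def thread_netns(tid=None):
    """Return the netns link of one thread, e.g. 'net:[4026531992]'."""
    if tid is None:
        # Kernel thread id of the calling thread (Python 3.8+).
        tid = threading.get_native_id()
    return os.readlink('/proc/self/task/%s/ns/net' % tid)


def process_netns():
    """Return the netns link that /proc reports for the process."""
    return os.readlink('/proc/self/ns/net')


if __name__ == '__main__':
    print('process %s' % process_netns())
    print('thread %s' % thread_netns())
```

Letting the caller pass tid explicitly is what would allow a test to inject the
id of the thread that actually made the setns call.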

And it's the end of my day so I'm going to leave it there. :-)

[1] https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/privileged/tests/functional/utils.py#L23
[2] https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/privileged/utils.py#L25

-Ben