Thanks, Ben, for digging into the details. I ran some more tests based on your test script.
From my test results, pyroute2 and "ip" command operations against a netns seem to work fine even if the network namespaces of the process and the thread are different. The test script is http://paste.openstack.org/show/742886/ and the result is http://paste.openstack.org/show/742887/.
So, to get this test passing I think we need to change [1] so it looks for the thread id and uses a replacement for [2] that allows the thread id to be injected as above.
I confirmed network namespace operations work well, so it looks safe. Considering the situation, I proposed a change to the failing test so that it checks the list of network devices inside a netns: https://review.openstack.org/#/c/631654/

Thanks,
Akihiro Motoki (irc: amotoki)

On Wed, Jan 16, 2019 at 7:56, Ben Nemec <openstack@nemebean.com> wrote:
TLDR: We now need to look at the thread namespace instead of the process namespace. Many, many details below.
On 1/15/19 11:51 AM, Ben Nemec wrote:
On 1/15/19 11:16 AM, Ben Nemec wrote:
On 1/15/19 6:49 AM, Doug Hellmann wrote:
Ben Nemec <openstack@nemebean.com> writes:
I tried to set up a test environment for this, but I'm having some issues. My local environment is defaulting to python 3, while the gate job appears to have been running under python 2. I'm not sure why it's doing that since the tox env definition doesn't specify python 3 (maybe something to do with https://review.openstack.org/#/c/622415/ ?), but either way I keep running into import issues.
I'll take another look tomorrow, but in the meantime I'm afraid I haven't made any meaningful progress. :-(
If no version is specified in the tox.ini then tox defaults to the version of python used to install it.
Ah, good to know. I think I installed tox as just "tox" instead of "python-tox", which means I got the py3 version.
Unfortunately I'm still having trouble running the failing test (and not for the expected reason ;-). The daemon is failing to start with:
ImportError: No module named tests.functional.utils
No idea why, but updating the fwaas capabilities to match core neutron by adding c.CAP_DAC_OVERRIDE and c.CAP_DAC_READ_SEARCH made this go away. Those are related to file permission checks, but the permissions on my source tree are, well, permissive, so I'm not sure why that would be a problem.
I'm not seeing any log output from the daemon either for some reason so it's hard to debug. There must be some difference between this and the neutron test environment because in neutron I was getting daemon log output in /opt/stack/logs.
Figured this part out. tox.ini wasn't inheriting some values in the same way as neutron. Fix proposed in https://review.openstack.org/#/c/631035/
Actually, I discovered that these logs were happening, they were just in /tmp. So that change is probably not necessary, especially since it's breaking ci.
Now hopefully I can make progress on the rest of it.
And sure enough, I did. :-)
In short, we need to look at the thread-specific network namespace in this test instead of the process-specific one. When we change the namespace it only affects the thread, unless the call is made from the process's main thread. Here's a simple(?) example:
#!/usr/bin/env python

import ctypes
import os
import threading

from pyroute2 import netns

# The python threading identifier is useless here,
# we need to make a syscall
libc = ctypes.CDLL('libc.so.6')


def do_the_thing(ns):
    tid = libc.syscall(186)  # This id varies by platform :-/
    # Check the starting netns
    print('process %s' % os.readlink('/proc/self/ns/net'))
    print('thread %s' % os.readlink('/proc/self/task/%s/ns/net' % tid))
    # Change the netns
    print('changing to %s' % ns)
    netns.setns(ns)
    # Check again. It should be different
    print('process %s' % os.readlink('/proc/self/ns/net'))
    print('thread %s\n' % os.readlink('/proc/self/task/%s/ns/net' % tid))


# Run in main thread
do_the_thing('foo')
# Run in new thread
t = threading.Thread(target=do_the_thing, args=('bar',))
t.start()
t.join()
# Run in main thread again to show difference
do_the_thing('bar')

# Clean up after ourselves
netns.remove('foo')
netns.remove('bar')
And here's the output:
process net:[4026531992]
thread net:[4026531992]
changing to foo
process net:[4026532196] <- Running in the main thread changes both
thread net:[4026532196]

process net:[4026532196]
thread net:[4026532196]
changing to bar
process net:[4026532196] <- Child thread only changes the thread
thread net:[4026532254]

process net:[4026532196]
thread net:[4026532196]
changing to bar
process net:[4026532254] <- Main thread gets them back in sync
thread net:[4026532254]
So, to get this test passing I think we need to change [1] so it looks for the thread id and uses a replacement for [2] that allows the thread id to be injected as above.
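To make the "injectable thread id" idea a bit more concrete, here is a rough sketch of what such a helper might look like. The names netns_link_path and get_thread_netns are hypothetical and are not the functions behind [1] or [2]. It also assumes Python 3.8+, where threading.get_native_id() returns the kernel thread id, so the platform-specific syscall number is not needed.

```python
import os
import threading


def netns_link_path(tid=None, pid='self'):
    """Build the /proc path of a thread's network namespace link.

    Hypothetical helper: tid can be injected by the caller. When it is
    omitted, the kernel id of the calling thread is used.
    """
    if tid is None:
        # Python 3.8+ only; on older versions a gettid syscall is needed.
        tid = threading.get_native_id()
    return '/proc/%s/task/%s/ns/net' % (pid, tid)


def get_thread_netns(tid=None):
    """Return the namespace link target, e.g. 'net:[4026531992]'.

    Linux only, since it reads /proc.
    """
    return os.readlink(netns_link_path(tid))
```

A test could then compare get_thread_netns() before and after the namespace change instead of reading /proc/self/ns/net, which only reflects the main thread.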
And it's the end of my day so I'm going to leave it there. :-)
1: https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/privile...
2: https://github.com/openstack/neutron-fwaas/blob/master/neutron_fwaas/privile...
-Ben