[devstack][neutron][manila] Instances having trouble getting to external network with OVN
Hi,

Some third party storage CI systems running against manila repos have had issues setting up VMs and providing access to external resources in devstack; these issues started around the time the default ml2 plugin was switched to OVN. Folks on the manila team aren't familiar enough with devstack/CI networking to determine how to resolve these, and we'd really appreciate help from networking experts in this regard.

A sample local.conf file used by the CI is here: http://paste.openstack.org/show/807449/

Manila's tests:
- create a private tenant network
- set up a router that connects the private network to the public network
- create a nova VM on the private tenant network
- attempt to mount a share via NFS from within this VM

The final step fails because the NFS mount command times out within the test VM:

tempest.lib.exceptions.SSHExecCommandFailed: Command 'sudo mount -vt nfs "10.16.xx.xx:/share-96ae8d0c-8bab-4fcd-b278-51a6f473e03c-manila" /mnt', exit status: 32
stderr:
mount.nfs: mount(2): Connection timed out
mount.nfs: Connection timed out
stdout:
mount.nfs: timeout set for Mon Jun 28 14:47:19 2021
mount.nfs: trying text-based options 'vers=4.2,addr=10.16.xx.xx,clientaddr=10.1.0.26'

The timeout seems to occur because the VM is unable to reach the NFS server, which is external to the devstack host. The NFS server is reachable from the devstack host itself.

Have there been any changes to devstack configuration that we need to be aware of, with respect to VMs having access to the external network?

Thanks,
Goutham
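For reference, the test flow above corresponds roughly to the following CLI sequence (a minimal sketch; network names, CIDRs, flavor/image, and $NFS_HOST are illustrative placeholders, not values from the actual job):

    # Private tenant network and subnet
    openstack network create manila-test-net
    openstack subnet create manila-test-subnet \
        --network manila-test-net --subnet-range 10.1.0.0/24

    # Router connecting the private network to the existing "public" network
    openstack router create manila-test-router
    openstack router set manila-test-router --external-gateway public
    openstack router add subnet manila-test-router manila-test-subnet

    # Nova VM on the private network
    openstack server create --flavor m1.tiny --image cirros \
        --network manila-test-net manila-test-vm

    # From inside the VM: verify reachability first, then mount
    ping -c 3 "$NFS_HOST"
    sudo mount -vt nfs "$NFS_HOST:/<share-export-path>" /mnt

If the ping already fails, the problem is the network path out of the cloud rather than NFS itself.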
Hello,

NetApp CI has been facing the same problem. Here is a local.conf we have been using in our CI: https://paste.openstack.org/show/807484/

The tests had basically the same output described by Goutham: https://paste.openstack.org/show/807486/

I have also tried this in a development environment in our lab, but the same issue occurs.
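For context, the OVN-related knobs in a devstack local.conf from this era typically look something like the following (a sketch of common settings; I haven't verified these against the pastes above):

    [[local|localrc]]
    # Typical OVN-era ML2 settings in devstack (illustrative, not
    # necessarily what the linked local.conf files contain)
    Q_AGENT=ovn
    Q_ML2_PLUGIN_MECHANISM_DRIVERS=ovn,logger
    Q_ML2_PLUGIN_TYPE_DRIVERS=local,flat,vlan,geneve
    Q_ML2_TENANT_NETWORK_TYPE=geneve
    # Create the "public" external network and let this node act as a
    # gateway chassis for north/south traffic
    OVN_L3_CREATE_PUBLIC_NETWORK=True
    ENABLE_CHASSIS_AS_GW=True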
On Wed, Jul 14, 2021 at 2:21 PM Carlos Silva <ces.eduardo98@gmail.com> wrote:
NetApp CI has been facing the same problem.
Thank you Carlos!
Ping Neutron developers,

Can you help vet this devstack issue? Folks in the manila third party CI community will be happy to share further logs or details. As an example, failure logs from a third party CI job are here: http://openstack-logs.purestorage.com/84/789384/25/thirdparty-check/pure-devstack-manila-tempest-aio/6965033/
Hi,

Bubbling up this issue - I reported a launchpad bug with some more debug information: https://bugs.launchpad.net/bugs/1939627

I’m confused how/why only the Manila gates are hitting this issue. If you’re reading - do you have any tests elsewhere that set up a nova instance on a devstack and ping/communicate with the internet/outside world? If yes, I’d love to compare configuration with your setup.

Thank you so much for your help,
Goutham
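For comparison, the manual check this boils down to is something like the following, assuming a guest reachable over SSH at a floating IP $FIP and an external NFS server $NFS_HOST (both placeholders):

    # From the devstack host: can the guest reach the outside world at all?
    ssh cirros@"$FIP" 'ping -c 3 8.8.8.8'
    # And specifically the NFS server?
    ssh cirros@"$FIP" "ping -c 3 $NFS_HOST"

If these fail while the same targets are reachable from the devstack host itself, the guests' north/south path (router gateway, NAT) is the place to look.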
On Tue, Aug 24, 2021 at 4:24 PM Clark Boylan <cboylan@sapwetik.org> wrote:
It has been a while since I looked at this stuff and the OVN switch may have changed it, but we have historically and intentionally avoided external connectivity for the nested devstack cloud in upstream CI. Instead we expect the test jobs to be self-contained. On multinode jobs we set up an entire L2 network over VXLAN, with very simple L3 routing, that is independent of the host system's networking. This allows tempest to talk to the test instances on the nested cloud, but those nested cloud instances cannot get off the host instance. This is important because it helps keep things like DHCP requests from leaking out into the world.

Even if you configured floating IPs on the inner instances, those IPs wouldn't be routable to the host instance in the clouds that we host jobs in. To make this work without a bunch of support from the hosting environment, you would need to NAT between the host instance's IP and the inner nested devstack instances, so that traffic exiting the test instances has a path to the outside world that can receive the return traffic.
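The NAT Clark describes amounts to a masquerade rule on the devstack host. A minimal sketch, assuming devstack's default floating range 172.24.4.0/24 and eth0 as the host's uplink interface (both assumptions):

    # Allow the host to forward traffic for the nested cloud
    sudo sysctl -w net.ipv4.ip_forward=1
    # Rewrite the source of traffic leaving the floating range to the
    # host's own address, so replies have a return path to the host
    sudo iptables -t nat -A POSTROUTING -s 172.24.4.0/24 -o eth0 -j MASQUERADE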
Thanks Clark! The tests we're running are either outside CI (i.e., with a local devstack) or on third party CI systems where the guest VMs need to access storage that's external to the devstack host. In upstream CI, these tempest tests have no reason to reach out to the external world. I didn't know the reason behind this design, so this insight is useful to me!
In https://bugs.launchpad.net/bugs/1939627 you are creating an external network and floating IPs, but once that traffic leaves the host system there must be a return path for any responses, and I suspect that is what is missing. It is odd that this would coincide with the OVN change. Maybe IP ranges were updated with the OVN change, and any hosting support that enabled routing of those IPs is no longer valid as a result?
I was scratching my head and reading the OVN integration code as well - but neutron internals aren't my strong suit. slaweq and ralonsoh were able to root-cause this to missing NAT rules in the devstack ovn-agent setup. \o/
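For anyone hitting the same symptom, a quick check on the devstack host for this class of problem (a hedged sketch; the exact rule devstack installs may differ):

    # Look for an SNAT/masquerade rule covering the floating IP range
    # (172.24.4.0/24 is the devstack default; adjust if yours differs)
    sudo iptables -t nat -S POSTROUTING | grep -iE 'masquerade|172\.24\.4\.'

If nothing matches, outbound traffic from the guests has no return path, and a masquerade rule like the one sketched earlier restores connectivity.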
Thank you so much for your help,
Goutham
participants (3)
- Carlos Silva
- Clark Boylan
- Goutham Pacha Ravi