[Openstack] missing ovs flows and extra interfaces in pike

Fabian Zimmermann dev.faz at gmail.com
Fri Oct 19 14:00:48 UTC 2018


Hi,

we are seeing something similar. We are running OVS with VXLAN, but with 
legacy routers hosted directly on the hypervisors, without any dedicated 
network nodes.

So far we have not found a way to fix or reliably reproduce the issue. 
We just wrote a small script which calculates which flows should be 
present on which hypervisor and sends an alert if something is missing.
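A minimal sketch of such a check (the function names are made up, and the set of expected tunnel ports is hard-coded here; a real deployment would derive it from the inventory and the l2pop state on the Neutron server):

```python
# Sketch: compare the tunnel output ports referenced by flows on br-tun
# against the set of ports we expect, and report any that are missing.
import re
import subprocess


def parse_tunnel_outputs(dump_text):
    """Extract the set of output port numbers referenced by the
    flows in an `ovs-ofctl dump-flows br-tun` dump."""
    ports = set()
    for line in dump_text.splitlines():
        for m in re.finditer(r"output:(\d+)", line):
            ports.add(int(m.group(1)))
    return ports


def missing_flows(expected_ports, dump_text):
    """Return the expected tunnel ports that no flow outputs to."""
    return set(expected_ports) - parse_tunnel_outputs(dump_text)


def check_node(expected_ports):
    """Run on a hypervisor: dump br-tun and alert on missing flows."""
    dump = subprocess.check_output(
        ["ovs-ofctl", "dump-flows", "br-tun"], text=True)
    gone = missing_flows(expected_ports, dump)
    if gone:
        print("ALERT: no flows output to tunnel ports", sorted(gone))
```

The alert then tells the operator which hypervisor needs its agent restarted.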

The operator then restarts the affected neutron-openvswitch-agent and 
everything works again.

We also found that the problem disappears as soon as we disable l2pop, 
but that is not possible once you have switched to DVR (as you did), 
since DVR requires l2pop.

So at the moment we plan to disable l2pop and move our routers back to 
some network nodes.
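For reference, disabling l2pop would look roughly like this (file paths and existing values are assumptions; adjust to your deployment):

```ini
; ml2_conf.ini on the Neutron server:
; drop l2population from the mechanism drivers
[ml2]
mechanism_drivers = openvswitch

; openvswitch_agent.ini on each hypervisor:
[agent]
l2_population = false
```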

I would be glad if someone were able to reproduce, or even better, fix 
the issue.

  Fabian

On 19.10.18 at 15:32, Hartwig Hauschild wrote:
> Hi,
> 
> [ I have no idea how much of the following information is necessary ]
> 
> We're running Openstack Pike, deployed with Openstack-Ansible 16.0.5.
> The system is running on a bunch of compute-nodes and three combined
> network/management-nodes, we're using OVS, DVR and VXLAN for networking.
> 
> The DVRs are set up with snat disabled, that's handled by different
> systems.
> 
> We have recently noticed that we don't have north-south-connectivity in
> a couple of qdhcp-netns, and after a week's worth of debugging it boils
> down to missing OVS flows on br-tun that should be directing the
> northbound traffic at the node with the live snat-netns.
> 
> We also noticed that while every node has the ports for the
> qdhcp-netns that belong on that node, we also have a couple of taps and
> flows for ports that live on other nodes.
> 
> To make that a bit clearer:
> If you have network A with dhcp-services F, G, H, we found that the ip
> netns containing the dnsmasq for F, G, H are on nodes 1, 2, 3
> respectively, but node 1 would also have the tap-interface and flows for
> G on br-int, dangling freely without any netns.
> 
> Is there a simple explanation for this and maybe even a fix?
> 
> What we found so far seems to suggest we should either restart the
> management-nodes or the neutron-agent-containers, or at least stop,
> clean and start ovs and neutron-openvswitch-agent inside the containers.
> 
> Is it possible to somehow redeploy or validate the flows from neutron to
> make sure that everything is consistent apart from restarts?
> 
> 


