[neutron][largescale-sig] Debugging and tracking missing flows with l2pop

Krzysztof Klimonda kklimonda at syntaxhighlighted.com
Wed Mar 11 13:29:58 UTC 2020


Hi,

(This is a Stein deployment running the 14.0.2 neutron release.)

I’ve just spent some time debugging a missing connection between two VMs running on OpenStack Stein with ovs+l2pop enabled. The direct cause was missing flows in table 20 and a very incomplete flood flow in table 22. Restarting neutron-openvswitch-agent on that host fixed the issue.
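
For anyone who wants to run the same check on their own computes, below is a minimal Python sketch that summarizes what is present in tables 20 and 22 from the text output of `ovs-ofctl dump-flows`. It is an illustration rather than a tested tool: the function name `summarize_tunnel_flows` is made up for this example, and the regular expressions assume the usual flow layout (`table=`, `dl_vlan=`, `dl_dst=`, `output:`), which can differ between OVS versions.

```python
import re
from collections import defaultdict

# Table numbers discussed above: 20 holds per-MAC unicast flows,
# 22 holds the per-network flood flow.
UCAST_TO_TUN = 20
FLOOD_TO_TUN = 22

# Assumed field layout of `ovs-ofctl dump-flows` output lines.
_TABLE_RE = re.compile(r"\btable=(\d+)")
_VLAN_RE = re.compile(r"\bdl_vlan=(\d+)")
_MAC_RE = re.compile(r"\bdl_dst=([0-9a-fA-F:]{17})")
# Matches both `output:5` and `output:"vxlan-0a000001"`.
_OUTPUT_RE = re.compile(r"\boutput:\"?([\w.-]+)\"?")


def summarize_tunnel_flows(dump_text):
    """Return (unicast, flood) dicts keyed by local VLAN id.

    unicast[vlan] is the set of destination MACs found in table 20.
    flood[vlan] is the set of output ports found in table 22.
    Lines without a dl_vlan match (e.g. default flows) are skipped.
    """
    unicast = defaultdict(set)
    flood = defaultdict(set)
    for line in dump_text.splitlines():
        table = _TABLE_RE.search(line)
        vlan = _VLAN_RE.search(line)
        if not table or not vlan:
            continue
        table_id = int(table.group(1))
        vlan_id = int(vlan.group(1))
        if table_id == UCAST_TO_TUN:
            mac = _MAC_RE.search(line)
            if mac:
                unicast[vlan_id].add(mac.group(1).lower())
        elif table_id == FLOOD_TO_TUN:
            flood[vlan_id].update(_OUTPUT_RE.findall(line))
    return dict(unicast), dict(flood)
```

Feeding it the output of `ovs-ofctl dump-flows br-tun` (assuming the tunnel bridge uses the default name `br-tun`) from a healthy and a suspect compute and comparing the two results should make a missing MAC or a short flood list stand out.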

The last time we encountered missing flood flows (in another, pike-based deployment), we tracked it down to https://review.opendev.org/#/c/600151/ and it has been stable since then.

My initial thought was that we were hitting the same bug: a couple of VMs are scheduled on the same compute, 3 ports are activated at the same time, and the flood entry is not broadcast to the other computes. However, this time the issue affected only one of the computes, and it was the only one missing both the MAC entries in table 20 and the VXLAN tunnels in table 22.

The only other idea I have is that the compute with the missing flows never received them from rabbitmq, but I see nothing in the logs that would suggest the agent was disconnected from rabbitmq.

So at this point I have three questions:

- what would be a good place to look next to track down those missing flows?
- for other operators, how stable do you find l2pop in general?
- if you have problems with missing flows in your environment, do you try to monitor your deployment for that?

-Chris


More information about the openstack-discuss mailing list