[Openstack-operators] [neutron] ML2/OVS dropping packets?

Kevin Benton kevin at benton.pub
Thu Jun 22 00:07:00 UTC 2017


Can you do a tcpdump to see if the VM is sending any packets out that other
interface with the source MAC of the primary interface?
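
Something along these lines on the hypervisor should show it (the tap
device name and MAC here are placeholders for the secondary tap and the
primary interface's MAC):

    tcpdump -e -n -i tap1 ether src fa:16:3e:xx:xx:xx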

We make use of the NORMAL action, which does MAC learning, so it's possible
something is slipping through that is causing OVS to learn the wrong port
association.
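
If it is MAC learning gone wrong, the learned-MAC table on the integration
bridge should show the primary MAC pointing at the secondary's port;
roughly (assuming the integration bridge is br-int):

    ovs-appctl fdb/show br-int | grep -i fa:16:3e:xx:xx:xx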

The other possibility is that the upstream router's ARP entry for the
primary interface's IP is learning the secondary MAC due to some traffic
slipping out.
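
As an aside, with two interfaces on the same subnet the guest will by
default answer ARP for either IP on either interface, which is enough to
poison the router's cache. A common mitigation (untested here, just a
sketch) is to tighten ARP behaviour inside the VM:

    sysctl -w net.ipv4.conf.all.arp_ignore=1
    sysctl -w net.ipv4.conf.all.arp_announce=2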

Checking the destination MAC of the upstream traffic going into the
filtering bridge of the secondary interface should tell you whether it's a
MAC learning problem or an ARP problem.
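
For example, something like this on the secondary interface's qvb leg
(qvb1 here is a placeholder for the real device name):

    tcpdump -e -n -i qvb1 tcp

If the destination MAC is the secondary interface's own MAC, the router's
ARP entry is wrong; if it's the primary's MAC arriving on that port, OVS
has learned the primary MAC on the wrong port.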


On Jun 21, 2017 10:52, "Jonathan Proulx" <jon at csail.mit.edu> wrote:



So this all gets more interesting: the packets aren't lost, they get
routed (switched?) to the wrong interface...


The VM has two interfaces on the same network. Not sure this makes
sense; it was done because this was a straight physical-to-virtual
migration. But it seems like it should work.

So the VM is sending the SYN from its (vm)eth0 -> tap0 -> qvb0 -> qvo0 ->
int-eth1-br -> phy-eth1-br -> (hypervisor)eth1 -> WORLD

but the ACK is coming back (hypervisor)eth1 -> phy-eth1-br ->
int-eth1-br -> qvo1 !!! -> qvb1 -> tap1 where presumably the sec-group
rules see it as invalid and drop it.
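
For what it's worth, this is roughly the capture I'm using to watch both
qvo ports for the returning SYN+ACK (port names abbreviated; in practice
I narrow the filter to the remote host):

    tcpdump -e -n -i qvo0 'tcp[tcpflags] & (tcp-syn|tcp-ack) == (tcp-syn|tcp-ack)'
    tcpdump -e -n -i qvo1 'tcp[tcpflags] & (tcp-syn|tcp-ack) == (tcp-syn|tcp-ack)'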

This is quite odd. The default route on the VM is through eth0, which is
where the packets originate and where the IPv4 address they should return
to lives.

Really puzzled why OVS is sending packets back through the wrong path.

On the one hand I want to say "stop doing that, just put both addresses
on one port"; on the other, I see no reason why it shouldn't work.

-Jon


On Wed, Jun 21, 2017 at 05:35:02PM +0100, Stig Telfer wrote:
:Hi Jon -
:
:From what I understand, while you might have gone to the trouble of
:configuring a lossless data centre ethernet, that guarantee against packet
:loss ends at the hypervisor. OVS (and other virtual switches) will drop
:packets rather than exert back pressure.
:
:I saw a useful paper from IBM Zurich on developing a flow-controlled
:virtual switch:
:
:http://researcher.ibm.com/researcher/files/zurich-DCR/Got%20Loss%20Get%20zOVN.pdf
:
:It’s a bit dated (2013) but may still apply.
:
:If you figure out a way of preventing this with modern OVS, I’d be very
:interested to know.
:
:Best wishes,
:Stig
:
:
:> On 21 Jun 2017, at 16:24, Jonathan Proulx <jon at csail.mit.edu> wrote:
:>
:> On Wed, Jun 21, 2017 at 02:39:23AM -0700, Kevin Benton wrote:
:> :Are there any events going on during these outages that would cause
:> :reprogramming by the Neutron agent? (e.g. port updates) If not, it's
:> :likely an OVS issue and you might want to cross-post to the ovs-discuss
:> :mailing list.
:>
:> Guess I'll have to wander deeper into OVS land.
:>
:> No agent updates and nothing in OVS logs (at INFO); flipping to debug,
:> there are so many messages that they get dropped:
:>
:> 2017-06-21T15:15:36.972Z|00794|dpif(handler12)|DBG|Dropped 35 log messages in last 0 seconds (most recently, 0 seconds ago) due to excessive rate
:>
:> /me wanders over to ovs-discuss
:>
:> Thanks,
:> -Jon
:>
:> :Can you check the vswitch logs during the packet loss to see if there
:> :are any messages indicating a reason? If that doesn't show anything and
:> :it can be reliably reproduced, it might be worth increasing the logging
:> :for the vswitch to debug.
:> :
:> :
:> :
:> :On Tue, Jun 20, 2017 at 12:36 PM, Jonathan Proulx <jon at csail.mit.edu> wrote:
:> :
:> :> Hi All,
:> :>
:> :> I have a very busy VM (well, one of my users does; I don't have
:> :> access but do have a cooperative and competent admin to interact with
:> :> on the other end).
:> :>
:> :> At peak times it *sometimes* misses packets.  I've been digging in
:> :> for a bit and it looks like they get dropped in OVS land.
:> :>
:> :> The VM's main function in life is to pull down webpages from other
:> :> sites and analyze them as requested.  During peak times (EU/US working
:> :> hours) it sometimes hangs on some requests and sometimes fails.
:> :>
:> :> Looking at the traffic, the outbound SYN request from the VM is always
:> :> good and the returning ACK always gets to the physical interface of the
:> :> hypervisor (on a provider VLAN).
:> :>
:> :> When packets get dropped they do not make it to the qvoXXXXXXXX-XX on
:> :> the integration bridge.
:> :>
:> :> My suspicion is that OVS isn't keeping up with the eth1-br flow rules
:> :> remapping from the external to the internal VLAN ID, but I'm not quite
:> :> sure how to prove that or what to do about it.
:> :>
:> :> My initial thought had been to blame conntrack, but the drops are
:> :> happening before the iptables rules, and while there are a lot of
:> :> connections on this hypervisor:
:> :>
:> :> net.netfilter.nf_conntrack_count = 351880
:> :>
:> :> There should be plenty of headroom to handle them:
:> :>
:> :> net.netfilter.nf_conntrack_max = 1048576
:> :>
:> :> Anyone have thoughts on where to go with this?
:> :>
:> :> version details:
:> :> Ubuntu 14.04
:> :> OpenStack Mitaka
:> :> ovs-vsctl (Open vSwitch) 2.5.0
:> :>
:> :> Thanks,
:> :> -Jon
:> :>
:> :> --
:> :>
:> :> _______________________________________________
:> :> OpenStack-operators mailing list
:> :> OpenStack-operators at lists.openstack.org
:> :> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
:> :>
:>
:> --
:>
:> _______________________________________________
:> OpenStack-operators mailing list
:> OpenStack-operators at lists.openstack.org
:> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
:

--