[Openstack-operators] [neutron] ML2/OVS dropping packets?

Jonathan Proulx jon at csail.mit.edu
Wed Jun 21 17:52:51 UTC 2017



So this all gets more interesting: the packets aren't lost, they get
routed (switched?) to the wrong interface...


The VM has two interfaces on the same network. Not sure this makes
sense; it was done because this was a straight physical-to-virtual
migration. But it seems like it should work.

So the VM is sending a SYN from its (vm)eth0 -> tap0 -> qvb0 -> qvo0 -> int-eth1-br
-> phy-eth1-br -> (hypervisor)eth1 -> WORLD

but the ACK is coming back (hypervisor)eth1 -> phy-eth1-br ->
int-eth1-br -> qvo1 !!! -> qvb1 -> tap1, where presumably the sec-group
rules see it as invalid and drop it.
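
One quick way to see which way the bridges think that traffic should go
(a sketch; the bridge names follow the eth1-br / br-int naming used
elsewhere in this thread, and <mac> is a placeholder for whatever
destination MAC the returning ACK actually carries, visible with
tcpdump -e on eth1):

    # which port number has each bridge learned that MAC on?
    # (ovs-ofctl show <bridge> maps the numbers back to port names)
    ovs-appctl fdb/show eth1-br | grep -i <mac>
    ovs-appctl fdb/show br-int  | grep -i <mac>

With the NORMAL action OVS forwards purely on learned MACs, so if that
MAC was last seen as a source on qvo1, the ACK will land on tap1
regardless of the VM's routing table.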

This is quite odd.  The default route on the VM is through eth0, which is
where the packets originate and where the IPv4 address they should return
to lives.

Really puzzled why OVS is sending the packets back through the wrong path.
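
For completeness, ofproto/trace can replay the returning packet through
the flow tables and show exactly which port it would be output on (a
sketch; the port number, VLAN, MACs and IPs below are placeholders to be
substituted from ovs-ofctl show and the real traffic):

    # simulate the returning ACK as it arrives on the provider bridge
    ovs-appctl ofproto/trace eth1-br \
        in_port=1,dl_vlan=123,dl_src=aa:bb:cc:00:00:02,dl_dst=fa:16:3e:00:00:01,ip,nw_src=203.0.113.10,nw_dst=198.51.100.20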

On the one hand I want to say "stop doing that, just put both addresses
on one port"; on the other, I see no reason why it shouldn't work.
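
One possibility worth ruling out (an assumption on my part, not something
confirmed above): with two interfaces on the same subnet, a Linux guest
will by default answer ARP for eth0's address out of either interface, so
the upstream router can end up caching eth1's MAC for eth0's IP, and the
return traffic is then switched, correctly from OVS's point of view, to
the second port. If that's what's happening, the usual knobs inside the
guest are:

    # run inside the VM; tightens ARP behaviour when two NICs share a subnet
    sysctl net.ipv4.conf.all.arp_ignore=1
    sysctl net.ipv4.conf.all.arp_announce=2
    sysctl net.ipv4.conf.all.arp_filter=1

Checking the ARP cache on the upstream router (or capturing the ARP
replies) would confirm or kill that theory quickly.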

-Jon
 

On Wed, Jun 21, 2017 at 05:35:02PM +0100, Stig Telfer wrote:
:Hi Jon -
:
:From what I understand, while you might have gone to the trouble of configuring a lossless data centre ethernet, that guarantee against packet loss ends at the hypervisor. OVS (and other virtual switches) will drop packets rather than exert back pressure.
:
:I saw a useful paper from IBM Zurich on developing a flow-controlled virtual switch:
:
:http://researcher.ibm.com/researcher/files/zurich-DCR/Got%20Loss%20Get%20zOVN.pdf
:
:It’s a bit dated (2013) but may still apply.
:
:If you figure out a way of preventing this with modern OVS, I’d be very interested to know.
:
:Best wishes,
:Stig
:
:
:> On 21 Jun 2017, at 16:24, Jonathan Proulx <jon at csail.mit.edu> wrote:
:> 
:> On Wed, Jun 21, 2017 at 02:39:23AM -0700, Kevin Benton wrote:
:> :Are there any events going on during these outages that would cause
:> :reprogramming by the Neutron agent? (e.g. port updates) If not, it's likely
:> :an OVS issue and you might want to cross-post to the ovs-discuss mailing
:> :list.
:> 
:> Guess I'll have to wander deeper into OVS land.
:> 
:> No agent updates and nothing in the OVS logs (at INFO); flipping to DEBUG
:> and there are so many messages they get dropped:
:> 
:> 2017-06-21T15:15:36.972Z|00794|dpif(handler12)|DBG|Dropped 35 log messages in last 0 seconds (most recently, 0 seconds ago) due to excessive rate
:> 
:> /me wanders over to ovs-discuss
:> 
:> Thanks,
:> -Jon
:> 
:> :Can you check the vswitch logs during the packet loss to see if there are
:> :any messages indicating a reason? If that doesn't show anything and it can
:> :be reliably reproduced, it might be worth increasing the logging for the
:> :vswitch to debug.
:> :
:> :
:> :
:> :On Tue, Jun 20, 2017 at 12:36 PM, Jonathan Proulx <jon at csail.mit.edu> wrote:
:> :
:> :> Hi All,
:> :>
:> :> I have a very busy VM (well, one of my users does; I don't have access
:> :> but do have a cooperative and competent admin to interact with on the
:> :> other end).
:> :>
:> :> At peak times it *sometimes* misses packets.  I've been digging in for
:> :> a bit and it looks like they get dropped in OVS land.
:> :>
:> :> The VM's main function in life is to pull down webpages from other
:> :> sites and analyze them as requested.  During peak times (EU/US working
:> :> hours) some requests hang and some fail outright.
:> :>
:> :> Looking at the traffic, the outbound SYN from the VM is always good
:> :> and the returning ACK always gets to the physical interface of the hypervisor
:> :> (on a provider VLAN).
:> :>
:> :> When packets get dropped they do not make it to the qvoXXXXXXXX-XX on
:> :> the integration bridge.
:> :>
:> :> My suspicion is that OVS isn't keeping up with the eth1-br flow rules remapping
:> :> from external to internal VLAN IDs, but I'm not quite sure how to prove
:> :> that or what to do about it.
:> :>
:> :> My initial thought had been to blame conntrack, but the drops are happening
:> :> before the iptables rules, and while there are a lot of connections on
:> :> this hypervisor:
:> :>
:> :> net.netfilter.nf_conntrack_count = 351880
:> :>
:> :> There should be plenty of headroom to handle it:
:> :>
:> :> net.netfilter.nf_conntrack_max = 1048576
:> :>
:> :> Anyone have thoughts on where to go with this?
:> :>
:> :> version details:
:> :> Ubuntu 14.04
:> :> OpenStack Mitaka
:> :> ovs-vsctl (Open vSwitch) 2.5.0
:> :>
:> :> Thanks,
:> :> -Jon
:> :>
:> 
:
