[openstack-dev] [neutron][ovs] The way we deal with MTU

Peters, Rawlin rawlin.peters at hpe.com
Mon Jun 13 17:38:07 UTC 2016


Hi Ihar,

This reminds me of a mailing list thread from a while back about moving OVS ports between namespaces being considered harmful [1]. Do you know if that was ever resolved by the OVS folks? Or, is this MTU bug just further indication of this action being harmful?

Another comment inline.

Rawlin Peters

[1] http://lists.openstack.org/pipermail/openstack-dev/2015-February/056834.html

On Monday, June 13, 2016 10:50 AM, Ihar Hrachyshka wrote:
> 
> Hi all,
> 
> In Mitaka, we introduced a bunch of changes to the way we handle MTU in
> Neutron/Nova, making sure that the whole instance data path (starting from
> the instance's internal interface, through the hybrid bridge, into br-int) as
> well as the router data path (qr) has the proper MTU value set on all
> participating devices. On the hypervisor side, both Nova and Neutron take part
> in it, setting the MTU with the ip link tool based on what the Neutron plugin
> calculates for us. So far so good.
> 
> It turns out that for OVS, this does not work as expected with regard to
> br-int. A bug was reported recently: https://launchpad.net/bugs/1590397
> 
> Briefly, when we try to set the MTU on a device that is plugged into a bridge,
> and the bridge already has another port with a lower MTU, the bridge itself
> inherits the MTU from that latter port, and the Linux kernel (?) does not allow
> setting the MTU on the first device at all, making ip link calls ineffective.
> 
> AFAIU this behaviour is consistent with Linux bridging rules: you can’t have
> ports with different MTUs plugged into the same bridge.
> 
> Now, that’s a huge problem for Neutron, because we plug ports that belong
> to different networks (and that hence may have different MTUs) into the
> same br-int bridge.
> 
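To make the constraint concrete, here is a toy Python model of the behaviour described above. It is only a sketch: try_set_mtu and the port names are made up for illustration, and it does not reflect actual kernel or ovs-vswitchd code.

```python
def try_set_mtu(bridge_ports, port, requested):
    """Toy model: a port cannot be raised above the lowest MTU of the
    other ports on the same bridge (hypothetical helper, not a real API)."""
    others = [mtu for name, mtu in bridge_ports.items() if name != port]
    limit = min(others) if others else requested
    if requested > limit:
        # Mirrors the "ip link call is ineffective" symptom from the bug report.
        return False
    bridge_ports[port] = requested
    return True


# Two ports from different networks plugged into the same bridge.
ports = {"tap-net-a": 1450, "qr-net-b": 1450}
print(try_set_mtu(ports, "qr-net-b", 1500))  # rejected: tap-net-a is at 1450
print(try_set_mtu(ports, "qr-net-b", 1400))  # accepted: not above the limit
```
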
> So I played with the code locally a bit and spotted that currently, we set MTU
> for router ports before we move their devices into router namespaces. And
> once the device is in a namespace, ip-link actually works. So I wrote a fix with
> a functional test that proves the point:
> https://review.openstack.org/#/c/327651/
> The fix was validated by the reporter of the original bug and seems to fix
> the issue for him.
> 
> It’s suspicious that it works from inside a namespace but not when the
> device is still in the root namespace. So I reached out to Jiri Benc from our
> local Open vSwitch team, and here is a quote:
> 
> ===
> 
> "It's a bug in ovs-vswitchd. It doesn't see the interface that's in other netns
> and thus cannot enforce the correct MTU.
> 
> We'll hopefully fix it and disallow incorrect MTU setting even across
> namespaces. However, it requires significant effort and rework of ovs
> namespace handling.
> 
> You should not depend on the current buggy behavior. Don't set MTU of the
> internal interfaces higher than the rest of the bridge, it's not supported.
> Hacking this around by moving the interface to a netns is exploiting of a bug.
> 
> We can certainly discuss whether this limitation could be relaxed.
> Honestly, I don't know, it's for a discussion upstream. But as of now, it's not
> supported and you should not do it."
> 
> So basically, as long as we plug ports with different MTUs into the same
> bridge, we are relying on a bug in Open vSwitch that may break us at any time.
> 
> I guess our alternatives are:
> - either redesign bridge setup for openvswitch to e.g. maintain a bridge per
> network;
> - or talk to ovs folks on whether they may support that for us.
> 

It seems like another alternative would be to always use veth devices by default rather than internal OVS ports (i.e. ovs_use_veth = True), but that would likely mean taking a large performance hit that no one will be happy about.
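
For reference, a minimal sketch of what that would look like in an agent configuration. I am assuming the option lives in the [DEFAULT] section of the L3 and DHCP agent config files (typically l3_agent.ini and dhcp_agent.ini); please check the configuration reference for your release before relying on it.

```ini
[DEFAULT]
# Plug agent ports as veth pairs instead of OVS internal ports.
# Assumption: file and section placement may differ between releases.
ovs_use_veth = True
```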

> I understand the former option is too scary. It opens lots of questions,
> including upgrade impact, since it will obviously introduce data plane
> downtime. That would be a huge shift in paradigm, probably too huge to
> swallow. The latter option may not fly with vswitch folks. Any better ideas?
> 
> It’s also not clear whether we want to proceed with my immediate fix.
> Advice is welcome.
> 
> Thanks,
> Ihar