[openstack-dev] [neutron][ovs] The way we deal with MTU

Eugene Nikanorov enikanorov at mirantis.com
Mon Jun 13 17:43:00 UTC 2016


That's interesting.


In our deployments we use a chain like br-ex (Linux bridge, MTU 9000) -
OVSIntPort (MTU 65000) - br-floating (OVS bridge, MTU 1500) - br-int (OVS
bridge, MTU 1500).
The qg ports are then created in br-int, traffic traverses the whole chain,
and that setup allows jumbo frames over the external network.

For that reason I thought that the MTU inside OVS doesn't really matter.
This, however, is with OVS 2.4.1.

I wonder whether that behavior has changed, and whether it is documented
anywhere.

Thanks,
Eugene.

On Mon, Jun 13, 2016 at 9:49 AM, Ihar Hrachyshka <ihrachys at redhat.com>
wrote:

> Hi all,
>
> in Mitaka, we introduced a bunch of changes to the way we handle MTU in
> Neutron/Nova, making sure that the whole instance data path (from the
> instance's internal interface, through the hybrid bridge, into br-int), as
> well as the router data path (qr), has the proper MTU set on all
> participating devices. On the hypervisor side, both Nova and Neutron take
> part, setting the MTU with the ip-link tool based on what the Neutron
> plugin calculates for us. So far so good.
>
> It turns out that for OVS, this does not work as expected with regard to
> br-int. A bug was reported recently: https://launchpad.net/bugs/1590397
>
> Briefly, when we try to set the MTU on a device that is plugged into a
> bridge, and the bridge already has another port with a lower MTU, the
> bridge itself inherits the MTU from that latter port, and the Linux kernel
> (?) does not allow setting the MTU on the first device at all, making
> ip link calls ineffective.
>
> AFAIU this behaviour is consistent with Linux bridging rules: you can’t
> have ports of different MTU plugged into the same bridge.
>
> Now, that’s a huge problem for Neutron, because we plug ports that belong
> to different networks (and that hence may have different MTUs) into the
> same br-int bridge.
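
To make the constraint concrete, here is a toy Python model of the behaviour
described above. It is only a sketch: the Bridge class, the port names and
the rule in set_port_mtu are assumptions for illustration, taken from the
description in this thread, and are not Open vSwitch or kernel code.

```python
class Bridge:
    """Toy model of the reported behaviour; not real OVS or kernel logic."""

    def __init__(self, name):
        self.name = name
        self.ports = {}  # port name -> MTU

    @property
    def mtu(self):
        # The bridge inherits the lowest MTU among its ports (None if empty).
        return min(self.ports.values(), default=None)

    def add_port(self, port, mtu):
        self.ports[port] = mtu

    def set_port_mtu(self, port, mtu):
        # Assumed rule, mimicking the symptom in the bug report: a port
        # cannot be raised above the lowest MTU of the other ports.
        others = [m for p, m in self.ports.items() if p != port]
        if others and mtu > min(others):
            return False  # the request has no effect
        self.ports[port] = mtu
        return True


# Two hypothetical router ports from networks with different intended MTUs.
br_int = Bridge("br-int")
br_int.add_port("qr-net1", 1500)
br_int.add_port("qr-net2", 1500)
ok = br_int.set_port_mtu("qr-net2", 9000)  # rejected in this model
```

In this model the second network can never get its jumbo MTU as long as both
ports share br-int, which is the situation Neutron ends up in.
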
>
> So I played with the code locally a bit and spotted that currently, we set
> MTU for router ports before we move their devices into router namespaces.
> And once the device is in a namespace, ip-link actually works. So I wrote a
> fix with a functional test that proves the point:
> https://review.openstack.org/#/c/327651/ The fix was validated by the
> reporter of the original bug and seems to fix the issue for him.
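
The ordering change can be sketched as follows. This is not the code from
the review above; mtu_commands and the device and namespace names are
hypothetical, and the function only builds the ip command lines without
running anything.

```python
def mtu_commands(device, namespace, mtu, mtu_after_move=True):
    """Return the ip invocations, in order, as argument lists."""
    # Move the device from the root namespace into the router namespace.
    move = ["ip", "link", "set", "dev", device, "netns", namespace]
    # Set the MTU while the device is still in the root namespace.
    set_in_root = ["ip", "link", "set", "dev", device, "mtu", str(mtu)]
    # Set the MTU from inside the namespace, after the move.
    set_in_ns = ["ip", "netns", "exec", namespace] + set_in_root

    if mtu_after_move:
        return [move, set_in_ns]   # ordering described for the fix
    return [set_in_root, move]     # ordering described as the current one


# Hypothetical names, for illustration only.
for cmd in mtu_commands("qr-demo", "qrouter-demo", 9000):
    print(" ".join(cmd))
```

Only the order differs between the two variants; the MTU value itself is
whatever the Neutron plugin calculated.
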
>
> It’s suspicious that it works from inside a namespace but not when the
> device is still in the root namespace. So I reached out to Jiri Benc from
> our local Open vSwitch team, and here is a quote:
>
> ===
>
> "It's a bug in ovs-vswitchd. It doesn't see the interface that's in
> other netns and thus cannot enforce the correct MTU.
>
> We'll hopefully fix it and disallow incorrect MTU setting even across
> namespaces. However, it requires significant effort and rework of ovs
> name space handling.
>
> You should not depend on the current buggy behavior. Don't set MTU of
> the internal interfaces higher than the rest of the bridge, it's not
> supported. Hacking this around by moving the interface to a netns is
> exploiting of a bug.
>
> We can certainly discuss whether this limitation could be relaxed.
> Honestly, I don't know, it's for a discussion upstream. But as of now,
> it's not supported and you should not do it."
>
> So basically, as long as we plug ports with different MTUs into the same
> bridge, we are relying on a bug in Open vSwitch that may break us at any
> time.
>
> I guess our alternatives are:
> - either redesign bridge setup for openvswitch to e.g. maintain a bridge
> per network;
> - or talk to ovs folks on whether they may support that for us.
>
> I understand the former option is too scary. It opens lots of questions,
> including upgrade impact, since it will obviously introduce data plane
> downtime. That would be a huge shift in paradigm, probably too huge to
> swallow. The latter option may not fly with the vswitch folks. Any better
> ideas?
>
> It’s also not clear whether we want to proceed with my immediate fix.
> Advice is welcome.
>
> Thanks,
> Ihar