[openstack-dev] [neutron][ovs] The way we deal with MTU

Ihar Hrachyshka ihrachys at redhat.com
Mon Jun 20 12:21:51 UTC 2016


> On 15 Jun 2016, at 17:27, Ihar Hrachyshka <ihrachys at redhat.com> wrote:
> 
> First, some context: we talked it thru with Eugene on IRC, and Eugene reported that he cannot reproduce the issue on his setup, which uses an Ubuntu hypervisor with ovs 2.4:
> 
> http://eavesdrop.openstack.org/irclogs/%23openstack-neutron/%23openstack-neutron.2016-06-13.log.html#t2016-06-13T19:45:22
> 
> So I went and did some testing with the functional test I have implemented. I validated the following setups:
> 
> - ubuntu 14.04 + ovs 2.0.x
> - centos 7 + ovs 2.4
> - centos 7 + ovs 2.5
> 
> All of them fail to pass the test. I also pushed the test without the fix into gate, and it failed too:
> 
> https://review.openstack.org/329558
> 
> So we definitely have some sort of issue that is independent of the underlying distribution or Open vSwitch version.
> 
> With that, I believe we should go forward with the fix as a short term solution: https://review.openstack.org/327651 (I removed WIP from it.)

The patch landed in master, and I seek to backport it to Liberty/Mitaka (backports proposed).

> 
> I will also reach out to ovs developers on the matter to see if they can somehow allow us to disable the MTU curtailing and still stay supported.

I dropped an email to dev at openvswitch.org just now: http://openvswitch.org/pipermail/dev/2016-June/073190.html to seek their guidance.

> 
> Ihar
> 
>> On 13 Jun 2016, at 19:43, Eugene Nikanorov <enikanorov at mirantis.com> wrote:
>> 
>> That's interesting.
>> 
>> 
>> In our deployments we do something like br-ex (linux bridge, mtu 9000) - OVSIntPort (mtu 65000) - br-floating (ovs bridge, mtu 1500) - br-int (ovs bridge, mtu 1500).
>> The qg ports are then created in br-int, traffic goes all the way through, and that altogether allows jumbo frames over the external network.
>> 
>> For that reason I thought that the MTU inside OVS doesn't really matter.
>> This, however, is for ovs 2.4.1.
>> 
>> I wonder if that behavior has changed and if the description is available anywhere.
>> 
>> Thanks,
>> Eugene.
>> 
>> On Mon, Jun 13, 2016 at 9:49 AM, Ihar Hrachyshka <ihrachys at redhat.com> wrote:
>> Hi all,
>> 
>> in Mitaka, we introduced a bunch of changes to the way we handle MTU in Neutron/Nova, making sure that the whole instance data path (starting from the instance's internal interface, thru the hybrid bridge, into br-int) as well as the router data path (qr) have the proper MTU value set on all participating devices. On the hypervisor side, both Nova and Neutron take part in it, setting the MTU with the ip link tool based on what the Neutron plugin calculates for us. So far so good.
>> 
>> It turns out that for OVS, this does not work as expected with regard to br-int. A bug was reported recently: https://launchpad.net/bugs/1590397
>> 
>> Briefly, when we try to set the MTU on a device that is plugged into a bridge, and the bridge already has another port with a lower MTU, the bridge itself inherits the MTU from that latter port, and the Linux kernel (?) does not allow setting the MTU on the first device at all, making ip link calls ineffective.
>> 
>> AFAIU this behaviour is consistent with Linux bridging rules: you can’t have ports of different MTU plugged into the same bridge.
>> 
>> Now, that’s a huge problem for Neutron, because we plug ports that belong to different networks (and that hence may have different MTUs) into the same br-int bridge.
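
To make the constraint concrete, here is a toy Python model of the behaviour as described above. It is only a sketch under stated assumptions: the class name `ToyBridge`, the port names, and the MTU values are made up for illustration, it is not Open vSwitch or Neutron code, and the rule that lowering a port's MTU always succeeds is an assumption of the model rather than something confirmed in this thread.

```python
class ToyBridge:
    """Toy model of the MTU behaviour described in the thread (not OVS code)."""

    def __init__(self):
        # Maps a (hypothetical) port name to its current MTU.
        self.ports = {}

    @property
    def mtu(self):
        # The bridge inherits the lowest MTU among its ports.
        return min(self.ports.values()) if self.ports else None

    def plug(self, name, mtu):
        # Plug a port into the bridge with the MTU it currently has.
        self.ports[name] = mtu

    def set_port_mtu(self, name, mtu):
        # A request to raise a port above the lowest MTU of the other
        # ports is ignored, mirroring the ineffective "ip link" calls.
        others = [m for n, m in self.ports.items() if n != name]
        if others and mtu > min(others):
            return False
        # Assumption of this model: any other request simply succeeds.
        self.ports[name] = mtu
        return True


bridge = ToyBridge()
bridge.plug("tap-net-a", 1450)  # port on a network with MTU 1450
bridge.plug("qr-net-b", 1500)   # router port on a jumbo-frame network
print(bridge.mtu)                             # 1450
print(bridge.set_port_mtu("qr-net-b", 9000))  # False
```

In this model the router port for the jumbo-frame network stays at 1500, which is the kind of symptom the bug report describes for ports of different networks sharing br-int.
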
>> 
>> So I played with the code locally a bit and spotted that currently, we set MTU for router ports before we move their devices into router namespaces. And once the device is in a namespace, ip-link actually works. So I wrote a fix with a functional test that proves the point: https://review.openstack.org/#/c/327651/ The fix was validated by the reporter of the original bug and seems to fix the issue for him.
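
The ordering change can be sketched as follows. This is a minimal illustration, not the code from the review: the helper name `plug_router_port_cmds`, the device name, and the namespace name are hypothetical, and the function only builds `ip` command lines (moving the device into the namespace first, then setting the MTU from inside it) without executing anything.

```python
def plug_router_port_cmds(device, namespace, mtu):
    """Build ip(8) command lines in the order used by the workaround.

    Hypothetical helper for illustration only; it returns the commands
    as argument lists and does not run them.
    """
    return [
        # Step 1: move the device into the router namespace.
        ["ip", "link", "set", "dev", device, "netns", namespace],
        # Step 2: set the MTU from inside that namespace.
        ["ip", "netns", "exec", namespace,
         "ip", "link", "set", "dev", device, "mtu", str(mtu)],
    ]


# Example with made-up names; prints one command per line.
for cmd in plug_router_port_cmds("qr-example", "qrouter-example", 9000):
    print(" ".join(cmd))
```

Returning argument lists rather than running the commands keeps the sketch safe to try anywhere; actually applying them requires root privileges and an existing namespace.
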
>> 
>> It’s suspicious that it works from inside a namespace but not when the device is still in the root namespace. So I reached out to Jiri Benc from our local Open vSwitch team, and here is a quote:
>> 
>> ===
>> 
>> "It's a bug in ovs-vswitchd. It doesn't see the interface that's in
>> other netns and thus cannot enforce the correct MTU.
>> 
>> We'll hopefully fix it and disallow incorrect MTU setting even across
>> namespaces. However, it requires significant effort and rework of ovs
>> name space handling.
>> 
>> You should not depend on the current buggy behavior. Don't set MTU of
>> the internal interfaces higher than the rest of the bridge, it's not
>> supported. Hacking this around by moving the interface to a netns is
>> exploiting of a bug.
>> 
>> We can certainly discuss whether this limitation could be relaxed.
>> Honestly, I don't know, it's for a discussion upstream. But as of now,
>> it's not supported and you should not do it."
>> 
>> So basically, as long as we try to plug ports with different MTUs into the same bridge, we are relying on a bug in Open vSwitch, which may break us at any time.
>> 
>> I guess our alternatives are:
>> - either redesign bridge setup for openvswitch to e.g. maintain a bridge per network;
>> - or talk to ovs folks on whether they may support that for us.
>> 
>> I understand the former option is too scary. It raises lots of questions, including upgrade impact, since it will obviously introduce data plane downtime. That would be a huge shift in paradigm, probably too huge to swallow. The latter option may not fly with the vswitch folks. Any better ideas?
>> 
>> It’s also not clear whether we want to proceed with my immediate fix. Advice is welcome.
>> 
>> Thanks,
>> Ihar
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> 
> 



