[openstack-dev] [neutron][ovs] The way we deal with MTU
Armando M.
armamig at gmail.com
Tue Jun 14 11:50:21 UTC 2016
On 13 June 2016 at 22:22, Terry Wilson <twilson at redhat.com> wrote:
> > So basically, as long as we try to plug ports with different MTUs into
> the same bridge, we are utilizing a bug in Open vSwitch, that may break us
> any time.
> >
> > I guess our alternatives are:
> > - either redesign bridge setup for openvswitch to e.g. maintain a bridge
> per network;
> > - or talk to ovs folks on whether they may support that for us.
> >
> > I understand the former option is too scary. It opens lots of questions,
> including upgrade impact since it will obviously introduce a dataplane
> downtime. That would be a huge shift in paradigm, probably too huge to
> swallow. The latter option may not fly with vswitch folks. Any better ideas?
>
> I know I've heard from people who'd like to be able to support both
> DPDK and non-DPDK workloads on the same node. The current
> implementation with a single br-int (and thus datapath) makes that
> impossible to pull off with good performance. So there may be other
> reasons to consider introducing multiple isolated bridges: MTUs,
> datapath_types, etc.
>
Incidentally, this is something that Nova is already capable of handling
(i.e. wiring VMs into different bridges) thanks to [1], and with some minor
additions, as being discussed in the context of [2] vlan-aware-vms, we can
open up the possibility of this deployment model in the not-so-distant future.
[1] https://blueprints.launchpad.net/nova/+spec/neutron-ovs-bridge-name
[2] http://lists.openstack.org/pipermail/openstack-dev/2016-June/097025.html
> Terry
>
> On Mon, Jun 13, 2016 at 11:49 AM, Ihar Hrachyshka <ihrachys at redhat.com>
> wrote:
> > Hi all,
> >
> > In Mitaka, we introduced a number of changes to the way we handle MTU in
> Neutron/Nova, making sure that the whole instance data path (from the
> instance's internal interface, through the hybrid bridge, into br-int) as
> well as the router data path (qr) has the proper MTU set on all
> participating devices. On the hypervisor side, both Nova and Neutron take
> part, setting the MTU with the ip-link tool based on what the Neutron plugin
> calculates for us. So far so good.
> >
> > It turns out that for OVS, this does not work as expected with regard to
> br-int. A bug was reported recently:
> https://launchpad.net/bugs/1590397
> >
> > Briefly, when we try to set the MTU on a device that is plugged into a
> bridge, and the bridge already has another port with a lower MTU, the
> bridge itself inherits the MTU from that latter port, and the Linux kernel (?)
> does not allow setting the MTU on the first device at all, making the
> ip link calls ineffective.
> >
> > AFAIU this behaviour is consistent with Linux bridging rules: you can’t
> have ports with different MTUs plugged into the same bridge.
> >
> > Now, that’s a huge problem for Neutron, because we plug ports that
> belong to different networks (and that hence may have different MTUs) into
> the same br-int bridge.
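
To make the constraint above concrete, here is a toy model in Python. It is
only a sketch of the behaviour as described in this thread (the bridge follows
its lowest-MTU port); it is not OVS or kernel code, and the class, method and
port names are all made up for illustration.

```python
from typing import Dict, List, Optional


class ToyBridge:
    """Toy model of the behaviour described in the thread; not OVS code."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.port_mtus: Dict[str, int] = {}

    def plug(self, port: str, mtu: int) -> None:
        # Record the MTU the port was configured with when it was plugged.
        self.port_mtus[port] = mtu

    def effective_mtu(self) -> Optional[int]:
        # Assumption taken from the description: the bridge inherits the
        # MTU of its lowest-MTU port. An empty bridge has no value here.
        return min(self.port_mtus.values()) if self.port_mtus else None

    def ports_above_bridge_mtu(self) -> List[str]:
        # Ports whose configured MTU is higher than what the bridge carries.
        limit = self.effective_mtu()
        if limit is None:
            return []
        return sorted(p for p, mtu in self.port_mtus.items() if mtu > limit)


br_int = ToyBridge("br-int")
br_int.plug("port-net-a", 1500)  # hypothetical port on a 1500-byte network
br_int.plug("port-net-b", 1450)  # hypothetical port on a lower-MTU network
print(br_int.effective_mtu())           # 1450
print(br_int.ports_above_bridge_mtu())  # ['port-net-a']
```

With two networks of different MTU sharing br-int, the model reports that the
1500-byte port cannot get the MTU its network asks for, which is the situation
the bug report describes.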
> >
> > So I played with the code locally a bit and spotted that currently, we
> set MTU for router ports before we move their devices into router
> namespaces. And once the device is in a namespace, ip-link actually works.
> So I wrote a fix with a functional test that proves the point:
> https://review.openstack.org/#/c/327651/ The fix was validated by the
> reporter of the original bug and seems to fix the issue for him.
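
The ordering difference described above can be sketched as two command
sequences. This is an illustration only, assuming the fix amounts to setting
the MTU after the device has been moved into the namespace; it is not the code
from the review. The helper names and the device/namespace names are
hypothetical, and the sketch only builds argument lists without running them.

```python
from typing import List


def set_mtu_cmd(device: str, mtu: int) -> List[str]:
    # iproute2 invocation that sets the MTU on a device.
    return ["ip", "link", "set", "dev", device, "mtu", str(mtu)]


def move_to_netns_cmd(device: str, namespace: str) -> List[str]:
    # iproute2 invocation that moves a device into a named namespace.
    return ["ip", "link", "set", "dev", device, "netns", namespace]


def in_netns(namespace: str, cmd: List[str]) -> List[str]:
    # Wrap a command so that it runs inside the given namespace.
    return ["ip", "netns", "exec", namespace] + cmd


def mtu_then_move(device: str, namespace: str, mtu: int) -> List[List[str]]:
    # Ordering described as the current behaviour: MTU first, in the
    # root namespace, then the move.
    return [set_mtu_cmd(device, mtu), move_to_netns_cmd(device, namespace)]


def move_then_mtu(device: str, namespace: str, mtu: int) -> List[List[str]]:
    # Assumed ordering after the fix: move first, then set the MTU from
    # inside the namespace.
    return [
        move_to_netns_cmd(device, namespace),
        in_netns(namespace, set_mtu_cmd(device, mtu)),
    ]


# Hypothetical names, for illustration only.
for cmd in move_then_mtu("qr-example", "qrouter-example", 1500):
    print(" ".join(cmd))
```

Keeping the command construction separate from execution makes the ordering
easy to check in a test without touching real devices.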
> >
> > It’s suspicious that it works from inside a namespace but not when the
> device is still in the root namespace. So I reached out to Jiri Benc from
> our local Open vSwitch team, and here is a quote:
> >
> > ===
> >
> > "It's a bug in ovs-vswitchd. It doesn't see the interface that's in
> > other netns and thus cannot enforce the correct MTU.
> >
> > We'll hopefully fix it and disallow incorrect MTU setting even across
> > namespaces. However, it requires significant effort and rework of ovs
> > name space handling.
> >
> > You should not depend on the current buggy behavior. Don't set MTU of
> > the internal interfaces higher than the rest of the bridge, it's not
> > supported. Hacking this around by moving the interface to a netns is
> > exploiting of a bug.
> >
> > We can certainly discuss whether this limitation could be relaxed.
> > Honestly, I don't know, it's for a discussion upstream. But as of now,
> > it's not supported and you should not do it."
> >
> > So basically, as long as we try to plug ports with different MTUs into
> the same bridge, we are utilizing a bug in Open vSwitch, that may break us
> any time.
> >
> > I guess our alternatives are:
> > - either redesign bridge setup for openvswitch to e.g. maintain a bridge
> per network;
> > - or talk to ovs folks on whether they may support that for us.
> >
> > I understand the former option is too scary. It opens lots of questions,
> including upgrade impact since it will obviously introduce a dataplane
> downtime. That would be a huge shift in paradigm, probably too huge to
> swallow. The latter option may not fly with vswitch folks. Any better ideas?
> >
> > It’s also not clear whether we want to proceed with my immediate fix.
> Advice is welcome.
> >
> > Thanks,
> > Ihar
> >
> __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe:
> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>