[openstack-dev] [neutron] writable mtu

Ihar Hrachyshka ihrachys at redhat.com
Fri Jul 7 19:14:40 UTC 2017

On Wed, Jul 5, 2017 at 6:11 PM, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:
> On 5 July 2017 at 14:14, Ihar Hrachyshka <ihrachys at redhat.com> wrote:
>> Heya,
>> we have https://bugs.launchpad.net/neutron/+bug/1671634 approved for
>> Pike that allows setting MTU for network on creation.
> This was actually in the very first MTU spec (in case no one looked), though
> it never got implemented.  The spec details a whole bunch of stuff about how
> to calculate whether the proposed MTU will fit within the encap,
> incidentally, and will reject network creations when it doesn't.
> Note that the MTU attribute was intended to represent an MTU that will
> definitely transit.  I guess no-one would actually rely on this, but to
> clarify, it's not intended to indicate that bigger packets will be dropped,
> only that smaller packets will not be dropped (which is the guarantee you
> need for two VMs to talk to each other).  Thus the MTU doesn't need to be
> increased just because the infrastructure MTU has become larger; it just
> means that future networks can be created with larger MTUs from this point,
> and the current MTU will still be valid.
> This is also the MTU that all VMs on that network will be told, because they
> need to use the same value to function.  If you change it, VMs after the
> event will have problems talking to their earlier friends because they will
> now disagree on MTU (and routers will have problems talking to at least one
> of those sets).
>> (but not update,
>> as per latest comment from Kevin there) I already see a use case to
>> modify MTU for an existing network (for example, where you enable
>> Jumbo frames for underlying infrastructure, and want to raise the
>> ceiling; another special case is when you migrate between different
>> encapsulation technologies, like in case of ml2/ovs to networking-ovn
>> migration where the latter doesn't support VXLAN but Geneve only).
> You look like you're changing the read-only segmentation type of the network
> on this migration - presumably in the DB directly - so you're changing
> non-writeable fields already.  Couldn't the MTU be changed in a similarly
> offline manner?

Yeah, you are correct, but we may also hack around it in
networking-ovn by treating all tunneled networks as geneve regardless
of the type in the database. (I understand that's rather hackish, but
the very idea of migrating to a driver that doesn't natively support
your tunnel type is hackish af).

Nevertheless, the case where operators want to increase the MTU of
existing networks after an infrastructure MTU upgrade still stands.
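
To make the ceiling concrete, here is a minimal sketch of the kind of
"does it fit within the encap" check Ian mentions. It is not neutron
code: the function and table names are made up for this mail, and the
overhead figures are assumptions (50 bytes is the commonly cited number
for VXLAN over IPv4; the Geneve value is only a placeholder, since it
depends on the options in use).

```python
# Illustrative only: names and the Geneve figure are assumptions, not neutron code.
ENCAP_OVERHEAD = {
    "vlan": 0,      # assuming the underlay MTU already accounts for the VLAN tag
    "vxlan": 50,    # outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet (14)
    "geneve": 58,   # placeholder: 50 plus 8 bytes of options; varies by deployment
}


def max_network_mtu(underlay_mtu, network_type):
    """Largest MTU guaranteed to transit a network of the given type."""
    return underlay_mtu - ENCAP_OVERHEAD[network_type]


def mtu_fits(requested_mtu, underlay_mtu, network_type):
    """True if the requested network MTU fits inside the encapsulation."""
    return 0 < requested_mtu <= max_network_mtu(underlay_mtu, network_type)
```

Under these assumptions a VXLAN network on a 1500 byte underlay tops out
at 1450, and moving the underlay to 9000 raises the ceiling to 8950 while
the existing 1450 stays valid. Moving the same network to Geneve on a
1500 byte underlay would lower the ceiling below 1450, which is the
harder case discussed further down.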

> That said: what will you do with existing VMs that have been told the MTU of
> their network already?

Same as we do right now when modifying the configuration options that
define the underlying MTU: change it on the API layer, update the data
path with the new value (tap to brq to router/dhcp legs) and hope
instances will get there too (through a dhcp lease refresh eventually
happening, an instance reboot, or some other way). There is no silver
bullet here: we have no way to tell instances to update their interface
MTUs.

At least not till we get both new ovs and virtio-net in the guests
that will know how to deal with MTU hints:
(there should also be ovs integration piece but I can't find it right away.)

Though even with that, I don't know if the guest will be notified about
changes happening while it is running, or only on boot (that probably
depends on whether virtio polls the mtu storage). And anyway, it
depends on the guest kernel, so no luck for Windows guests and such.

> Put a different way, a change to the infrastructure can affect MTUs in two
> ways:
> - I increase the MTU that a network can pass (by, for instance, increasing
> the infrastructure of the encap).  I don't need to change its MTU because
> VMs that run on it will continue to work.  I have no means to tell the VMs
> they have a bigger MTU now, and whatever method I might use needs to be 100%
> certain to work or left-out VMs will become impossible to talk to, so
> leaving the MTU alone is sane.

In this scenario, it sounds like you assume everything will work just
fine. But you don't consider neutron routers, which will enforce the
new larger MTU for fragmentation and may end up sending unaware VMs
frames larger than they can handle.

> - I decrease the MTU that a network can pass (by, for instance, using an
> encap with larger headers).  The network comprehensively breaks; VMs
> frequently fail to communicate regardless of whether I change the network
> MTU property, because running VMs have already learned their MTU value and,
> again, there's no way to update their idea of what it is reliably.
> Basically, this is not a migration that can be done with running VMs.

Yeah. You may need to do a multi-step dance, like:

- before mtu reduction, lower dhcp_lease_duration to 3 mins;
- wait until all leases are refreshed;
- lower MTU on a network;
- wait 3 minutes until all instances refresh leases and update their MTUs;
- restore the original value of dhcp_lease_duration.
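
The steps above could be scripted roughly as follows. This is only a
sketch: set_lease_duration and set_network_mtu are hypothetical
callables standing in for whatever the operator uses to change
dhcp_lease_duration and the network MTU, and the default lease of 86400
seconds is an assumption. Waiting one full original lease period is a
conservative stand-in for "all leases are refreshed".

```python
import time


def lower_network_mtu(set_lease_duration, set_network_mtu, new_mtu,
                      original_lease=86400, short_lease=180,
                      sleep=time.sleep):
    """Walk through the steps above; every callable here is hypothetical."""
    # 1. Shorten leases so instances check in with DHCP more often.
    set_lease_duration(short_lease)
    # 2. Wait for every lease handed out under the old duration to be renewed.
    sleep(original_lease)
    # 3. Lower the MTU on the network (API plus data path).
    set_network_mtu(new_mtu)
    # 4. Wait one short lease period so instances pick up the new MTU.
    sleep(short_lease)
    # 5. Put the lease duration back.
    set_lease_duration(original_lease)
```

Passing sleep in as a parameter is just to make the sequence easy to
dry-run without actually waiting.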

>> If I go and implement the RFE as-is, and later in Queens we pursue
>> updating MTU for existing networks, we will have three extensions for
>> the same thing.
>> - net-mtu (existing read only attribute)
>> - net-mtu-enhanced (allow write on create)
>> - net-mtu-enhanced-enhanced (allow updates)
>> Not to mention potential addition of per-port MTU that some folks keep
>> asking for (and we keep pushing against so far).
>> So, I wonder if we can instead lay the ground for updatable MTU right
>> away, and allow_post: True from the start, even while implementing
>> create only as a phase-1. Then we can revisit the decision if needed
>> without touching api. What do you think?
> It's trivially detectable that an MTU value can't be set at all, or can be
> set initially but not changed.  Could we use that approach?  That way, we
> don't need multiple extensions, the current one is sufficient (and - on the
> assumption that you don't rely on 'read-only attribute' errors in normal
> code, I think we can call this backward compatible).

You mean we just set allow_post: True, allow_put: True on the existing
extension? That's fine, but we need some way to detect whether
updating/setting MTU will work that does not involve catching an error
on the api user side. We can probably mess with the attribute map of
the existing extension, but we will still need separate 'flag'
extensions to detect the change gracefully.
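
Roughly what I have in mind, as a sketch. The attribute map below only
mirrors the general shape of a neutron attribute map, and the two
'flag' aliases are hypothetical placeholders borrowed from the names
earlier in this thread; mtu_capabilities is a made-up helper showing
how an api user could decide what is allowed from the advertised
extension aliases instead of from an error.

```python
# Sketch only: mirrors the shape of a neutron attribute map, not the real one.
MTU_ATTRIBUTE = {
    "networks": {
        "mtu": {
            "allow_post": True,   # settable on create
            "allow_put": True,    # updatable later
            "is_visible": True,
        },
    },
}

# Hypothetical aliases for the 'flag' extensions, borrowed from this thread.
MTU_CREATE_ALIAS = "net-mtu-enhanced"
MTU_UPDATE_ALIAS = "net-mtu-enhanced-enhanced"


def mtu_capabilities(extension_aliases):
    """Work out what a client may do with mtu from the advertised aliases."""
    aliases = set(extension_aliases)
    return {
        "read": "net-mtu" in aliases,
        "set_on_create": MTU_CREATE_ALIAS in aliases,
        "update": MTU_UPDATE_ALIAS in aliases,
    }
```

A backend that can only set MTU on create would advertise net-mtu plus
the create alias, and the client would see update as False without
having to try the PUT first.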

>> Another related question is, how do we expose both old and new
>> extensions at the same time? I would imagine that implementations
>> capable of writing to the mtu attribute would advertise both old and
>> new extensions. Is it correct? Does neutron api layer allow for
>> overlapping attribute maps?
> Extension net-mtu: MTU attr exists, can't set MTU at all, passing an MTU
> returns a bad argument error
> Extension net-mtu: MTU attr exists, can set MTU on startup, failed (too big)
> MTU values return a more specific MTU too big error
> Extension net-mtu: MTU attr exists, can set after creation, setting MTU
> after creation fails as for startup write (which it appears you already have
> in mind)
> --
> Ian.
