[openstack-dev] [neutron] writable mtu
ijw.ubuntu at cack.org.uk
Fri Jul 7 22:44:37 UTC 2017
On 7 July 2017 at 12:14, Ihar Hrachyshka <ihrachys at redhat.com> wrote:
> > That said: what will you do with existing VMs that have been told the
> MTU of
> > their network already?
> Same as we do right now when modifying configuration options defining
> underlying MTU: change it on API layer, update data path with the new
> value (tap to brq to router/dhcp legs) and hope instances will get
> there too (by means of dhcp lease refresh eventually happening, or
> rebooting instances, or else). There is no silver bullet here, we have
> no way to tell instances to update their interface MTUs.
Indeed, and I think that's my point.
Let me propose an option 2.
Refuse to migrate if it would invalidate the MTU property on an existing
network. If this happens, the operator can delete such networks, or clear
them out and recreate them with a smaller MTU. The point being, since the
automation can't reliably fix the MTU of the running VMs, the automation
shouldn't change the MTU of the network - it's not in the power of the
network control code to get the results right - and you should instead tell
the operator he has to make some decisions to make about whether VMs have
to be restarted, networks deleted or recreated, etc. that can't be judged
However, explain in the documentation how to make a migration that won't
invalidate your existing virtual networks' MTUs, allowing you to preserve
all your networks with the same MTU they already have. If you migrate
encap-A to bigger-encap-B (and you lose some more bytes from the infra MTU)
it would refuse to migrate most networks *unless* you simultaneously
increased the path_mtu to allow for the extra bytes. So, B takes 10 extra
bytes, you fiddle with your switches to increase their MTU by 10, your
auto-migration itself fiddles with the MTUs on host interfaces and
vswitches, and the MTU of the virtual network remains the same (because
phys MTU - encap >= biggest allowed virtual network MTU before the upgrade).
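The refusal rule above can be sketched in a few lines. This is illustrative only, not neutron code: the overhead values and function name are assumptions chosen so that encap B costs 10 more bytes than encap A, matching the example.

```python
# Hypothetical sketch of the "option 2" guard: refuse an encap migration
# if any existing network's MTU property would no longer fit, unless
# path_mtu was raised enough to absorb the new encap's extra overhead.
# Overhead values are illustrative, not taken from neutron.

ENCAP_OVERHEAD = {"encap-a": 50, "encap-b": 60}  # bytes, assumed

def check_migration(networks, path_mtu, new_encap):
    """Return the networks whose MTU property blocks the migration."""
    new_max_mtu = path_mtu - ENCAP_OVERHEAD[new_encap]
    # A network is safe only if its existing MTU property still fits
    # under the new encap's overhead; otherwise it blocks the migration.
    return [net for net, mtu in networks.items() if mtu > new_max_mtu]

# path_mtu unchanged: a 1450-byte network no longer fits under encap-b.
blocked = check_migration({"net-a": 1450}, 1500, "encap-b")
# Raise path_mtu by the 10 extra bytes and every network keeps its MTU.
ok = check_migration({"net-a": 1450}, 1510, "encap-b")
```

The point of the guard is that the only automatic outcome is "refuse"; making room by raising path_mtu (and the switch MTUs behind it) is an operator action.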
> At least not till we get both new ovs and virtio-net in the guests
> that will know how to deal with MTU hints:
> (there should also be an ovs integration piece but I can't find it right
> now)
... and every OS on the planet actually uses it, and no-one uses an e1000
NIC or an SRIOV NIC, and and and...
> Though even with that, I don't know if guest will be notified about
> changes happening during its execution, or only on boot (that probably
> depends on whether virtio polls the mtu storage). And anyway, it
> depends on guest kernel, so no luck for windows guests and such.
> > Put a different way, a change to the infrastructure can affect MTUs in
> > two ways:
> > - I increase the MTU that a network can pass (by, for instance, raising
> > the MTU of the infrastructure underneath the encap). I don't need to
> > change its MTU property because VMs that run on it will continue to
> > work. I have no means to tell the VMs they have a bigger MTU now, and
> > whatever method I might use needs to be certain to work or left-out VMs
> > will become impossible to talk to, so leaving the MTU alone is sane.
> In this scenario, it sounds like you assume everything will work just
> fine. But you don't consider neutron routers that will enforce the new
> larger MTU for fragmentation, and may end up sending frames to unaware
> VMs of a size they can't handle.
Actually, no. I'm saying here that I increase the *MTU that the network
can pass* - for instance, I change the MTU on my physical switch from 1500
to 9000 - but I don't change anything about my OpenStack network
properties. Thus if I were to send a 9000-byte packet (while the property
on the virtual network still says the MTU is 1500) it may well reach its
destination, because the API doesn't guarantee that oversized packets are
dropped; it just makes no guarantee that they will be passed, so this is
undefined-behaviour territory. The virtual network's MTU *property* is
still 1500,
we can still guarantee that the network will pass packets up to and
including 1500 bytes, and the router interfaces, just like VM interfaces,
are set from the MTU property to a 1500 MTU - so they emit transmissible
packets and they all agree on the MTU size, which is what's necessary for a
network to work. The fact that the fabric will now pass 9000-byte packets
changes none of that.
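The contract being described can be restated as a toy check. This is a sketch of the semantics argued above, with assumed names; an overhead of 0 stands in for a flat physical network.

```python
# The network's MTU *property* promises delivery only for packets up to
# that size; a larger fabric MTU widens what *may* pass but changes
# nothing in the contract.

def guaranteed(fabric_mtu, encap_overhead, network_mtu_property, packet):
    """True only when the API guarantees this packet is passed."""
    # The property must always be honourable by the fabric underneath.
    assert fabric_mtu - encap_overhead >= network_mtu_property
    return packet <= network_mtu_property

# Fabric raised 1500 -> 9000, property left at 1500: 1500-byte packets
# are still guaranteed; a 9000-byte packet is undefined behaviour - it
# may pass, but nothing promises it will.
```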
> > - I decrease the MTU that a network can pass (by, for instance, using an
> > encap with larger headers). The network comprehensively breaks; VMs
> > frequently fail to communicate regardless of whether I change the network
> > MTU property, because running VMs have already learned their MTU value
> > and, again, there's no way to update their idea of what it is reliably.
> > Basically, this is not a migration that can be done with running VMs.
> Yeah. You may need to do some multiple step dance, like:
> - before mtu reduction, lower dhcp_lease_duration to 3 mins;
> - wait until all leases are refreshed;
... hope and pray that the DHCP agent in the host checks the MTU on every
lease renewal - I'm not saying for definite that it doesn't, but I don't
think anyone usually designs for the MTU to change after interface-up...
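To put rough numbers on the dance above (a sketch; the 86400 s figure is neutron's default dhcp_lease_duration, and clients renewing at T1 = 50% of the lease time is standard DHCP behaviour per RFC 2131 - whether the guest actually re-reads the MTU option on renewal is exactly the doubt raised):

```python
# How long to wait after lowering dhcp_lease_duration before every
# well-behaved DHCP client has contacted the server again.

OLD_LEASE = 86400  # neutron's default dhcp_lease_duration, in seconds
NEW_LEASE = 180    # lowered to 3 minutes before the MTU reduction

def worst_case_wait(old_lease):
    # A client granted a lease just before the config change renews at
    # T1 = half the lease time it was given, i.e. half the OLD lease.
    return old_lease // 2

wait = worst_case_wait(OLD_LEASE)  # 43200 s, i.e. 12 hours of waiting
```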