<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On 7 July 2017 at 12:14, Ihar Hrachyshka <span dir="ltr"><<a href="mailto:ihrachys@redhat.com" target="_blank">ihrachys@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="HOEnZb"><div class="h5"><span style="color:rgb(34,34,34)">> That said: what will you do with existing VMs that have been told the MTU of</span><br></div></div><span class="">
> their network already?<br>
<br>
</span>Same as we do right now when modifying configuration options defining<br>
underlying MTU: change it on API layer, update data path with the new<br>
value (tap to brq to router/dhcp legs) and hope instances will get<br>
there too (by means of dhcp lease refresh eventually happening, or<br>
rebooting instances, or else). There is no silver bullet here, we have<br>
no way to tell instances to update their interface MTUs.<br></blockquote><div><br></div><div>Indeed, and I think that's my point.</div><div><br></div><div>Let me propose an option 2.</div><div><br></div><div>Refuse to migrate if it would invalidate the MTU property on an existing network. If this happens, the operator can delete such networks, or clear them out and recreate them with a smaller MTU. The point being, since the automation can't reliably fix the MTU of the running VMs, the automation shouldn't change the MTU of the network - it's not in the power of the network control code to get the results right - and you should instead tell the operator that there are decisions to be made, about whether VMs have to be restarted, networks deleted or recreated, etc., that can't be judged automatically.</div><div><br></div><div>However, explain in the documentation how to make a migration that won't invalidate your existing virtual networks' MTUs, allowing you to preserve all your networks with the same MTU they already have. If you migrate encap-A to bigger-encap-B (and you lose some more bytes from the infra MTU), it would refuse to migrate most networks *unless* you simultaneously increased the path_mtu to allow for the extra bytes. So, B takes 10 extra bytes, you fiddle with your switches to increase their MTU by 10, your auto-migration itself fiddles with the MTUs on host interfaces and vswitches, and the MTU of the virtual network remains the same (because phys MTU - encap >= biggest allowed virtual network MTU before the upgrade).</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">At least not till we get both new ovs and virtio-net in the guests<br>
that will know how to deal with MTU hints:<br>
<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1408701" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1408701</a><br>
<a href="https://bugzilla.redhat.com/show_bug.cgi?id=1366919" rel="noreferrer" target="_blank">https://bugzilla.redhat.com/<wbr>show_bug.cgi?id=1366919</a><br>
(there should also be ovs integration piece but I can't find it right away.)<br></blockquote><div><br></div><div>... and every OS on the planet actually uses it, and no-one uses an e1000 NIC or an SRIOV NIC, and and and...</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Though even with that, I don't know if guest will be notified about<br>
changes happening during its execution, or only on boot (that probably<br>
depends on whether virtio polls the mtu storage). And anyway, it<br>
depends on guest kernel, so no luck for windows guests and such.<br>
<span class=""><br>
><br>
> Put a different way, a change to the infrastructure can affect MTUs in two<br>
> ways:<br>
><br>
> - I increase the MTU that a network can pass (by, for instance, increasing<br>
> the infrastructure of the encap). I don't need to change its MTU because<br>
> VMs that run on it will continue to work. I have no means to tell the VMs<br>
> they have a bigger MTU now, and whatever method I might use needs to be 100%<br>
> certain to work or left-out VMs will become impossible to talk to, so<br>
> leaving the MTU alone is sane.<br>
<br>
</span>In this scenario, it sounds like you assume everything will work just<br>
fine. But you don't consider neutron routers that will enforce the new<br>
larger MTU for fragmentation, that may end up sending frames to<br>
unaware VMs of size that they can't choke.<br></blockquote><div><br></div><div>Actually, no. I'm saying here that I increase the *MTU that the network can pass* - for instance, I change the MTU on my physical switch from 1500 to 9000 - but I don't change anything about my OpenStack network properties. Thus if I were to send a 9000-byte packet (and the property on the virtual network still says the MTU is 1500), it gets to its destination, because the API doesn't guarantee that the packets are dropped; it just makes no guarantee that the packet will be passed, so this is undefined behaviour territory. The virtual network's MTU *property* is still 1500, we can still guarantee that the network will pass packets up to and including 1500 bytes, and the router interfaces, just like VM interfaces, are set from the MTU property to a 1500 MTU - so they emit transmissible packets and they all agree on the MTU size, which is what's necessary for a network to work. The fact that the fabric will now pass 9000-byte packets isn't relevant.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">> - I decrease the MTU that a network can pass (by, for instance, using an<br>
> encap with larger headers). The network comprehensively breaks; VMs<br>
> frequently fail to communicate regardless of whether I change the network<br>
> MTU property, because running VMs have already learned their MTU value and,<br>
> again, there's no way to update their idea of what it is reliably.<br>
> Basically, this is not a migration that can be done with running VMs.<br>
<br>
</span>Yeah. You may need to do some multiple step dance, like:<br>
<br>
- before mtu reduction, lower dhcp_lease_duration to 3 mins;<br>
- wait until all leases are refreshed;<br></blockquote><div><br></div><div>... hope and pray that the DHCP client in the guest actually re-applies the MTU option on every lease renewal - I'm not saying for definite that it doesn't, but I don't think anyone usually designs for the MTU to change after interface-up...</div></div></div></div>
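To make the option-2 arithmetic concrete, here is a minimal sketch of the "refuse to migrate unless phys MTU minus encap overhead still covers every existing network's MTU" check. All names are illustrative, not actual Neutron code, and the overhead constants are assumptions (50 bytes is the usual VXLAN-over-IPv4 figure; the 60-byte "bigger encap" is hypothetical):

```python
# Illustrative sketch of the migration feasibility check from option 2.
# Not Neutron code; names and overhead values are assumptions.

VXLAN_OVERHEAD = 50       # outer Ethernet + IPv4 + UDP + VXLAN headers
BIGGER_ENCAP_OVERHEAD = 60  # hypothetical encap-B, 10 bytes larger

def max_network_mtu(path_mtu: int, encap_overhead: int) -> int:
    """Largest MTU a virtual network can be given under this encap."""
    return path_mtu - encap_overhead

def migration_is_safe(path_mtu, new_encap_overhead, existing_network_mtus):
    """Refuse unless no existing network's MTU property would shrink."""
    allowed = max_network_mtu(path_mtu, new_encap_overhead)
    return all(mtu <= allowed for mtu in existing_network_mtus)

# A 1550-byte fabric with 50-byte encap gave networks MTU 1500.
# Moving to the 60-byte encap without touching the fabric must refuse:
print(migration_is_safe(1550, BIGGER_ENCAP_OVERHEAD, [1500]))  # False
# Raise the switch/path MTU by the 10 extra bytes first, and it passes:
print(migration_is_safe(1560, BIGGER_ENCAP_OVERHEAD, [1500]))  # True
```

This is the whole point of the simultaneous path_mtu bump in the paragraph above: the check only succeeds because the fabric grew by exactly the extra encap bytes, so the virtual networks keep the MTU they already advertised to their VMs.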