[openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks

Robert Collins robertc at robertcollins.net
Tue Jan 21 20:23:39 UTC 2014


Hi,
     https://bugs.launchpad.net/neutron/+bug/1270646 - we've triaged
this as critical in TripleO, because by default GRE overlay networks
will cause huge fragmentation in the datacentre LAN.

We're looking for someone to 'own' this bug (in tripleo) and drive
getting a real, permanent fix / analysis of how to avoid it (e.g. is it
just a specific ovs release? kernel release?).

http://openvswitch.org/pipermail/dev/2013-January/024382.html suggests
that ovs has moved to a simpler packet processing method as part of
making tunnel processing follow the flow fast-path; instead of
inheriting the DF setting from the tunnelled packet, it's either
always on, or always off, and defaults on.
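
To make the difference concrete, here is a toy model of the two
behaviours described above. It is not OVS code; the function names, the
"inherit"/"fixed" policy labels and the outcome strings are all made up
for illustration, and it assumes an underlay hop simply drops an
oversized packet when DF is set and fragments it otherwise.

```python
def outer_df(inner_df, policy, df_default=True):
    """Return the DF bit placed on the outer (tunnel) IP header.

    policy == "inherit": copy DF from the tunnelled packet (older behaviour).
    policy == "fixed":   ignore the inner packet and use a per-tunnel
                         setting that defaults to on (newer behaviour).
    """
    if policy == "inherit":
        return inner_df
    if policy == "fixed":
        return df_default
    raise ValueError("unknown policy: %r" % (policy,))


def underlay_outcome(outer_len, df, path_mtu):
    """What a hop in the physical network does with the outer packet."""
    if outer_len <= path_mtu:
        return "delivered"
    # Too big: DF decides between a drop and fragmentation.
    return "dropped" if df else "fragmented"


if __name__ == "__main__":
    # A tenant packet sent without DF, encapsulated to 1542 bytes,
    # crossing a 1500-byte physical path.
    for policy in ("inherit", "fixed"):
        df = outer_df(inner_df=False, policy=policy)
        print(policy, underlay_outcome(1542, df, 1500))
```

In this model the tenant's own DF bit stops mattering under the fixed
policy: the outcome on the physical network depends only on the tunnel
setting.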

But - in our saucy test build (ovs 1.10.2 + kernel 3.11) we saw
massive performance hits (1000 : 1 slowdown at least) unless we
disabled GRO [which specifically affects fragmentation handling...].
So maybe this isn't working quite right, or there is a transition
period with the kernel datapath or something.

In OpenStack we've got documentation[1] that advises setting a low MTU
for tenants to work around this issue (but the issue itself is
unsolved) - this is a problem because PMTU is fairly important :)
Lowering *every* tenant when one tenant somewhere hits a new tunnel
with a lower physical packet size limit isn't an answer.

1: http://docs.openstack.org/admin-guide-cloud/content/ch_networking.html#
look for "Create /etc/neutron/dnsmasq-neutron.conf, and add these
values to lower the MTU size on instances and prevent packet
fragmentation over the GRE tunnel:

dhcp-option-force=26,1400
"
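
For reference, a rough sketch of the arithmetic behind a value like
1400. The header sizes below are assumptions for a typical setup (outer
IPv4 without options, a GRE header with a key field, and a full inner
Ethernet frame carried in the tunnel); the function name is made up and
the real overhead depends on the deployment.

```python
# Header sizes in bytes. These are assumptions for a typical OVS GRE
# tunnel over IPv4, not values taken from the bug report.
OUTER_IPV4_HEADER = 20      # outer IP header, no options
GRE_BASE_HEADER = 4         # mandatory GRE fields
GRE_KEY_FIELD = 4           # present when the tunnel carries a key
INNER_ETHERNET_HEADER = 14  # the tunnel carries whole Ethernet frames


def max_tenant_mtu(physical_mtu, with_key=True):
    """Largest tenant IP packet that fits in one unfragmented outer packet."""
    overhead = OUTER_IPV4_HEADER + GRE_BASE_HEADER + INNER_ETHERNET_HEADER
    if with_key:
        overhead += GRE_KEY_FIELD
    return physical_mtu - overhead


if __name__ == "__main__":
    for physical in (1500, 9000):
        print(physical, max_tenant_mtu(physical))
```

Under these assumptions a 1500-byte physical MTU leaves 1458 bytes for
the tenant, so 1400 is a conservative static value. It still does
nothing for a path whose physical limit is lower than expected, which
is the case PMTUd is meant to handle.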

Note that multiple people have been reporting this basic issue since
approximately the Havana (H) release, so it should be easy to reproduce.

Cheers,
Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


