[openstack-dev] [TripleO][Neutron] PMTUd broken in gre networks

Robert Collins robertc at robertcollins.net
Tue Jan 21 23:00:08 UTC 2014

On 22 January 2014 10:01, Ian Wells <ijw.ubuntu at cack.org.uk> wrote:
> On 21 January 2014 21:23, Robert Collins <robertc at robertcollins.net> wrote:
>> In OpenStack we've got documentation[1] that advises setting a low MTU
>> for tenants to work around this issue (but the issue itself is
>> unsolved) - this is a problem because PMTU is fairly important :)
>> Lowering *every* tenant when one tenant somewhere hits a new tunnel
>> with a lower physical packet size limit isn't an answer.
> The right answer is probably that (a) GRE drops packets it can't take (it
> used to return a spoofed PMTU exceeded, which was faintly naughty cos it's

Naughty in that it's an L2 device, which doesn't necessarily have an L3
address to send a PMTU-exceeded message /from/; that is why it is
problematic (and spoofed).

> not a router, and it breaks non-IP protocols; sounds like it fragments now,
> which is probably no better),

I'm about 99% sure it fragments at the moment. GRO on the servers
we're using hides that (at a cost), but if I turn it off I should be
able to see the bad boy in action :).

I think dropping frames that can't be forwarded is entirely sane - at
a guess it's what a physical ethernet switch would do if you try to
send a 1600-byte frame (on a non-jumbo-frame switched network) - but
perhaps there is an actual standard for this we could follow?

>  (b) we use the DHCP option to advertise the
> right MTU, and

Yes; the current manual assignment in Neutron is global (which can be
wrong). For instance, we have provider networks that can have jumbo
frames enabled, so there is a big performance hit if we advertise the
same lowered MTU to all ports.
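
For reference, the DHCP option in question is Interface MTU (option
code 26 in RFC 2132), which carries a two-byte unsigned value. A minimal
sketch of encoding it per network rather than globally might look like
the following. The network names and MTU values are made up for
illustration, and this is not a description of what Neutron does today.

```python
import struct

DHCP_OPT_INTERFACE_MTU = 26  # Interface MTU option code (RFC 2132)
MIN_MTU = 68                 # smallest value the option allows
MAX_MTU = 0xFFFF             # largest value that fits in 16 bits


def encode_mtu_option(mtu):
    """Return the raw DHCP option bytes (code, length, value) for an MTU."""
    if not MIN_MTU <= mtu <= MAX_MTU:
        raise ValueError("MTU %d is out of range for DHCP option 26" % mtu)
    # Network byte order: 1-byte code, 1-byte length, 2-byte MTU.
    return struct.pack("!BBH", DHCP_OPT_INTERFACE_MTU, 2, mtu)


# Hypothetical per-network MTUs: a tunnelled tenant network and a
# provider network with jumbo frames get different values.
network_mtus = {"tenant-gre": 1458, "provider-jumbo": 9000}
options = {name: encode_mtu_option(mtu) for name, mtu in network_mtus.items()}
```

The point of keying the table by network is that a jumbo-frame provider
network is not dragged down to the tunnel MTU.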

> (c) we require Neutron plugins to work out the MTU, which for
> any encap except VLAN is (host interface MTU - header size).

Do you mean tunnel wrap overheads? (What if a particular tunnel has a
trailer... crazy talk, I know.)
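
The calculation in (c) can be sketched roughly as below. The overhead
figures assume an IPv4 underlay with no IP options: 20 bytes of outer
IPv4 header, a 4-byte GRE base header plus a 4-byte key, 8 bytes of
VXLAN header on top of 8 bytes of UDP, and 14 bytes for the encapsulated
inner Ethernet header. The function and table names are hypothetical. A
trailer, if some encapsulation had one, would simply be one more term in
the sum.

```python
OUTER_IPV4 = 20      # outer IPv4 header, no options (assumption)
INNER_ETHERNET = 14  # encapsulated tenant Ethernet header

# Bytes taken out of the host interface MTU by each encapsulation.
ENCAP_OVERHEAD = {
    "flat": 0,
    "vlan": 0,  # per the thread: VLAN is the exception, no reduction
    "gre": OUTER_IPV4 + 4 + 4 + INNER_ETHERNET,    # GRE base + key
    "vxlan": OUTER_IPV4 + 8 + 8 + INNER_ETHERNET,  # UDP + VXLAN
}


def tenant_mtu(host_mtu, encap, trailer=0):
    """Return host interface MTU minus encapsulation overhead."""
    mtu = host_mtu - ENCAP_OVERHEAD[encap] - trailer
    if mtu < 68:  # below the minimum MTU DHCP option 26 can advertise
        raise ValueError("host MTU %d too small for %s" % (host_mtu, encap))
    return mtu


print(tenant_mtu(1500, "gre"))    # 1458
print(tenant_mtu(1500, "vxlan"))  # 1450
print(tenant_mtu(9000, "gre"))    # 8958
```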

> At this point we probably discover that nothing respects the MTU option in
> DHCP, mind you (I'm not saying it doesn't work; I'm just saying, have you
> ever tried it?)

I haven't :/

> This solution is pedantically correct and I would actually like to see it
> implemented, but there's probably something more pragmatic that can be done.

I think pedantically correct matters here, as low level plumbing needs
to be predictable and reliable.


Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud
