[Openstack-operators] How to deal with MTU issue on routers servicing vxlan tenant networks? (openstack-ansible with LXC containers)

David Young davidy at funkypenguin.co.nz
Tue Dec 5 20:47:06 UTC 2017

Hi all,

I'm running OpenStack Ocata (deployed with openstack-ansible), with the 
following configuration:

* Compute nodes running nova and the neutron agent
* 2 x Controllers running neutron server/agents in LXC containers (as 
deployed by the openstack-ansible playbooks)
* Underlying hosts have a single NIC (MTU 9000) with multiple VLAN 
subinterfaces, which in turn are connected to bridges including br-vxlan 
and br-vlan

I've encountered the following problem:

1. When I create an instance in a vxlan tenant network, without changing 
any configuration files, the instance (Linux default) assumes an MTU of 
1500, but in reality only has an MTU of 1450 (because of the VXLAN 
overhead). Instances cannot ping each other or their gateway (a neutron 
router) with packets larger than 1450 bytes.

2. While I _could_ push an MTU of 1450 to my instances via DHCP, this is 
(a) not always reliable depending on the guest OS, and (b) breaks 
Docker on instances, which defaults to an MTU of 1500 for docker0.

3. So, I attempted the configuration changes described at 
increasing my global MTU to 1550 in neutron.conf / ml2_conf.ini, on the 
compute nodes, and the neutron client & server LXC containers on the 
controller, so that a default MTU of 1500 in my instances would always work.

4. The effect of step #3 above is that now my instances can communicate 
with _each other_ at up to 1500 MTU, _but_ they still can't ping their 
gateway (the neutron router) at anything over 1450 MTU.

5. When I examine my compute nodes (underlying host OS), I note that the 
bridge "br-vxlan" contains the vlan subinterface (MTU 9000) plus a veth 
interface for connectivity to the neutron-agents LXC container (e.g. 
"04063403_eth10"). The veth interface has an MTU of 1500. The 
corresponding interface within the neutron-agents LXC container (eth10) 
also has an MTU of 1500.

6. Assuming that #5 is the cause of my MTU fault (i.e., a 1500-byte 
packet from the instance over the tenant network = 1500+50=1550, which 
can't pass through the veth interface), I manually changed the veth 
interface (and the corresponding interface within the LXC container) to 
MTU 1550.

7. Now I can pass packets from my instances to the neutron router as 
large as 1468 bytes (previous limit was 1448), but still not the 1500 
bytes I expected.

8. Increasing the MTU again (per #6 above) to 1600 makes no difference 
to the result in #7 above.
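
To make the arithmetic in steps 1, 3 and 6 concrete, here is a minimal 
Python sketch. It assumes VXLAN over IPv4 with no extra outer headers, 
which is where the 50-byte figure comes from. The hop names in `path` are 
hypothetical labels for the interfaces described in step 5, not output 
from a real system.

```python
# Header sizes in bytes for VXLAN over an IPv4 underlay.
OUTER_IPV4 = 20
OUTER_UDP = 8
VXLAN_HEADER = 8
INNER_ETHERNET = 14  # the instance's own Ethernet header travels inside the tunnel
VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def max_instance_mtu(underlay_mtu):
    """Largest instance MTU that fits in the given underlay MTU."""
    return underlay_mtu - VXLAN_OVERHEAD


def required_underlay_mtu(instance_mtu):
    """Smallest underlay MTU that carries an instance packet unfragmented."""
    return instance_mtu + VXLAN_OVERHEAD


def bottlenecks(hop_mtus, instance_mtu):
    """Names of hops whose MTU is too small for the encapsulated packet."""
    needed = required_underlay_mtu(instance_mtu)
    return [name for name, mtu in hop_mtus.items() if mtu < needed]


# Hypothetical labels for the hops described in step 5.
path = {
    "vlan-subinterface": 9000,
    "veth-host-side": 1500,
    "eth10-in-container": 1500,
}

print(max_instance_mtu(1500))       # step 1: 1500 - 50
print(required_underlay_mtu(1500))  # step 6: 1500 + 50
print(bottlenecks(path, 1500))      # hops that cannot carry 1550 bytes
```

This only models the tunnel path itself. It says nothing about 
interfaces inside the router namespace, which would need to be checked 
separately.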

So, I'm thinking I've missed something, and the most likely issue is the 
definition of the LXC container (and veth interfaces) for neutron-agents 
on the controller. I thought it was a simple fix (manually change the MTU 
per #6), but I'm baffled as to why increasing the MTU on the veth 
interfaces by 50 bytes only gained me 20 bytes (1448 to 1468). Even if 
this _was_ the fix, it's obviously only temporary, so what is the 
correct way to address the MTU issue under openstack-ansible?
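
One way to look for a remaining undersized hop is to list every 
interface whose MTU is below the 1550 bytes needed. The sketch below is 
a rough diagnostic, not a fix: it reads the standard Linux sysfs files 
under /sys/class/net, and the helper names `read_mtus` and `below` are 
made up for illustration. It only sees the network namespace it runs in, 
so it would have to be run on the host, inside the neutron-agents 
container, and inside the router's namespace separately.

```python
from pathlib import Path


def read_mtus(sysfs_net="/sys/class/net"):
    """Map interface name -> MTU by reading <sysfs_net>/<iface>/mtu."""
    mtus = {}
    for iface in sorted(Path(sysfs_net).iterdir()):
        mtu_file = iface / "mtu"
        # Skip entries that are not interfaces (no mtu file).
        if mtu_file.is_file():
            mtus[iface.name] = int(mtu_file.read_text().strip())
    return mtus


def below(mtus, needed):
    """Interfaces whose MTU is smaller than `needed` bytes."""
    return {name: mtu for name, mtu in mtus.items() if mtu < needed}


if __name__ == "__main__":
    # 1550 = 1500-byte instance packet + 50 bytes of VXLAN overhead.
    if Path("/sys/class/net").is_dir():
        for name, mtu in below(read_mtus(), 1550).items():
            print(f"{name}: MTU {mtu} is below 1550")
```

Not every interface reported this way matters, since only those on the 
tunnel path or in the router namespace carry tenant traffic.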

Can anybody shed some light on this?

