[openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls

Fredy Neeser Fredy.Neeser at solnet.ch
Wed Mar 11 11:14:41 UTC 2015

OK, I looked at the devstack patch

    [6] "Configure mtu for ovs with the common protocols"

but no -- it doesn't do the job for the VLAN-based separation
of native and encapsulated traffic, which I'm using in [1] for
a clean (correct MTUs ...) VXLAN setup with single-NIC compute nodes.

As shown in Figure 2 of [1], I'm using VLANs 1 and 12 for native
and encapsulated traffic, respectively.  I needed to manually
create br-ex ports br-ex.1 (VLAN 1) and br-ex.12 (VLAN 12) and
configure their MTUs.  Moreover, I needed a small "VLAN awareness"
patch to ensure that the Neutron router gateway port qg-XXX uses VLAN 1.

Consider the example below:


# ip a
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
     link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
6: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
     link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
     link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
     inet brd scope global br-ex.1
        valid_lft forever preferred_lft forever
8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
     link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
     inet brd scope global br-ex.12
        valid_lft forever preferred_lft forever

# ovs-vsctl show
     Bridge br-tun
         Port patch-int
             Interface patch-int
                 type: patch
                 options: {peer=patch-tun}
         Port br-tun
             Interface br-tun
                 type: internal
         Port "vxlan-c0a80115"
             Interface "vxlan-c0a80115"
                 type: vxlan
                 options: {df_default="true", in_key=flow, local_ip="", out_key=flow, remote_ip=""}

     Bridge br-ex
         Port phy-br-ex
             Interface phy-br-ex
                 type: patch
                 options: {peer=int-br-ex}
         Port "br-ex.12"
             tag: 12
             Interface "br-ex.12"
                 type: internal
         Port "br-ex.1"
             tag: 1
             Interface "br-ex.1"
                 type: internal
         Port "eth0"
             tag: 1
             trunks: [1, 12]
             Interface "eth0"
         Port "qg-e046ec4e-e3"
             tag: 1
             Interface "qg-e046ec4e-e3"
                 type: internal
         Port br-ex
             Interface br-ex
                 type: internal


My home LAN ("external network") is enabled for Jumbo frames, as can be
seen from eth0's MTU of 1554, so my path MTU for VXLAN is 1554 bytes.

VLAN 12 supports encapsulated traffic with an MTU of 1554 bytes.

This allows my VMs to use the standard MTU of 1500 regardless
of whether they are on different compute nodes (so they communicate
via VXLAN) or on the same compute node, i.e., the effective
L2 segment MTU is 1500 bytes.  Because this is the default,
I don't need to change guest MTUs at all.
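
The arithmetic behind these numbers can be sketched as follows. The header sizes are the standard ones for VXLAN over IPv4. This post does not break down the 1554 figure, so the extra 4 bytes are shown here under the assumption that they cover an 802.1Q tag on the inner frame; the function name underlay_mtu is made up for this illustration.

```shell
#!/bin/sh
# Header sizes in bytes for VXLAN over IPv4.
INNER_ETH=14    # Ethernet header of the encapsulated guest frame
VXLAN_HDR=8     # VXLAN header
UDP_HDR=8       # outer UDP header
OUTER_IPV4=20   # outer IPv4 header (no options)

# underlay_mtu GUEST_MTU [INNER_TAG_BYTES]
# Prints the IP MTU the tunnel endpoint interface needs so that a guest
# packet of GUEST_MTU bytes fits without fragmentation.
# INNER_TAG_BYTES is an assumption: 4 if the inner frame carries an
# 802.1Q tag, 0 (the default) otherwise.
underlay_mtu() {
    guest_mtu=$1
    inner_tag=${2:-0}
    echo $(( guest_mtu + INNER_ETH + inner_tag + VXLAN_HDR + UDP_HDR + OUTER_IPV4 ))
}

underlay_mtu 1500      # prints 1550
underlay_mtu 1500 4    # prints 1554
```

Either way, a path MTU of 1554 leaves enough room for guests to keep the default MTU of 1500.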

For bridge br-ex, I configured two internal ports br-ex.{1,12}
as shown in the table below:

   Port on br-ex    VLAN    MTU    Remarks
   br-ex.1             1   1500
   br-ex.12           12   1554
   br-ex                           Unused
   qg-e046ec4e-e3      1           "VLAN awareness" patch (cf. [1])
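
As a rough sketch of the manual step, the two internal ports could be created along these lines. The helper vlan_port_cmds is hypothetical and only prints the commands (a dry run); it uses the usual ovs-vsctl idiom for a tagged internal port and is not necessarily the exact sequence described in [1]. IP address assignment and the eth0 trunk configuration are left out.

```shell
#!/bin/sh
# Print (do not run) the commands that create one VLAN-tagged internal
# port on an OVS bridge and set its MTU.
# Usage: vlan_port_cmds BRIDGE VLAN MTU
vlan_port_cmds() {
    bridge=$1
    vlan=$2
    mtu=$3
    port="$bridge.$vlan"   # naming follows the br-ex.1 / br-ex.12 scheme
    echo "ovs-vsctl add-port $bridge $port tag=$vlan -- set interface $port type=internal"
    echo "ip link set dev $port mtu $mtu"
    echo "ip link set dev $port up"
}

vlan_port_cmds br-ex 1 1500     # native traffic
vlan_port_cmds br-ex 12 1554    # encapsulated (VXLAN) traffic
```

After reviewing the printed commands, they could be applied by piping the output to sh as root.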

All native traffic (including routed traffic to/from a Neutron router
and traffic generated by the Network Node itself) uses VLAN 1 on
my LAN, with an MTU of 1500 bytes.

For my small VXLAN test setup, I didn't need to assign different IPs to
br-ex.1 and br-ex.12, both are assigned

So why doesn't [6] do "the right thing"? --

Well, obviously [6] does not add the "VLAN awareness" that I need
for the Neutron qg-XXX gateway ports.
Moreover, [6] tries to auto-configure the L2 segment MTU based on
determining the MTU of an interface associated with
$TUNNEL_ENDPOINT_IP, which is in my case.

It does this essentially by querying

   # ip -o address | awk "/ {print \$2}"

However, in my case, this would return *two* interfaces:
so the patch [6] wouldn't know which interface's MTU it should take.
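
The ambiguity can be reproduced with a small sketch that filters canned `ip -o address` output. The address 192.0.2.10 is a documentation placeholder, not the real tunnel endpoint IP, and the helpers ifaces_for_ip and pick_tunnel_iface are invented for this example; this is not the code from [6].

```shell
#!/bin/sh
# Canned `ip -o address` output: two interfaces share one (placeholder) IP.
sample_ip_output() {
    cat <<'EOF'
7: br-ex.1    inet 192.0.2.10/24 brd 192.0.2.255 scope global br-ex.1\       valid_lft forever preferred_lft forever
8: br-ex.12    inet 192.0.2.10/24 brd 192.0.2.255 scope global br-ex.12\       valid_lft forever preferred_lft forever
EOF
}

# Read `ip -o address` lines on stdin and print the name of every
# interface that carries the IPv4 address given as $1.
ifaces_for_ip() {
    awk -v ip="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == ip) print $2 }'
}

# Print the single matching interface, or fail if the match is not unique.
pick_tunnel_iface() {
    matches=$(ifaces_for_ip "$1")
    count=$(printf '%s\n' "$matches" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$count" -ne 1 ]; then
        echo "cannot pick an MTU: $count interfaces carry $1" >&2
        return 1
    fi
    printf '%s\n' "$matches"
}

sample_ip_output | ifaces_for_ip 192.0.2.10    # prints br-ex.1 and br-ex.12
```

With two matches, any MTU auto-detection keyed only on the endpoint IP has to guess between 1500 and 1554.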

I'm currently checking if the "MTU selection and advertisement" patches
[3-5] are compatible with the VLAN-based traffic separation [1].


Fredy Neeser

On 06.03.2015 18:37, Attila Fazekas wrote:

> Can you check if this patch does the right thing [6]:
> [6] https://review.openstack.org/#/c/112523/6
> ----- Original Message -----
>> From: "Fredy Neeser" <Fredy.Neeser at solnet.ch>
>> To: openstack-dev at lists.openstack.org
>> Sent: Friday, March 6, 2015 6:01:08 PM
>> Subject: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes:  Avoiding the MTU pitfalls
>> Hello world
>> I recently created a VXLAN test setup with single-NIC compute nodes
>> (using OpenStack Juno on Fedora 20), consciously ignoring the OpenStack
>> advice of using nodes with at least 2 NICs ;-) .
>> The fact that both native and encapsulated traffic needs to pass through
>> the same NIC does create some interesting challenges, but finally I got
>> it working cleanly, staying clear of MTU pitfalls ...
>> I documented my findings here:
>>     [1]
>> http://blog.systemathic.ch/2015/03/06/openstack-vxlan-with-single-nic-compute-nodes/
>>     [2]
>> http://blog.systemathic.ch/2015/03/05/openstack-mtu-pitfalls-with-tunnels/
>> For those interested in single-NIC setups, I'm curious what you think
>> about [1]  (a small patch is needed to add "VLAN awareness" to the
>> qg-XXX Neutron gateway ports).
>> While catching up with Neutron changes for OpenStack Kilo, I came across
>> the in-progress work on "MTU selection and advertisement":
>>     [3]  Spec:
>> https://github.com/openstack/neutron-specs/blob/master/specs/kilo/mtu-selection-and-advertisement.rst
>>     [4]  Patch review:  https://review.openstack.org/#/c/153733/
>>     [5]  Spec update:  https://review.openstack.org/#/c/159146/
>> Seems like [1] eliminates some additional MTU pitfalls that are not
>> addressed by [3-5].
>> But I think it would be nice if we could achieve [1] while coordinating
>> with the "MTU selection and advertisement" work [3-5].
>> Thoughts?
>> Cheers,
>> - Fredy
>> Fredy ("Freddie") Neeser
>> http://blog.systeMathic.ch