[openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls
Fredy Neeser
Fredy.Neeser at solnet.ch
Wed Mar 11 11:27:26 UTC 2015
[resent with a clarification, towards the end of the message, of what [6] is doing]
OK, I looked at the devstack patch
[6] "Configure mtu for ovs with the common protocols",
but no, it doesn't do the job for the VLAN-based separation
of native and encapsulated traffic, which I'm using in [1] for
a clean VXLAN setup (with correct MTUs) with single-NIC compute nodes.
As shown in Figure 2 of [1], I'm using VLANs 1 and 12 for native
and encapsulated traffic, respectively. I needed to manually
create br-ex ports br-ex.1 (VLAN 1) and br-ex.12 (VLAN 12) and
configure their MTUs. Moreover, I needed a small "VLAN awareness"
patch to ensure that the Neutron router gateway port qg-XXX uses VLAN 1.
Consider the example below:
<Example>
# ip a
...
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc pfifo_fast master ovs-system state UP group default qlen 1000
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
...
6: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
       valid_lft forever preferred_lft forever
8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
       valid_lft forever preferred_lft forever
# ovs-vsctl show
c0618b20-1eeb-486c-88bd-fb96988dbf96
    Bridge br-tun
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
        Port br-tun
            Interface br-tun
                type: internal
        Port "vxlan-c0a80115"
            Interface "vxlan-c0a80115"
                type: vxlan
                options: {df_default="true", in_key=flow, local_ip="192.168.1.14", out_key=flow, remote_ip="192.168.1.21"}
    Bridge br-ex
        Port phy-br-ex
            Interface phy-br-ex
                type: patch
                options: {peer=int-br-ex}
        Port "br-ex.12"
            tag: 12
            Interface "br-ex.12"
                type: internal
        Port "br-ex.1"
            tag: 1
            Interface "br-ex.1"
                type: internal
        Port "eth0"
            tag: 1
            trunks: [1, 12]
            Interface "eth0"
        Port "qg-e046ec4e-e3"
            tag: 1
            Interface "qg-e046ec4e-e3"
                type: internal
        Port br-ex
            Interface br-ex
                type: internal
</Example>
My home LAN ("external network") is enabled for Jumbo frames, as can be
seen from eth0's MTU of 1554, so my path MTU for VXLAN is 1554 bytes.
VLAN 12 supports encapsulated traffic with an MTU of 1554 bytes.
This allows my VMs to use the standard MTU of 1500 regardless
of whether they are on different compute nodes (so they communicate
via VXLAN) or on the same compute node, i.e., the effective
L2 segment MTU is 1500 bytes. Because this is the default,
I don't need to change guest MTUs at all.
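The MTU budget behind this can be checked with a little arithmetic. The sketch below assumes VXLAN over IPv4 without IP options and an untagged inner Ethernet frame, and counts sizes at the outer IP layer, because a Linux interface MTU excludes the outer Ethernet header. Under those assumptions the per-packet overhead is 14 (inner Ethernet) + 8 (VXLAN) + 8 (UDP) + 20 (outer IPv4) = 50 bytes.

```shell
#!/bin/sh
# VXLAN-over-IPv4 overhead, counted at the outer IP layer (assumes an
# untagged inner frame and an outer IPv4 header without options):
#   inner Ethernet 14 + VXLAN 8 + UDP 8 + outer IPv4 20
VXLAN_OVERHEAD=$((14 + 8 + 8 + 20))

path_mtu=1554    # MTU of eth0 / br-ex.12 in the example above
guest_mtu=1500   # default guest MTU, i.e. the desired L2 segment MTU

outer_size=$((guest_mtu + VXLAN_OVERHEAD))
echo "VXLAN overhead: ${VXLAN_OVERHEAD} bytes"
echo "Outer IP packet for a ${guest_mtu}-byte guest packet: ${outer_size} bytes"
if [ "$outer_size" -le "$path_mtu" ]; then
    echo "fits within the ${path_mtu}-byte path MTU"
else
    echo "exceeds the ${path_mtu}-byte path MTU"
fi
```

With the values above, a full 1500-byte guest packet becomes a 1550-byte outer packet, which fits the 1554-byte path MTU. With a plain 1500-byte path MTU, the same arithmetic would force the guest MTU down to 1450 bytes.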
For bridge br-ex, I configured two internal ports br-ex.{1,12}
as shown in the table below:
br-ex Port       VLAN   MTU    Remarks
--------------------------------------------------------------
br-ex.1          1      1500
br-ex.12         12     1554
br-ex                          Unused
qg-e046ec4e-e3   1             "VLAN awareness" patch (cf. [1])
All native traffic (including routed traffic to/from a Neutron router
and traffic generated by the Network Node itself) uses VLAN 1 on
my LAN, with an MTU of 1500 bytes.
For my small VXLAN test setup, I didn't need to assign different IPs to
br-ex.1 and br-ex.12; both are assigned 192.168.1.14/24.
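The manual port setup can be captured as a short script. This is a minimal sketch, not the exact procedure from [1]: the helper name vlan_port_cmds is hypothetical, and it only prints the ovs-vsctl and ip commands (using the VLAN, MTU and address values from the table above) so they can be reviewed before being piped to "sudo sh". It does not cover the eth0 trunk settings or the qg-XXX "VLAN awareness" patch.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the commands that create a
# VLAN-tagged OVS internal port on a bridge, set its MTU, bring it up
# and assign an IPv4 address to it.
vlan_port_cmds() {
    bridge=$1 port=$2 tag=$3 mtu=$4 cidr=$5
    echo "ovs-vsctl add-port $bridge $port tag=$tag -- set interface $port type=internal"
    echo "ip link set dev $port mtu $mtu up"
    echo "ip address add $cidr dev $port"
}

# Native traffic on VLAN 1, encapsulated (VXLAN) traffic on VLAN 12.
vlan_port_cmds br-ex br-ex.1 1 1500 192.168.1.14/24
vlan_port_cmds br-ex br-ex.12 12 1554 192.168.1.14/24
```

Note that the two ip commands only change the running configuration; OVS keeps the port definitions in its database, but the MTU and address settings would need to be reapplied (or made persistent by other means) after a reboot.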
So why doesn't [6] do "the right thing"?
Well, obviously [6] does not add the "VLAN awareness" that I need
for the Neutron qg-XXX gateway ports.
Moreover, [6] tries to auto-configure the L2 segment MTU by guessing
the path MTU: it determines the MTU of the interface associated
with $TUNNEL_ENDPOINT_IP, which is 192.168.1.14 in my case.
It does this essentially by querying
# ip -o address | awk "/192.168.1.14/ {print \$2}"
then reading the MTU of that interface and subtracting the overhead
of VXLAN encapsulation.
However, in my case, the above lookup would return *two* interfaces:
br-ex.1
br-ex.12
so the patch [6] wouldn't know which interface's MTU it should take.
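The ambiguity is easy to reproduce. The sketch below is not the code of [6]; the function names ifaces_with_ip and tunnel_iface are hypothetical. It lists the interfaces carrying the tunnel endpoint IP from "ip -o address" style input and refuses to pick one unless the match is unique. It also compares the address field exactly, because the unescaped dots in the regex /192.168.1.14/ match any character, so that pattern would also match, e.g., 192.168.1.140.

```shell
#!/bin/sh
# List the distinct interfaces that carry a given IPv4 address, reading
# "ip -o address" output on stdin. Field $4 looks like "192.168.1.14/24".
ifaces_with_ip() {
    awk -v ip="$1" '$3 == "inet" { split($4, a, "/"); if (a[1] == ip) print $2 }' | sort -u
}

# Return the tunnel interface only if exactly one interface carries the
# address; otherwise report the matches on stderr and fail.
tunnel_iface() {
    matches=$(ifaces_with_ip "$1")
    count=$(printf '%s\n' "$matches" | grep -c .)
    if [ "$count" -ne 1 ]; then
        echo "tunnel IP $1 is on $count interfaces:" $matches >&2
        return 1
    fi
    printf '%s\n' "$matches"
}

# "ip -o address" style lines built from the br-ex.1/br-ex.12 addresses in
# the example; on a live node, pipe the output of "ip -o address" instead.
sample='7: br-ex.1    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1\       valid_lft forever preferred_lft forever
8: br-ex.12    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12\       valid_lft forever preferred_lft forever'

if ! printf '%s\n' "$sample" | tunnel_iface 192.168.1.14; then
    echo "refusing to guess the path MTU; configure it explicitly" >&2
fi
```

On the example addresses, tunnel_iface reports both br-ex.1 and br-ex.12 and fails, which is exactly the case where an automatic guess has no correct answer.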
Also, when I'm doing "VLAN-based traffic separation" for an overlay
setup using single-NIC nodes, I already know both the
"L3 path MTU" and the desired "L2 segment MTU".
I'm currently checking if the "MTU selection and advertisement" patches
[3-5] are compatible with the VLAN-based traffic separation [1].
Regards
Fredy Neeser
http://blog.systeMathic.ch
On 06.03.2015 18:37, Attila Fazekas wrote:
> Can you check if this patch does the right thing [6]:
>
> [6] https://review.openstack.org/#/c/112523/6
>
> ----- Original Message -----
>> From: "Fredy Neeser" <Fredy.Neeser at solnet.ch>
>> To: openstack-dev at lists.openstack.org
>> Sent: Friday, March 6, 2015 6:01:08 PM
>> Subject: [openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls
>>
>> Hello world
>>
>> I recently created a VXLAN test setup with single-NIC compute nodes
>> (using OpenStack Juno on Fedora 20), consciously ignoring the OpenStack
>> advice of using nodes with at least 2 NICs ;-) .
>>
>> The fact that both native and encapsulated traffic needs to pass through
>> the same NIC does create some interesting challenges, but finally I got
>> it working cleanly, staying clear of MTU pitfalls ...
>>
>> I documented my findings here:
>>
>> [1]
>> http://blog.systemathic.ch/2015/03/06/openstack-vxlan-with-single-nic-compute-nodes/
>> [2]
>> http://blog.systemathic.ch/2015/03/05/openstack-mtu-pitfalls-with-tunnels/
>>
>> For those interested in single-NIC setups, I'm curious what you think
>> about [1] (a small patch is needed to add "VLAN awareness" to the
>> qg-XXX Neutron gateway ports).
>>
>>
>> While catching up with Neutron changes for OpenStack Kilo, I came across
>> the in-progress work on "MTU selection and advertisement":
>>
>> [3] Spec:
>> https://github.com/openstack/neutron-specs/blob/master/specs/kilo/mtu-selection-and-advertisement.rst
>> [4] Patch review: https://review.openstack.org/#/c/153733/
>> [5] Spec update: https://review.openstack.org/#/c/159146/
>>
>> Seems like [1] eliminates some additional MTU pitfalls that are not
>> addressed by [3-5].
>>
>> But I think it would be nice if we could achieve [1] while coordinating
>> with the "MTU selection and advertisement" work [3-5].
>>
>> Thoughts?
>>
>> Cheers,
>> - Fredy
>>
>> Fredy ("Freddie") Neeser
>> http://blog.systeMathic.ch
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev