[openstack-dev] [neutron] VXLAN with single-NIC compute nodes: Avoiding the MTU pitfalls
Fredy Neeser
Fredy.Neeser at solnet.ch
Thu Mar 12 12:33:18 UTC 2015
On 11.03.2015 19:31, Ian Wells wrote:
> On 11 March 2015 at 04:27, Fredy Neeser <Fredy.Neeser at solnet.ch> wrote:
>
> 7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
>     link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
>        valid_lft forever preferred_lft forever
>
> 8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
>     link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
>     inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
>        valid_lft forever preferred_lft forever
>
>
> I find it hard to believe that you want the same address configured on
> *both* of these interfaces - which one do you think will be sending
> packets?
Ian, thanks for your feedback!
I chose the same address for both interfaces for three reasons:
1. Within my home single-LAN (underlay) environment, traffic is
switched, and VXLAN traffic is confined to VLAN 12, so there is never a
conflict between IP 192.168.1.14 on VLAN 1 and the same IP on VLAN 12.
On the other hand, for a more scalable VXLAN setup (with multiple
underlays and L3 routing in between), I would like to use different IPs
for br-ex.1 and br-ex.12, for example by using separate subnets:
    192.168.1.0/26 for VLAN 1
    192.168.12.0/26 for VLAN 12
However, I'm not quite there yet (see 3.).
2. I'm using policy routing on my hosts to steer VXLAN traffic (UDP
destination port 4789) to interface br-ex.12. All other traffic from
192.168.1.14 is sent from br-ex.1, presumably because br-ex.1 is a
lower-numbered interface than br-ex.12 (?). It is an interesting
question whether I'm relying here on the order in which I created these
two interfaces.
[root@langrain ~]# ip a
...
7: br-ex.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.1
       valid_lft forever preferred_lft forever
8: br-ex.12: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1554 qdisc noqueue state UNKNOWN group default
    link/ether e0:3f:49:b4:7c:a7 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.14/24 brd 192.168.1.255 scope global br-ex.12
       valid_lft forever preferred_lft forever
3. It's not clear to me how to set up multiple nodes with packstack if a
node's tunnel IP does not equal its admin IP (or the OpenStack API IP in
the case of a controller node). With packstack, I can only specify the
compute node IPs through CONFIG_COMPUTE_HOSTS. Presumably, these IPs are
used both for packstack deployment (admin IP) and for configuring the
VXLAN tunnel IPs (local_ip and remote_ip parameters). How would I
specify different IPs for these purposes? (Recall that my hosts have a
single NIC.)
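The steering described in point 2 boils down to a simple decision per outgoing packet. The actual policy routing rules are not shown in this message, so the following is only a hypothetical Python model of the decision, not the kernel configuration: it assumes that a packet to UDP destination port 4789 leaves via br-ex.12 and that everything else leaves via br-ex.1. The names `Packet` and `egress_interface` are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

VXLAN_UDP_PORT = 4789  # IANA-assigned VXLAN port, as seen in the tcpdump traces


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    proto: str                      # "udp", "icmp", "tcp", ...
    dst_port: Optional[int] = None  # only meaningful for UDP/TCP


def egress_interface(pkt: Packet) -> str:
    """Hypothetical model of the policy routing described in point 2.

    Packets to UDP destination port 4789 (VXLAN) are steered to br-ex.12
    (VLAN 12, MTU 1554); everything else leaves via br-ex.1 (VLAN 1, MTU 1500).
    """
    if pkt.proto == "udp" and pkt.dst_port == VXLAN_UDP_PORT:
        return "br-ex.12"
    return "br-ex.1"


# Encapsulated guest traffic to the other compute node vs. native traffic.
print(egress_interface(Packet("192.168.1.14", "192.168.1.21", "udp", 4789)))  # br-ex.12
print(egress_interface(Packet("192.168.1.14", "192.168.1.54", "icmp")))       # br-ex.1
```

The model keys only on the destination port. In the Case 2 trace, both directions of VXLAN traffic use destination port 4789 with varying source ports, so matching on the source port would not work.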
In any case, native traffic on bridge br-ex is sent via br-ex.1 (VLAN
1), which is also why the Neutron gateway port qg-XXX needs to be
an access port for VLAN 1 (tag: 1). VXLAN traffic is sent from
br-ex.12 on all compute nodes. See the two cases below:
Case 1: Max-size ping from compute node 'langrain' (192.168.1.14) to
another host on the same LAN
  => Native traffic is sent from br-ex.1; no traffic is sent from
     br-ex.12
[fn@langrain ~]$ ping -M do -s 1472 -c 1 192.168.1.54
PING 192.168.1.54 (192.168.1.54) 1472(1500) bytes of data.
1480 bytes from 192.168.1.54: icmp_seq=1 ttl=64 time=0.766 ms

[root@langrain ~]# tcpdump -n -i br-ex.1 dst 192.168.1.54
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex.1, link-type EN10MB (Ethernet), capture size 65535 bytes
10:32:37.666572 IP 192.168.1.14 > 192.168.1.54: ICMP echo request, id 10432, seq 1, length 1480
10:32:42.673665 ARP, Request who-has 192.168.1.54 tell 192.168.1.14, length 28
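The numbers in Case 1 follow from the header sizes. `ping -s 1472` sends 1472 bytes of payload; the 8-byte ICMP header brings this to the 1480 bytes that ping and tcpdump report, and the 20-byte IPv4 header (assuming no IP options) gives a 1500-byte packet, exactly the MTU of br-ex.1. A minimal sketch of that arithmetic, with illustrative helper names:

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request/reply header


def icmp_length(payload: int) -> int:
    """ICMP message length, as shown by tcpdump ('length ...')."""
    return payload + ICMP_HEADER


def ip_packet_size(payload: int) -> int:
    """Total IPv4 packet size for a ping with -s <payload>."""
    return icmp_length(payload) + IPV4_HEADER


def max_ping_payload(mtu: int) -> int:
    """Largest -s value that fits the MTU when sent with -M do (DF bit set)."""
    return mtu - IPV4_HEADER - ICMP_HEADER


print(icmp_length(1472))       # 1480, as in "length 1480"
print(ip_packet_size(1472))    # 1500, as in "1472(1500) bytes of data"
print(max_ping_payload(1500))  # 1472
```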
Case 2: Max-size ping from guest1 (10.0.0.1) on compute node
'langrain' (192.168.1.14) to guest2 (10.0.0.3) on another compute node
(192.168.1.21) via the VXLAN tunnel. Both guests are on the same virtual
network, 10.0.0.0/24.
  => Encapsulated traffic is sent from br-ex.12; no traffic is
     sent from br-ex.1
$ ping -M do -s 1472 -c 1 10.0.0.3
PING 10.0.0.3 (10.0.0.3) 1472(1500) bytes of data.
1480 bytes from 10.0.0.3: icmp_seq=1 ttl=64 time=2.22 ms
[root@langrain ~]# tcpdump -n -i br-ex.12
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on br-ex.12, link-type EN10MB (Ethernet), capture size 65535 bytes
11:02:56.916265 IP 192.168.1.14.47872 > 192.168.1.21.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Request who-has 10.0.0.3 tell 10.0.0.1, length 28
11:02:56.916991 IP 192.168.1.21.51408 > 192.168.1.14.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Reply 10.0.0.3 is-at fa:16:3e:e6:e1:c8, length 28
11:02:56.917282 IP 192.168.1.14.57836 > 192.168.1.21.4789: VXLAN, flags [I] (0x08), vni 10
IP 10.0.0.1 > 10.0.0.3: ICMP echo request, id 25474, seq 1, length 1480
11:02:56.918110 IP 192.168.1.21.44153 > 192.168.1.14.4789: VXLAN, flags [I] (0x08), vni 10
IP 10.0.0.3 > 10.0.0.1: ICMP echo reply, id 25474, seq 1, length 1480
11:03:01.918885 IP 192.168.1.21.51408 > 192.168.1.14.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Request who-has 10.0.0.1 tell 10.0.0.3, length 28
11:03:01.919207 IP 192.168.1.14.57760 > 192.168.1.21.4789: VXLAN, flags [I] (0x08), vni 10
ARP, Reply 10.0.0.1 is-at fa:16:3e:f4:1d:89, length 28
11:03:01.920502 ARP, Request who-has 192.168.1.14 tell 192.168.1.21, length 46
11:03:01.920519 ARP, Reply 192.168.1.14 is-at e0:3f:49:b4:7c:a7, length 28
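Case 2 works because the encapsulated packet still fits the 1554-byte MTU of br-ex.12. The sketch below adds up the standard VXLAN-over-IPv4 overhead, assuming an untagged inner Ethernet frame and no IP options: the guest's 1500-byte IP packet gains a 14-byte inner Ethernet header, an 8-byte VXLAN header, an 8-byte UDP header and a 20-byte outer IPv4 header. The constant and function names are illustrative only.

```python
INNER_ETHERNET = 14  # inner Ethernet header, assuming no inner 802.1Q tag
VXLAN_HEADER = 8     # VXLAN header (flags + VNI)
UDP_HEADER = 8       # outer UDP header (destination port 4789)
OUTER_IPV4 = 20      # outer IPv4 header, assuming no IP options


def outer_ip_size(inner_ip_size: int) -> int:
    """Size of the outer IPv4 packet carrying one VXLAN-encapsulated frame."""
    return inner_ip_size + INNER_ETHERNET + VXLAN_HEADER + UDP_HEADER + OUTER_IPV4


def fits(inner_ip_size: int, underlay_mtu: int) -> bool:
    """True if the encapsulated packet passes the underlay interface unfragmented."""
    return outer_ip_size(inner_ip_size) <= underlay_mtu


print(outer_ip_size(1500))  # 1550
print(fits(1500, 1554))     # True: br-ex.12 carries a full-size guest packet
print(fits(1500, 1500))     # False: a 1500-byte underlay MTU would be too small
```

Under these assumptions, 4 bytes of headroom remain on br-ex.12; this would, for example, accommodate an inner 802.1Q tag.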
> You may find that configuring a VLAN interface for eth1.12 (not in a
> bridge, with a local address suitable for communication with compute
> nodes, for VXLAN traffic) and eth1.1 (in br-ex, for external traffic
> to use) does better for you.
Hmm, I only have one NIC (eth0). In order to attach eth0 to br-ex, I
had to configure it as an OVSPort.
Maybe I misunderstand your alternative, but are you suggesting
configuring eth0.1 as an OVSPort (attached to br-ex) and eth0.12 as a
standalone interface? (I'm not sure a physical interface can be split
in such a way.)
> I'm also not clear what your Openstack API endpoint address or MTU is
> - maybe that's why the eth1.1 interface is addressed?
It's 192.168.1.14, and since br-ex.1 is always used for native traffic,
the MTU is 1500.
Note that my physical switch uses a native VLAN of 1 and is configured
with "Untag all ports" for VLAN 1. Moreover, OVSPort eth0 (attached to
br-ex) is configured for VLAN trunking with a native VLAN of 1
(vlan_mode: native-untagged, trunks: [1,12], tag: 1), so within bridge
br-ex, native packets are tagged 1.
> I can tell you that if you want your API to be on the same address
> 192.168.1.14 as the VXLAN tunnel endpoints then it has to be one
> address on one interface and the two functions will share the same MTU
> - almost certainly not what you're looking for.
With my current setup (thanks to policy routing), I have the same IP on
two interfaces, br-ex.1 and br-ex.12, with MTUs of 1500 and 1554,
respectively.
> If you source VXLAN packets from a different IP address then you can
> put it on a different interface and give it a different MTU - which
> appears to fit what you want much better.
Using different compute host IPs for admin traffic (CONFIG_COMPUTE_HOSTS)
and for tunnel endpoints would eliminate the need for policy routing and
would also be better suited to scaling a VXLAN deployment across
multiple independent L2 broadcast domains. For that, however, I'll need
to resolve point 3 above; pointers in that direction are much
appreciated.
Thanks,
- Fredy