[Openstack] Weird ARP responses in a Neutron managed openvswitch environment

Steffens, Michael Michael.Steffens at vector.com
Tue Jan 12 13:11:17 UTC 2016


Hello,

Does anyone have an idea what could cause the following failure? In summary: guest VMs connected to a tenant network are receiving bogus ARP responses. These map unused IP addresses to the MAC addresses of virtual bridge devices that belong to other ports on the same compute host.

We are using the Kilo openvswitch-agent with the ml2 plugin.

Please have a look at the following example. A VM with the fixed-ip 192.168.1.15 reports the following ARP cache:

   root at michael-test2:~# arp
   Address HWtype HWaddress Flags Mask Iface
   host-192-168-1-2.openst ether fa:16:3e:de:ab:ea C eth0
   192.168.1.13 ether a6:b2:dc:d8:39:c1 C eth0
   192.168.1.119 (incomplete) eth0
   host-192-168-1-20.opens ether fa:16:3e:76:43:ce C eth0
   host-192-168-1-19.opens ether fa:16:3e:0d:a6:0b C eth0
   host-192-168-1-1.openst ether fa:16:3e:2a:81:ff C eth0
   192.168.1.14 ether 0e:bf:04:b7:ed:52 C eth0

Please note that neither 192.168.1.13 nor 192.168.1.14 is in use in this subnet. The displayed MAC addresses a6:b2:dc:d8:39:c1 and 0e:bf:04:b7:ed:52 actually belong to the qbr* and qvb* devices of other instances, living on their respective hypervisor hosts!
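
For anyone who wants to check their own guests, here is a minimal sketch that flags such entries in `arp` output. It assumes Neutron's default base_mac (fa:16:3e:00:00:00) is in use, so every legitimate port MAC starts with fa:16:3e. The function name and the embedded sample are only illustrative.

```python
# Assumption: Neutron's default base_mac prefix; adjust if base_mac was changed.
NEUTRON_MAC_PREFIX = "fa:16:3e"

# Sample taken from the ARP cache shown above.
ARP_OUTPUT = """\
Address HWtype HWaddress Flags Mask Iface
host-192-168-1-2.openst ether fa:16:3e:de:ab:ea C eth0
192.168.1.13 ether a6:b2:dc:d8:39:c1 C eth0
192.168.1.119 (incomplete) eth0
host-192-168-1-20.opens ether fa:16:3e:76:43:ce C eth0
host-192-168-1-19.opens ether fa:16:3e:0d:a6:0b C eth0
host-192-168-1-1.openst ether fa:16:3e:2a:81:ff C eth0
192.168.1.14 ether 0e:bf:04:b7:ed:52 C eth0
"""


def suspicious_entries(arp_text, prefix=NEUTRON_MAC_PREFIX):
    """Return (address, mac) pairs whose MAC lacks the expected prefix."""
    found = []
    for line in arp_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        # Resolved entries look like: address, "ether", mac, flags, iface.
        # Incomplete entries have no "ether" column and are ignored.
        if len(fields) >= 3 and fields[1] == "ether":
            address, mac = fields[0], fields[2].lower()
            if not mac.startswith(prefix):
                found.append((address, mac))
    return found


if __name__ == "__main__":
    for address, mac in suspicious_entries(ARP_OUTPUT):
        print(address, mac)
```

A MAC outside the prefix is only a hint, not proof: deployments with a custom base_mac or with non-Neutron devices on the segment would need a different check.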

Looking at 0e:bf:04:b7:ed:52, for example, yields

   # ip link list | grep -C1 -e 0e:bf:04:b7:ed:52
   59: qbr9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default
       link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
   60: qvo9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP mode DEFAULT group default qlen 1000
   --
   61: qvb9ac24ac1-e1: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UP mode DEFAULT group default qlen 1000
       link/ether 0e:bf:04:b7:ed:52 brd ff:ff:ff:ff:ff:ff
   62: tap9ac24ac1-e1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9ac24ac1-e1 state UNKNOWN mode DEFAULT group default qlen 500

on the compute node. Using tcpdump on qbr9ac24ac1-e1 on the host and triggering a fresh ARP lookup on the guest VM results in

   # tcpdump -i qbr9ac24ac1-e1 -vv -l | grep ARP
   tcpdump: WARNING: qbr9ac24ac1-e1: no IPv4 address assigned
   tcpdump: listening on qbr9ac24ac1-e1, link-type EN10MB (Ethernet), capture size 65535 bytes
   14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28
   14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28
   14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28
   14:00:32.090160 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 02:f9:33:d5:04:0d (oui Unknown), length 28
   14:00:32.090168 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 9a:a0:46:e4:03:06 (oui Unknown), length 28
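
A capture like this can also be checked mechanically. The following is a rough sketch that groups ARP replies by the IP address they claim and reports addresses answered by more than one MAC. It assumes the `tcpdump -vv` line format shown above; the regular expression and function name are my own and not part of any tool.

```python
import re
from collections import defaultdict

# Matches the "Reply <ip> is-at <mac>" part of tcpdump's verbose ARP output.
REPLY = re.compile(
    r"Reply (\d+\.\d+\.\d+\.\d+) is-at ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)

# Sample taken from the capture shown above.
CAPTURE = """\
14:00:32.089726 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.1.14 tell 192.168.1.15, length 28
14:00:32.089740 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 0e:bf:04:b7:ed:52 (oui Unknown), length 28
14:00:32.090141 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 7a:a5:71:63:47:94 (oui Unknown), length 28
14:00:32.090160 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 02:f9:33:d5:04:0d (oui Unknown), length 28
14:00:32.090168 ARP, Ethernet (len 6), IPv4 (len 4), Reply 192.168.1.14 is-at 9a:a0:46:e4:03:06 (oui Unknown), length 28
"""


def conflicting_replies(capture_text):
    """Map each IP to its claimed MACs, keeping only IPs with several MACs."""
    claims = defaultdict(list)
    for line in capture_text.splitlines():
        match = REPLY.search(line)
        if match:
            ip, mac = match.group(1), match.group(2).lower()
            if mac not in claims[ip]:  # ignore repeated replies from one MAC
                claims[ip].append(mac)
    return {ip: macs for ip, macs in claims.items() if len(macs) > 1}


if __name__ == "__main__":
    for ip, macs in conflicting_replies(CAPTURE).items():
        print(ip, "claimed by", ", ".join(macs))
```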

As you can see there are four different devices claiming to own the unused IP address! Looking them up in neutron shows they are all related to existing ports on the subnet, but different ones:

   # neutron port-list | grep -e 47fbb8b5-55 -e 46647cca-32 -e e9e2d7c3-7e -e 9ac24ac1-e1
   | 46647cca-3293-42ea-8ec2-0834e19422fa | | fa:16:3e:7d:9c:45 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.8"} |
   | 47fbb8b5-5549-46e4-850e-bd382375e0f8 | | fa:16:3e:fa:df:32 | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.7"} |
   | 9ac24ac1-e157-484e-b6a2-a1dded4731ac | | fa:16:3e:2a:80:6b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.15"} |
   | e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56 | | fa:16:3e:0d:a6:0b | {"subnet_id": "25dbbdc0-f438-4f89-8663-1772f9c7ef36", "ip_address": "192.168.1.19"} |
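
The lookup above relies on the naming convention that the qbr/qvb/qvo/tap device names carry the first 11 characters of the Neutron port ID. A small sketch of that mapping, with the port table hard-coded from the listing above (the helper name is hypothetical):

```python
# Device name prefixes used on the compute node in this setup.
DEVICE_PREFIXES = ("tap", "qbr", "qvb", "qvo")

# Port ID -> fixed IP, copied from the neutron port-list output above.
PORTS = {
    "46647cca-3293-42ea-8ec2-0834e19422fa": "192.168.1.8",
    "47fbb8b5-5549-46e4-850e-bd382375e0f8": "192.168.1.7",
    "9ac24ac1-e157-484e-b6a2-a1dded4731ac": "192.168.1.15",
    "e9e2d7c3-7e58-4bc2-a25f-d48e658b2d56": "192.168.1.19",
}


def port_for_device(device, ports=PORTS):
    """Return the unique port ID matching a device name, or None."""
    for prefix in DEVICE_PREFIXES:
        if device.startswith(prefix):
            fragment = device[len(prefix):]  # e.g. "9ac24ac1-e1"
            matches = [pid for pid in ports if pid.startswith(fragment)]
            # Only accept an unambiguous match.
            return matches[0] if len(matches) == 1 else None
    return None


if __name__ == "__main__":
    port_id = port_for_device("qbr9ac24ac1-e1")
    print(port_id, PORTS.get(port_id))
```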


Impact: Linux guests don't seem to suffer from bogus ARP entries, so the problem may go unnoticed in a pure Linux environment. Windows guests do, however. They verify IP addresses offered by DHCP against ARP, and reject the IP configuration in case of conflicts. In the example above, any Windows VM offered 192.168.1.13 or 192.168.1.14 will fail to configure its network interface. This is actually how we noticed the issue.

Cheers!
Michael