[Openstack] A Grizzly arping failure

Greg Chavez greg.chavez at gmail.com
Fri May 10 17:32:55 UTC 2013


More information: Even though it appears that the dhcp namespace has
the IP/MAC mappings in its arp table, it still doesn't respond to arps:

# ip netns exec qdhcp-af224f3f-8de6-4e0d-b043-6bcd5cb014c5 arp
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.252.3            ether   fa:16:3e:8a:02:78   C                     tap14301d16-5d
192.168.252.5            ether   fa:16:3e:ca:64:56   C                     tap14301d16-5d

# ip netns exec qrouter-5ade8a36-4bcf-443a-bf96-89168ec34a13 tcpdump -ni qr-9ae4cc80-c7 host 192.168.252.5

13:30:36.243354 ARP, Request who-has 192.168.252.5 tell 192.168.252.1, length 28
13:30:36.243649 ARP, Request who-has 192.168.252.5 tell 192.168.252.1, length 28
13:30:37.241445 ARP, Request who-has 192.168.252.5 tell 192.168.252.1, length 28
13:30:37.241475 ARP, Request who-has 192.168.252.5 tell 192.168.252.1, length 28
13:30:38.241260 ARP, Request who-has 192.168.252.5 tell 192.168.252.1, length 28

Very annoying!
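
In case it helps anyone compare captures, here is a rough Python sketch for
tallying which ARP targets never answer. It is not part of any OpenStack or
Quantum tooling; the function name unanswered_arp is made up, and the regexes
assume tcpdump's default one-line ARP format exactly as it appears in the
captures in this thread.

```python
import re
from collections import Counter

# Assumed line shapes (taken from the tcpdump output in this thread):
#   ... ARP, Request who-has <target> tell <sender>, length N
#   ... ARP, Reply <target> is-at <mac>, length N
REQUEST_RE = re.compile(r"ARP, Request who-has (\S+) tell (\S+),")
REPLY_RE = re.compile(r"ARP, Reply (\S+) is-at (\S+),")


def unanswered_arp(lines):
    """Return {target_ip: request_count} for targets with no ARP reply seen."""
    requests = Counter()
    replied = set()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:
            requests[match.group(1)] += 1
            continue
        match = REPLY_RE.search(line)
        if match:
            replied.add(match.group(1))
    # Keep only targets that were asked for but never answered.
    return {ip: count for ip, count in requests.items() if ip not in replied}


# Two lines copied from the capture above, used as sample input.
sample = [
    "13:30:36.243354 ARP, Request who-has 192.168.252.5 tell 192.168.252.1, length 28",
    "13:30:37.241445 ARP, Request who-has 192.168.252.5 tell 192.168.252.1, length 28",
]
for ip, count in unanswered_arp(sample).items():
    print(f"{ip}: {count} ARP request(s), no reply")
```

Saving the tcpdump output to a file and passing its lines to unanswered_arp
should work the same way. It only reports what the capture interface saw, so
a reply dropped before it reaches that interface still counts as unanswered.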

On Fri, May 10, 2013 at 12:36 PM, Greg Chavez <greg.chavez at gmail.com> wrote:
> I sent a message about this 3 days ago, but I didn't get a response.
> So if at first you don't succeed...
>
> The symptom is that my VMs only survive on the external network for
> about a minute. The behavior is not consistent, however.  I've
> observed the following different types of failures:
>
> * ssh session to VM is idle for a minute, then the connection dies.
> It can only be re-established by initiating network activity from the
> VM console.
> * ssh session to VM is idle for many minutes, then I start typing and
> after a few seconds of latency, the connection resumes.
> * initiating a new ssh or icmp to a VM fails 95% of the time.  It
> seems that the VMs occasionally request new DHCP leases, at which time
> the VM is briefly available on the external network.
>
> I've troubleshot this to the dnsmasq instance in the tenant's network
> namespace.  When a VM is unavailable I can trace traffic all the way
> to the "qrouter".
>
> Tenant IP: 192.168.252.6/23
> Floating IP: 10.21.166.5/22
> SSHing IP: 10.21.164.10/22
>
> # ip netns exec qrouter-5ade8a36-4bcf-443a-bf96-89168ec34a13 tcpdump -ni qg-498f9986-74 host 10.21.166.5
>
> 12:29:00.279454 IP 10.21.164.10 > 10.21.166.5: ICMP echo request, id 17219, seq 1, length 64
> 12:29:01.278800 IP 10.21.164.10 > 10.21.166.5: ICMP echo request, id 17219, seq 2, length 64
> 12:29:02.278793 IP 10.21.164.10 > 10.21.166.5: ICMP echo request, id 17219, seq 3, length 64
> 12:29:03.277277 IP 10.21.166.5 > 10.21.164.10: ICMP host 10.21.166.5 unreachable, length 92
> 12:29:03.277305 IP 10.21.166.5 > 10.21.164.10: ICMP host 10.21.166.5 unreachable, length 92
> 12:29:03.277315 IP 10.21.166.5 > 10.21.164.10: ICMP host 10.21.166.5 unreachable, length 92
> 12:29:03.277758 IP 10.21.164.10 > 10.21.166.5: ICMP echo request, id 17219, seq 4, length 64
> 12:29:04.277755 IP 10.21.164.10 > 10.21.166.5: ICMP echo request, id 17219, seq 5, length 64
> 12:29:05.278769 ARP, Request who-has 10.21.166.5 tell 10.21.164.10, length 46
> 12:29:05.278781 ARP, Reply 10.21.166.5 is-at fa:16:3e:4d:e6:93, length 28
> 12:29:06.277277 IP 10.21.166.5 > 10.21.164.10: ICMP host 10.21.166.5 unreachable, length 92
> 12:29:06.277301 IP 10.21.166.5 > 10.21.164.10: ICMP host 10.21.166.5 unreachable, length 92
>
> The arp reply points to the MAC for interface qg-498f9986-74.  So
> that's good.  But on the other side of the router, arp fails:
>
> # ip netns exec qrouter-5ade8a36-4bcf-443a-bf96-89168ec34a13 tcpdump -ni qr-9ae4cc80-c7 host 192.168.252.6
>
> 12:31:34.798835 ARP, Request who-has 192.168.252.6 tell 192.168.252.1, length 28
> 12:31:34.799136 ARP, Request who-has 192.168.252.6 tell 192.168.252.1, length 28
> 12:31:35.797250 ARP, Request who-has 192.168.252.6 tell 192.168.252.1, length 28
> 12:31:35.797310 ARP, Request who-has 192.168.252.6 tell 192.168.252.1, length 28
> 12:31:36.797262 ARP, Request who-has 192.168.252.6 tell 192.168.252.1, length 28
> 12:31:36.797296 ARP, Request who-has 192.168.252.6 tell 192.168.252.1, length 28
> 12:31:37.797858 ARP, Request who-has 192.168.252.6 tell 192.168.252.1, length 28
>
> No reply.  Nothing.  But look what happens when I ping from my VM console:
>
> # ip netns exec qrouter-5ade8a36-4bcf-443a-bf96-89168ec34a13 tcpdump -ni qr-9ae4cc80-c7 host 192.168.252.6
>
> 12:33:44.190214 ARP, Request who-has 192.168.252.1 tell 192.168.252.6, length 28
> 12:33:44.190245 ARP, Reply 192.168.252.1 is-at fa:16:3e:f8:a4:54, length 28
> 12:33:44.190981 IP 192.168.252.6 > 10.21.164.1: ICMP echo request, id 46648, seq 1, length 64
> 12:33:44.192025 IP 10.21.164.1 > 192.168.252.6: ICMP echo reply, id 46648, seq 1, length 64
> 12:33:45.188743 IP 192.168.252.6 > 10.21.164.1: ICMP echo request, id 46648, seq 2, length 64
> 12:33:45.189242 IP 10.21.164.1 > 192.168.252.6: ICMP echo reply, id 46648, seq 2, length 64
>
> Works.  Does anybody have any idea what's going on here?  My next move
> is to look into how dnsmasq is configured.  I assume that's the source
> of the problem, but I feel a little in over my head.  Any suggestions
> would be great.  Thanks in advance!
>
> --
> \*..+.-
> --Greg Chavez
> +//..;};



-- 
\*..+.-
--Greg Chavez
+//..;};



