<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
</head>
<body bgcolor="#FFFFFF" text="#000000">
Hi All,<br>
<br>
I have deployed a multi-node OpenStack Havana setup for provisioning VMs,
using ESXi as the hypervisor.<br>
However, the OpenStack instances lose network connectivity every so
often.<br>
<br>
<font color="#000000"><span style="white-space:pre-wrap">We can
see the ARP reply on the GRE tunnel on the network node, but we
do not see it in the tcpdump output <br>
on the 'qr-xxx' interface of the qrouter namespace. For some
reason, br-tun does not <br>
pass the ARP reply to qr-xxx on the network node.<br>
<br>
Please find more details below.<br>
<br>
ESXi version : 5.5.0<br>
OVS version : 1.10.2<br>
<br>
compute1 : 172.16.39.156<br>
compute2 : 172.16.39.155<br>
neutron node : 172.16.39.200<br>
<br>
Below is the packet flow when we ping the VM from the external
network and the VM is not reachable.<br>
<br>
10.230.39.163 is the floating IP and 10.10.10.2 is the private IP of the VM. <br>
<br>
C:\> ping 10.230.39.163 -t<br>
<br>
Pinging 10.230.39.163 with 32 bytes of data:<br>
<br>
Reply from 10.230.39.163: Destination host unreachable.<br>
Reply from 10.230.39.163: Destination host unreachable.<br>
<br>
root@compute1:~# tcpdump -n -i eth0 proto gre<br>
17:07:57.541440 IP 172.16.39.155 > 172.16.39.156: GREv0,
key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell
10.10.10.1, length 28<br>
17:07:57.541457 IP 172.16.39.156 > 172.16.39.200: GREv0,
key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell
10.10.10.1, length 28<br>
17:07:57.541465 IP 172.16.39.156 > 172.16.39.200: GREv0,
key=0x1, length 72: ARP, Reply 10.10.10.2 is-at
fa:16:3e:ba:22:ff, length 46<br>
17:07:57.541486 IP 172.16.39.156 > 172.16.39.200: GREv0,
key=0x1, length 72: ARP, Reply 10.10.10.2 is-at
fa:16:3e:ba:22:ff, length 46<br>
<br>
Here we can see that the compute node is sending the ARP reply to the
network node over the GRE tunnel.<br>
<br>
root@neutron:~# tcpdump -n -i eth0 proto gre<br>
17:08:55.281644 IP 172.16.39.156 > 172.16.39.200: GREv0,
key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell
10.10.10.1, length 28<br>
17:08:55.281663 IP 172.16.39.155 > 172.16.39.200: GREv0,
key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell
10.10.10.1, length 28<br>
17:08:55.281669 IP 172.16.39.156 > 172.16.39.200: GREv0,
key=0x1, length 72: ARP, Reply 10.10.10.2 is-at
fa:16:3e:ba:22:ff, length 46<br>
<br>
Here we can see that the network node is receiving the ARP reply from the
compute node over the GRE tunnel.<br>
<br>
root@neutron:~# tcpdump -i br-int <br>
tcpdump: WARNING: br-int: no IPv4 address assigned<br>
tcpdump: verbose output suppressed, use -v or -vv for full
protocol decode<br>
listening on br-int, link-type EN10MB (Ethernet), capture size
65535 bytes<br>
17:23:45.344970 ARP, Request who-has 10.10.10.2 tell 10.10.10.1,
length 28<br>
17:23:45.345121 ARP, Request who-has 10.10.10.2 tell 10.10.10.1,
length 28<br>
<br>
We see only ARP requests on br-int of the network node. There is
no ARP reply on br-int.<br>
<br>
root@neutron:~# ip netns exec
qrouter-905087ce-b1e2-4038-beae-32865fa7924b tcpdump -i
qr-8d3a43d5-f5<br>
tcpdump: verbose output suppressed, use -v or -vv for full
protocol decode<br>
listening on qr-8d3a43d5-f5, link-type EN10MB (Ethernet),
capture size 65535 bytes<br>
^C23:21:20.950674 ARP, Request who-has 10.10.10.6 tell
headnode.hpc.hpc, length 28<br>
23:21:20.951224 ARP, Request who-has 10.10.10.6 tell
headnode.hpc.hpc, length 28<br>
23:21:20.951237 ARP, Request who-has 10.10.10.6 tell
headnode.hpc.hpc, length 28<br>
23:21:21.949476 ARP, Request who-has 10.10.10.6 tell
headnode.hpc.hpc, length 28<br>
23:21:21.949622 ARP, Request who-has 10.10.10.6 tell
headnode.hpc.hpc, length 28<br>
<br>
Here we do not see any ARP reply on qr-xxx either.<br>
<br>
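The comparison between the captures above can be summarized with a small script. This is only a minimal sketch: the helper name count_arp is hypothetical, and it assumes the tcpdump text has been saved with each packet on a single line, as tcpdump prints it before the mail client wraps it. The two sample strings are taken from the captures above.<br>

```python
import re
from collections import Counter

# Matches the "ARP, Request ..." or "ARP, Reply ..." part of a tcpdump line.
ARP_KIND = re.compile(r"ARP, (Request|Reply)\b")


def count_arp(capture_text):
    """Count ARP requests and replies in saved tcpdump text output."""
    counts = Counter({"Request": 0, "Reply": 0})
    for line in capture_text.splitlines():
        match = ARP_KIND.search(line)
        if match:
            counts[match.group(1)] += 1
    return dict(counts)


# Two packets seen on eth0 of the network node (GRE encapsulated).
ETH0_SAMPLE = (
    "17:08:55.281644 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, "
    "length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28\n"
    "17:08:55.281669 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, "
    "length 72: ARP, Reply 10.10.10.2 is-at fa:16:3e:ba:22:ff, length 46\n"
)

# The two packets seen on br-int of the network node.
BR_INT_SAMPLE = (
    "17:23:45.344970 ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28\n"
    "17:23:45.345121 ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28\n"
)

print("eth0:  ", count_arp(ETH0_SAMPLE))
print("br-int:", count_arp(BR_INT_SAMPLE))
```

A reply count that is non-zero on eth0 but zero on br-int and qr-xxx points at the hop where the reply is lost.<br>
<br>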
root@neutron:~# ovs-vsctl show<br>
1f793d03-8ab1-495c-877e-e6002dda9912<br>
    Bridge br-ex<br>
        Port br-ex<br>
            Interface br-ex<br>
                type: internal<br>
        Port "eth2"<br>
            Interface "eth2"<br>
        Port "qg-ac77add8-f6"<br>
            Interface "qg-ac77add8-f6"<br>
                type: internal<br>
    Bridge br-tun<br>
        Port "gre-1"<br>
            Interface "gre-1"<br>
                type: gre<br>
                options: {in_key=flow, local_ip="172.16.39.200", out_key=flow, remote_ip="172.16.39.155"}<br>
        Port br-tun<br>
            Interface br-tun<br>
                type: internal<br>
        Port "gre-2"<br>
            Interface "gre-2"<br>
                type: gre<br>
                options: {in_key=flow, local_ip="172.16.39.200", out_key=flow, remote_ip="172.16.39.156"}<br>
        Port patch-int<br>
            Interface patch-int<br>
                type: patch<br>
                options: {peer=patch-tun}<br>
    Bridge br-int<br>
        Port "tapfc47451e-6e"<br>
            tag: 2<br>
            Interface "tapfc47451e-6e"<br>
                type: internal<br>
        Port patch-tun<br>
            Interface patch-tun<br>
                type: patch<br>
                options: {peer=patch-int}<br>
        Port br-int<br>
            Interface br-int<br>
                type: internal<br>
        Port "qr-9362c080-49"<br>
            tag: 1<br>
            Interface "qr-9362c080-49"<br>
                type: internal<br>
        Port "tap494d9d45-de"<br>
            tag: 1<br>
            Interface "tap494d9d45-de"<br>
                type: internal<br>
    ovs_version: "1.10.2"<br>
root@neutron:~# <br>
<br>
root@neutron:~# ovs-ofctl show br-tun<br>
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000826b075fdc46<br>
n_tables:254, n_buffers:256<br>
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP<br>
actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE<br>
 1(patch-int): addr:76:05:50:ec:29:d2<br>
     config: 0<br>
     state: 0<br>
     speed: 0 Mbps now, 0 Mbps max<br>
 2(gre-1): addr:32:1d:80:12:a0:c1<br>
     config: 0<br>
     state: 0<br>
     speed: 0 Mbps now, 0 Mbps max<br>
 3(gre-2): addr:22:a5:5c:84:bf:66<br>
     config: 0<br>
     state: 0<br>
     speed: 0 Mbps now, 0 Mbps max<br>
 LOCAL(br-tun): addr:82:6b:07:5f:dc:46<br>
     config: 0<br>
     state: 0<br>
     speed: 0 Mbps now, 0 Mbps max<br>
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0<br>
root@neutron:~# <br>
<br>
root@neutron:~# ovs-appctl fdb/show br-int <br>
 port  VLAN  MAC                Age<br>
   -1     1  fa:16:3e:94:4c:19    0<br>
   -1     1  fa:16:3e:ba:22:ff    0<br>
root@neutron:~# <br>
<br>
root@neutron:~# ovs-ofctl dump-flows br-tun<br>
NXST_FLOW reply (xid=0x4):<br>
cookie=0x0, duration=50535.216s, table=0, n_packets=34524,
n_bytes=2156153, idle_age=1, priority=1,in_port=3
actions=resubmit(,2)<br>
cookie=0x0, duration=50536.204s, table=0, n_packets=31063,
n_bytes=1773195, idle_age=1, priority=1,in_port=1
actions=resubmit(,1)<br>
cookie=0x0, duration=50535.506s, table=0, n_packets=34472,
n_bytes=2188954, idle_age=1, priority=1,in_port=2
actions=resubmit(,2)<br>
cookie=0x0, duration=50536.169s, table=0, n_packets=4,
n_bytes=300, idle_age=50527, priority=0 actions=drop<br>
cookie=0x0, duration=50536.099s, table=1, n_packets=1880,
n_bytes=78960, idle_age=1,
priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00
actions=resubmit(,21)<br>
cookie=0x0, duration=50536.134s, table=1, n_packets=29183,
n_bytes=1694235, idle_age=1016,
priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00
actions=resubmit(,20)<br>
cookie=0x0, duration=50534.872s, table=2, n_packets=68996,
n_bytes=4345107, idle_age=1, priority=1,tun_id=0x1
actions=mod_vlan_vid:1,resubmit(,10)<br>
cookie=0x0, duration=50534.678s, table=2, n_packets=0,
n_bytes=0, idle_age=50534, priority=1,tun_id=0x2
actions=mod_vlan_vid:2,resubmit(,10)<br>
cookie=0x0, duration=50536.065s, table=2, n_packets=0,
n_bytes=0, idle_age=50536, priority=0 actions=drop<br>
cookie=0x0, duration=50536.03s, table=3, n_packets=0, n_bytes=0,
idle_age=50536, priority=0 actions=drop<br>
<br>
cookie=0x0, duration=50535.995s, table=10, n_packets=68996,
n_bytes=4345107, idle_age=1, priority=1
actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1<br>
<br>
This rule is being hit, as its packet count keeps increasing. Its output
port is 1, which means the packet should be passed to patch-int and then
on to qr-xxx. However, we cannot see any ARP reply on qr-xxx.<br>
<br>
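To make the expected path explicit, here is a minimal Python sketch of how we read tables 0, 2 and 10 of the br-tun dump for a packet arriving from a GRE port. It is only a simplified model, not OVS code: the function name trace_from_tunnel and the learned dictionary are hypothetical, and only the port numbers, tunnel keys and VLAN tags are taken from the output in this mail.<br>

```python
# Simplified model of the br-tun flow tables in this mail (not OVS code).
# OpenFlow port numbers as reported by "ovs-ofctl show br-tun".
PATCH_INT, GRE_1, GRE_2 = 1, 2, 3

# Table 2: tunnel key to local VLAN tag (tun_id=0x1 maps to VLAN 1, 0x2 to VLAN 2).
TUN_ID_TO_VLAN = {0x1: 1, 0x2: 2}


def trace_from_tunnel(in_port, tun_id, src_mac, learned):
    """Walk a packet that arrives on a GRE port through tables 0, 2 and 10.

    Returns (steps, result), where result is (output_port, vlan) or None
    if the packet is dropped in this model.
    """
    steps = []

    # Table 0: traffic from gre-1 and gre-2 is resubmitted to table 2.
    if in_port not in (GRE_1, GRE_2):
        steps.append("table 0: not a tunnel port in this model")
        return steps, None
    steps.append("table 0: resubmit(,2)")

    # Table 2: map the tunnel key to a local VLAN, drop unknown keys.
    vlan = TUN_ID_TO_VLAN.get(tun_id)
    if vlan is None:
        steps.append("table 2: drop")
        return steps, None
    steps.append(f"table 2: mod_vlan_vid:{vlan}, resubmit(,10)")

    # Table 10: learn a return flow for table 20, then output to patch-int.
    learned[(vlan, src_mac)] = in_port
    steps.append(
        f"table 10: learn dl_dst={src_mac} output:{in_port}, output:{PATCH_INT}"
    )
    return steps, (PATCH_INT, vlan)


# The ARP reply from compute1 arrives on gre-2 (port 3) with key 0x1.
learned_flows = {}
trace_steps, trace_result = trace_from_tunnel(
    GRE_2, 0x1, "fa:16:3e:ba:22:ff", learned_flows
)
for step in trace_steps:
    print(step)
print("result:", trace_result)
```

Under this reading, the ARP reply from compute1 (in_port 3, tun_id 0x1) should leave br-tun on port 1 (patch-int) tagged with VLAN 1, and the learned entry matches the table 20 flow for fa:16:3e:ba:22:ff with output:3. That is why we expected to see the reply on br-int.<br>
<br>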
cookie=0x0, duration=1113.575s, table=20, n_packets=5,
n_bytes=249, hard_timeout=300, idle_age=1106, hard_age=0,
priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:ba:22:ff
actions=load:0->NXM_OF_VLAN_TCI[],load:0x1->NXM_NX_TUN_ID[],output:3<br>
cookie=0x0, duration=1113.575s, table=20, n_packets=0,
n_bytes=0, hard_timeout=300, idle_age=1113, hard_age=0,
priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:94:4c:19
actions=load:0->NXM_OF_VLAN_TCI[],load:0x1->NXM_NX_TUN_ID[],output:2<br>
cookie=0x0, duration=50535.961s, table=20, n_packets=0,
n_bytes=0, idle_age=50535, priority=0 actions=resubmit(,21)<br>
cookie=0x0, duration=50534.907s, table=21, n_packets=1880,
n_bytes=78960, idle_age=1, priority=1,dl_vlan=1
actions=strip_vlan,set_tunnel:0x1,output:2,output:3<br>
cookie=0x0, duration=50534.713s, table=21, n_packets=0,
n_bytes=0, idle_age=50534, priority=1,dl_vlan=2
actions=strip_vlan,set_tunnel:0x2,output:2,output:3<br>
cookie=0x0, duration=50535.926s, table=21, n_packets=0,
n_bytes=0, idle_age=50535, priority=0 actions=drop<br>
<br>
Does anyone have any idea why the flows on br-tun do not pass the
ARP reply packets to qr-xxx?<br>
<br>
Any assistance you can provide would be greatly appreciated. <br>
</span></font><br>
-- Regards,
<br>
Rajshree
</body>
</html>