[Openstack] Intermittent network connection issue with OVS 1.10.2

Trump.Zhang zhangleiqiang at gmail.com
Thu Apr 24 13:52:59 UTC 2014


Hi, Rajshree:

    As far as I know, the OVS package in the Havana RDO repo has a problem
that can leave inbound network performance fine but outbound performance
very poor.

    I don't know whether this is the same situation as yours, but you can
refer to this thread: http://openvswitch.org/pipermail/discuss/2013-April/009737.html

    Hope this helps.



2014-04-24 18:32 GMT+08:00 Rajshree Thorat <rajshree.thorat at gslab.com>:

>  Hi All,
>
> I have deployed a multi-node OpenStack Havana setup for provisioning VMs
> using ESXi as the hypervisor.
> However, the OpenStack instances lose network connectivity every so often.
>
> We can see the ARP reply on the GRE tunnel on the network node, but we
> don't see it in the tcpdump on the 'qr-xxx' interface of the qrouter
> namespace. For some strange reason, br-tun does not pass the ARP reply
> to qr-xxx on the network node.
>
> Please find more details below.
>
> ESXi version : 5.5.0
> OVS version : 1.10.2
>
> compute1 : 172.16.39.156
> compute2 : 172.16.39.155
> neutron node : 172.16.39.200
>
> Below is the packet flow when we ping the VM from the external network
> and the VM is not reachable.
>
> 10.230.39.163 is the floating IP and 10.10.10.2 is the private IP of the VM.
>
> C:\> ping 10.230.39.163 -t
>
> Pinging 10.230.39.163 with 32 bytes of data:
>
> Reply from 10.230.39.163: Destination host unreachable.
> Reply from 10.230.39.163: Destination host unreachable.
>
> root@compute1:~# tcpdump -n -i eth0 proto gre
> 17:07:57.541440 IP 172.16.39.155 > 172.16.39.156: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28
> 17:07:57.541457 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28
> 17:07:57.541465 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 72: ARP, Reply 10.10.10.2 is-at fa:16:3e:ba:22:ff, length 46
> 17:07:57.541486 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 72: ARP, Reply 10.10.10.2 is-at fa:16:3e:ba:22:ff, length 46
>
> Here we can see the compute node sending the ARP reply to the network
> node over the GRE tunnel.
>
> root@neutron:~# tcpdump -n -i eth0 proto gre
> 17:08:55.281644 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28
> 17:08:55.281663 IP 172.16.39.155 > 172.16.39.200: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28
> 17:08:55.281669 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 72: ARP, Reply 10.10.10.2 is-at fa:16:3e:ba:22:ff, length 46
>
> Here we can see the network node receiving the ARP reply from the compute
> node over the GRE tunnel.
>
> root@neutron:~# tcpdump -i br-int
> tcpdump: WARNING: br-int: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on br-int, link-type EN10MB (Ethernet), capture size 65535 bytes
> 17:23:45.344970 ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28
> 17:23:45.345121 ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28
>
> We can see only ARP requests on br-int of the network node. There is no
> ARP reply on br-int.
>
> root@neutron:~# ip netns exec qrouter-905087ce-b1e2-4038-beae-32865fa7924b tcpdump -i qr-8d3a43d5-f5
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on qr-8d3a43d5-f5, link-type EN10MB (Ethernet), capture size 65535 bytes
> ^C23:21:20.950674 ARP, Request who-has 10.10.10.6 tell headnode.hpc.hpc, length 28
> 23:21:20.951224 ARP, Request who-has 10.10.10.6 tell headnode.hpc.hpc, length 28
> 23:21:20.951237 ARP, Request who-has 10.10.10.6 tell headnode.hpc.hpc, length 28
> 23:21:21.949476 ARP, Request who-has 10.10.10.6 tell headnode.hpc.hpc, length 28
> 23:21:21.949622 ARP, Request who-has 10.10.10.6 tell headnode.hpc.hpc, length 28
>
> Here we could not see any ARP reply on qr-xxx.
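
One way to make the hop-by-hop comparison above less error-prone is to count ARP requests and replies per capture point. The following is only a minimal sketch: the capture labels and the helper names (`count_arp`, `first_hop_without_reply`) are made up for illustration, the lines are copied from the three captures for 10.10.10.2 quoted above, and it assumes those captures, although taken at different times, show the same failure.

```python
import re

# ARP lines copied from the captures quoted above (mail-wrapped lines rejoined).
# The dictionary keys are labels chosen for this sketch, listed in path order.
CAPTURES = {
    "compute1 eth0": [
        "17:07:57.541440 IP 172.16.39.155 > 172.16.39.156: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28",
        "17:07:57.541457 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28",
        "17:07:57.541465 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 72: ARP, Reply 10.10.10.2 is-at fa:16:3e:ba:22:ff, length 46",
        "17:07:57.541486 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 72: ARP, Reply 10.10.10.2 is-at fa:16:3e:ba:22:ff, length 46",
    ],
    "neutron eth0": [
        "17:08:55.281644 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28",
        "17:08:55.281663 IP 172.16.39.155 > 172.16.39.200: GREv0, key=0x1, length 54: ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28",
        "17:08:55.281669 IP 172.16.39.156 > 172.16.39.200: GREv0, key=0x1, length 72: ARP, Reply 10.10.10.2 is-at fa:16:3e:ba:22:ff, length 46",
    ],
    "neutron br-int": [
        "17:23:45.344970 ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28",
        "17:23:45.345121 ARP, Request who-has 10.10.10.2 tell 10.10.10.1, length 28",
    ],
}

ARP_RE = re.compile(r"ARP, (Request|Reply)\b")


def count_arp(lines):
    """Count ARP requests and replies in a list of tcpdump output lines."""
    counts = {"Request": 0, "Reply": 0}
    for line in lines:
        match = ARP_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def first_hop_without_reply(captures):
    """Return the first capture point (in path order) that saw no ARP reply."""
    for hop, lines in captures.items():
        if count_arp(lines)["Reply"] == 0:
            return hop
    return None


for hop, lines in CAPTURES.items():
    print(hop, count_arp(lines))
print("reply first missing at:", first_hop_without_reply(CAPTURES))
```

With the data above, the reply is present on both eth0 captures and first goes missing at br-int, which narrows the loss to somewhere between the GRE decapsulation on the network node and br-int.
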
>
> root@neutron:~# ovs-vsctl show
> 1f793d03-8ab1-495c-877e-e6002dda9912
>     Bridge br-ex
>         Port br-ex
>             Interface br-ex
>                 type: internal
>         Port "eth2"
>             Interface "eth2"
>         Port "qg-ac77add8-f6"
>             Interface "qg-ac77add8-f6"
>                 type: internal
>     Bridge br-tun
>         Port "gre-1"
>             Interface "gre-1"
>                 type: gre
>                 options: {in_key=flow, local_ip="172.16.39.200", out_key=flow, remote_ip="172.16.39.155"}
>         Port br-tun
>             Interface br-tun
>                 type: internal
>         Port "gre-2"
>             Interface "gre-2"
>                 type: gre
>                 options: {in_key=flow, local_ip="172.16.39.200", out_key=flow, remote_ip="172.16.39.156"}
>         Port patch-int
>             Interface patch-int
>                 type: patch
>                 options: {peer=patch-tun}
>     Bridge br-int
>         Port "tapfc47451e-6e"
>             tag: 2
>             Interface "tapfc47451e-6e"
>                 type: internal
>         Port patch-tun
>             Interface patch-tun
>                 type: patch
>                 options: {peer=patch-int}
>         Port br-int
>             Interface br-int
>                 type: internal
>         Port "qr-9362c080-49"
>             tag: 1
>             Interface "qr-9362c080-49"
>                 type: internal
>         Port "tap494d9d45-de"
>             tag: 1
>             Interface "tap494d9d45-de"
>                 type: internal
>     ovs_version: "1.10.2"
> root@neutron:~#
>
> root@neutron:~# ovs-ofctl show br-tun
> OFPT_FEATURES_REPLY (xid=0x2): dpid:0000826b075fdc46
> n_tables:254, n_buffers:256
> capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
> actions: OUTPUT SET_VLAN_VID SET_VLAN_PCP STRIP_VLAN SET_DL_SRC SET_DL_DST SET_NW_SRC SET_NW_DST SET_NW_TOS SET_TP_SRC SET_TP_DST ENQUEUE
>  1(patch-int): addr:76:05:50:ec:29:d2
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  2(gre-1): addr:32:1d:80:12:a0:c1
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  3(gre-2): addr:22:a5:5c:84:bf:66
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
>  LOCAL(br-tun): addr:82:6b:07:5f:dc:46
>      config: 0
>      state: 0
>      speed: 0 Mbps now, 0 Mbps max
> OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
> root@neutron:~#
>
> root@neutron:~# ovs-appctl fdb/show br-int
>  port  VLAN  MAC                Age
>    -1     1  fa:16:3e:94:4c:19    0
>    -1     1  fa:16:3e:ba:22:ff    0
> root@neutron:~#
>
> root@neutron:~# ovs-ofctl dump-flows br-tun
> NXST_FLOW reply (xid=0x4):
> cookie=0x0, duration=50535.216s, table=0, n_packets=34524, n_bytes=2156153, idle_age=1, priority=1,in_port=3 actions=resubmit(,2)
> cookie=0x0, duration=50536.204s, table=0, n_packets=31063, n_bytes=1773195, idle_age=1, priority=1,in_port=1 actions=resubmit(,1)
> cookie=0x0, duration=50535.506s, table=0, n_packets=34472, n_bytes=2188954, idle_age=1, priority=1,in_port=2 actions=resubmit(,2)
> cookie=0x0, duration=50536.169s, table=0, n_packets=4, n_bytes=300, idle_age=50527, priority=0 actions=drop
> cookie=0x0, duration=50536.099s, table=1, n_packets=1880, n_bytes=78960, idle_age=1, priority=0,dl_dst=01:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,21)
> cookie=0x0, duration=50536.134s, table=1, n_packets=29183, n_bytes=1694235, idle_age=1016, priority=0,dl_dst=00:00:00:00:00:00/01:00:00:00:00:00 actions=resubmit(,20)
> cookie=0x0, duration=50534.872s, table=2, n_packets=68996, n_bytes=4345107, idle_age=1, priority=1,tun_id=0x1 actions=mod_vlan_vid:1,resubmit(,10)
> cookie=0x0, duration=50534.678s, table=2, n_packets=0, n_bytes=0, idle_age=50534, priority=1,tun_id=0x2 actions=mod_vlan_vid:2,resubmit(,10)
> cookie=0x0, duration=50536.065s, table=2, n_packets=0, n_bytes=0, idle_age=50536, priority=0 actions=drop
> cookie=0x0, duration=50536.03s, table=3, n_packets=0, n_bytes=0, idle_age=50536, priority=0 actions=drop
>
> cookie=0x0, duration=50535.995s, table=10, n_packets=68996, n_bytes=4345107, idle_age=1, priority=1 actions=learn(table=20,hard_timeout=300,priority=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:0->NXM_OF_VLAN_TCI[],load:NXM_NX_TUN_ID[]->NXM_NX_TUN_ID[],output:NXM_OF_IN_PORT[]),output:1
>
> This rule is being hit, since its packet count keeps increasing. Its
> output port is 1, which means the packet should pass to patch-int and
> then on to qr-xxx. However, we cannot see any ARP reply on qr-xxx.
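
The reasoning above can be checked by walking a packet through the tunnel-ingress tables by hand. Below is a minimal sketch that hard-codes only the table 0, table 2 and table 10 rules from the dump quoted in this message; the function name `trace_from_tunnel` and the dictionaries are illustrative, not anything OVS provides, and this is a static reading of the flow table rather than what the datapath actually did.

```python
# Hand-transcribed subset of the br-tun flows quoted in this message.
TABLE0 = {1: 1, 2: 2, 3: 2}   # in_port -> table named in resubmit(,N)
TABLE2 = {0x1: 1, 0x2: 2}     # tun_id -> VLAN set by mod_vlan_vid
PORT_NAMES = {1: "patch-int", 2: "gre-1", 3: "gre-2"}  # from ovs-ofctl show


def trace_from_tunnel(in_port, tun_id):
    """Follow a packet arriving on br-tun; return (steps, (out_port, vlan))."""
    steps = []
    next_table = TABLE0.get(in_port)
    if next_table is None:
        steps.append("table 0: no port match -> drop")
        return steps, None
    steps.append(f"table 0: in_port={in_port} -> resubmit(,{next_table})")
    if next_table != 2:
        # Traffic from patch-int goes to table 1; that path is not modelled.
        steps.append("egress path (table 1) not modelled in this sketch")
        return steps, None
    vlan = TABLE2.get(tun_id)
    if vlan is None:
        steps.append(f"table 2: tun_id={tun_id:#x} unknown -> drop")
        return steps, None
    steps.append(f"table 2: tun_id={tun_id:#x} -> mod_vlan_vid:{vlan}, resubmit(,10)")
    steps.append("table 10: learn(...) into table 20, then output:1")
    return steps, (PORT_NAMES[1], vlan)


# ARP reply from compute1 (172.16.39.156): gre-2 is port 3, GRE key is 0x1.
steps, result = trace_from_tunnel(in_port=3, tun_id=0x1)
for step in steps:
    print(step)
print("result:", result)
```

For the ARP reply in question this ends at patch-int with VLAN 1, which is the tag on qr-9362c080-49 in the ovs-vsctl output, so the flow table itself agrees with the expectation stated above. On the live system, `ovs-appctl ofproto/trace` would be the more authoritative check.
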
>
> cookie=0x0, duration=1113.575s, table=20, n_packets=5, n_bytes=249, hard_timeout=300, idle_age=1106, hard_age=0, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:ba:22:ff actions=load:0->NXM_OF_VLAN_TCI[],load:0x1->NXM_NX_TUN_ID[],output:3
> cookie=0x0, duration=1113.575s, table=20, n_packets=0, n_bytes=0, hard_timeout=300, idle_age=1113, hard_age=0, priority=1,vlan_tci=0x0001/0x0fff,dl_dst=fa:16:3e:94:4c:19 actions=load:0->NXM_OF_VLAN_TCI[],load:0x1->NXM_NX_TUN_ID[],output:2
> cookie=0x0, duration=50535.961s, table=20, n_packets=0, n_bytes=0, idle_age=50535, priority=0 actions=resubmit(,21)
> cookie=0x0, duration=50534.907s, table=21, n_packets=1880, n_bytes=78960, idle_age=1, priority=1,dl_vlan=1 actions=strip_vlan,set_tunnel:0x1,output:2,output:3
> cookie=0x0, duration=50534.713s, table=21, n_packets=0, n_bytes=0, idle_age=50534, priority=1,dl_vlan=2 actions=strip_vlan,set_tunnel:0x2,output:2,output:3
> cookie=0x0, duration=50535.926s, table=21, n_packets=0, n_bytes=0, idle_age=50535, priority=0 actions=drop
>
> Does anyone have any idea why the flows on br-tun do not pass the ARP
> reply packets to qr-xxx?
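
The packet counters in the dump quoted above give one more hint about whether br-tun itself is dropping anything. The sketch below is only illustrative: the flow lines are abbreviated copies of the quoted ones (cookie, duration, n_bytes and idle_age removed, and the learn(...) body elided), the helper names are made up, and it assumes all counters were sampled at the same moment. It counts packets of every kind, so it cannot single out ARP replies.

```python
import re

# Abbreviated copies of the tunnel-ingress flows from the dump quoted above.
FLOWS = [
    "table=0, n_packets=34524, priority=1,in_port=3 actions=resubmit(,2)",
    "table=0, n_packets=31063, priority=1,in_port=1 actions=resubmit(,1)",
    "table=0, n_packets=34472, priority=1,in_port=2 actions=resubmit(,2)",
    "table=2, n_packets=68996, priority=1,tun_id=0x1 actions=mod_vlan_vid:1,resubmit(,10)",
    "table=2, n_packets=0, priority=1,tun_id=0x2 actions=mod_vlan_vid:2,resubmit(,10)",
    "table=2, n_packets=0, priority=0 actions=drop",
    "table=10, n_packets=68996, priority=1 actions=learn(...),output:1",
]

FLOW_RE = re.compile(r"table=(\d+), n_packets=(\d+),.*? actions=(.*)$")


def parse(line):
    """Return (table, n_packets, actions) for one abbreviated flow line."""
    match = FLOW_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognised flow line: {line!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


def packets_where(flows, table, action_substr):
    """Sum n_packets over flows in `table` whose actions contain a substring."""
    total = 0
    for line in flows:
        flow_table, n_packets, actions = parse(line)
        if flow_table == table and action_substr in actions:
            total += n_packets
    return total


from_tunnels = packets_where(FLOWS, 0, "resubmit(,2)")   # gre-1 + gre-2 ingress
to_table10 = packets_where(FLOWS, 2, "resubmit(,10)")    # matched a known tun_id
dropped_in_2 = packets_where(FLOWS, 2, "drop")           # unknown tun_id
sent_to_patch = packets_where(FLOWS, 10, "output:1")     # handed to patch-int

print(from_tunnels, to_table10, dropped_in_2, sent_to_patch)
```

The ingress counters from the two GRE ports add up to the table 2 and table 10 counters, and the table 2 drop rule has never matched. Under the assumptions stated, that suggests br-tun forwarded every tunnel packet to port 1, so the loss is more likely after br-tun (the patch port or br-int) than in these flows.
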
>
> Any assistance you can provide would be greatly appreciated.
>
> -- Regards,
> Rajshree
>


-- 
-------------------
Best Regards

Trump.Zhang