[Openstack-operators] OpenvSwitch Latency issues

Jay Pipes jaypipes at gmail.com
Tue Oct 15 17:41:54 UTC 2013


Hi Jacob,

What you are witnessing, I believe, is OVS "learning" the flows and MAC 
addresses of the various compute nodes involved in the communication 
path between the source and target interfaces.

If you repeat the pings, do you see the same latency on the first ping?
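
If it helps to compare runs, here is a minimal sketch (not a polished tool)
that pulls the round-trip times out of ping output and reports the first
reply against the median of the rest. The function names are made up for
illustration, and it assumes Linux ping output in the format quoted below,
including lines that a mail client has wrapped and prefixed with ">".

```python
import re
import statistics

# Matches e.g. "icmp_seq=1 ttl=64 time=3.45 ms". Both icmp_seq= and icmp_req=
# appear in the quoted output. "[\s>]+" tolerates a wrapped, quoted line break
# between the ttl and time fields.
RTT_RE = re.compile(
    r"icmp_(?:seq|req)=(\d+)\s+ttl=\d+[\s>]+time=([\d.]+)\s*ms"
)


def parse_rtts(ping_output):
    """Return a list of (sequence number, RTT in ms) found in ping output."""
    return [(int(seq), float(rtt)) for seq, rtt in RTT_RE.findall(ping_output)]


def first_packet_penalty(ping_output):
    """Return (first RTT, median of the remaining RTTs), both in ms."""
    rtts = [rtt for _, rtt in parse_rtts(ping_output)]
    if len(rtts) < 2:
        raise ValueError("need at least two replies to compare")
    return rtts[0], statistics.median(rtts[1:])


if __name__ == "__main__":
    import sys

    # Usage (assumed): ping -c 5 10.199.0.1 | python this_script.py
    first, rest = first_packet_penalty(sys.stdin.read())
    print("first reply: %.3f ms, median of the rest: %.3f ms" % (first, rest))
```

Running it a few times in a row, and again after leaving the path idle for
several minutes, should show whether the penalty comes back only after idle
periods.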

Best,
-jay

On 10/15/2013 10:37 AM, Jacob Godin wrote:
> Hi folks,
>
> I'm experiencing a weird issue with OpenStack Networking + OpenvSwitch.
> My setup consists of several compute nodes, and a networking node (l3,
> OVS, dhcp, etc.). These are connected via a Gigabit switch, which is
> nowhere near capacity.
>
> It seems that the first packet being sent through a quantum router is
> delayed by several hundred milliseconds. Here is some sample ping output:
>
> _VM(comp node 1) -> VM(comp node 2)_
>
>     # ping 10.199.0.7
>     PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
>     64 bytes from 10.199.0.7: icmp_seq=1 ttl=64 time=3.45 ms
>     64 bytes from 10.199.0.7: icmp_seq=2 ttl=64 time=0.792 ms
>     64 bytes from 10.199.0.7: icmp_seq=3 ttl=64 time=0.837 ms
>     64 bytes from 10.199.0.7: icmp_seq=4 ttl=64 time=0.864 ms
>
> _VM -> qrouter_
>
>     # ping 10.199.0.1
>     PING 10.199.0.1 (10.199.0.1) 56(84) bytes of data.
>     64 bytes from 10.199.0.1: icmp_seq=1 ttl=64 time=248 ms
>     64 bytes from 10.199.0.1: icmp_seq=2 ttl=64 time=0.512 ms
>     64 bytes from 10.199.0.1: icmp_seq=3 ttl=64 time=0.553 ms
>     64 bytes from 10.199.0.1: icmp_seq=4 ttl=64 time=0.533 ms
>     64 bytes from 10.199.0.1: icmp_seq=5 ttl=64 time=0.679 ms
>
> _qrouter -> VM_
>
>     # ip netns exec qrouter-XXXXX ping 10.199.0.7
>     PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
>     64 bytes from 10.199.0.7: icmp_req=1 ttl=64 time=576 ms
>     64 bytes from 10.199.0.7: icmp_req=2 ttl=64 time=0.530 ms
>     64 bytes from 10.199.0.7: icmp_req=3 ttl=64 time=0.597 ms
>     64 bytes from 10.199.0.7: icmp_req=4 ttl=64 time=0.723 ms
>     64 bytes from 10.199.0.7: icmp_req=5 ttl=64 time=0.677 ms
>
> _qrouter -> Internet_
>
>     # ip netns exec qrouter-XXXXX ping 8.8.8.8
>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>     64 bytes from 8.8.8.8: icmp_req=1 ttl=43 time=267 ms
>     64 bytes from 8.8.8.8: icmp_req=2 ttl=43 time=37.0 ms
>     64 bytes from 8.8.8.8: icmp_req=3 ttl=43 time=37.2 ms
>     64 bytes from 8.8.8.8: icmp_req=4 ttl=43 time=37.3 ms
>
>
> Here's a tcpdump on the qrouter of a ping from a vm on that network. It
> doesn't appear to show the large delay:
>
>     14:33:38.024040 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 1, length 64
>     14:33:38.024089 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 1, length 64
>     14:33:38.526725 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 2, length 64
>     14:33:38.526781 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 2, length 64
>     14:33:39.526943 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 3, length 64
>     14:33:39.527000 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 3, length 64
>     14:33:39.665664 fa:16:3e:61:ef:25 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.199.0.7 tell 10.199.0.9, length 28
>     14:33:40.526963 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 4, length 64
>     14:33:40.527021 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 4, length 64
>
> And a dump from the VM performing the ping:
>
>     14:34:59.897783 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 1, length 64
>     14:35:00.897569 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 2, length 64
>     14:35:01.260201 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 1, length 64
>     14:35:01.260229 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 2, length 64
>
> So the router sees a sub-millisecond delay between request and reply,
> while the VM sees a significant delay (about 1.36 seconds for seq 1 in
> the capture above). This happens only for the first packet; subsequent
> responses take less than 1 ms.
>
> It appears to be an issue with the router, as delays are seen with both
> internal and external traffic on the router itself. Any thoughts are
> greatly appreciated!
>
>
>



