[Openstack-operators] OpenvSwitch Latency issues

Jacob Godin jacobgodin at gmail.com
Tue Oct 15 18:06:23 UTC 2013


Hi Jay,

If I stop the ping and repeat it immediately, ping times are fine. However,
if I wait 5+ seconds, the first packet is slow again.

I'm running OVS 1.4.0; someone recommended upgrading to 1.9.x from the
Havana repo.
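
For anyone who wants to compare runs before and after an upgrade, here is a
rough Python sketch that puts a number on the first-packet penalty. It only
parses saved ping output; the function names are my own, and the sample lines
are the "VM -> qrouter" replies from my original message, with the archive's
auto-link markup removed.

```python
import re
from statistics import median

# Matches the RTT field of a Linux ping reply line, e.g. "time=0.792 ms".
RTT_RE = re.compile(r"time=([0-9.]+)\s*ms")


def parse_rtts(ping_output):
    """Return the round-trip times (in ms) found in ping output, in order."""
    return [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]


def first_packet_penalty(rtts):
    """Return the first RTT divided by the median of the remaining RTTs."""
    if len(rtts) < 2:
        raise ValueError("need at least two replies to compare")
    return rtts[0] / median(rtts[1:])


# Reply lines from the "VM -> qrouter" run (hypothetical variable name).
sample = """\
64 bytes from 10.199.0.1: icmp_seq=1 ttl=64 time=248 ms
64 bytes from 10.199.0.1: icmp_seq=2 ttl=64 time=0.512 ms
64 bytes from 10.199.0.1: icmp_seq=3 ttl=64 time=0.553 ms
64 bytes from 10.199.0.1: icmp_seq=4 ttl=64 time=0.533 ms
64 bytes from 10.199.0.1: icmp_seq=5 ttl=64 time=0.679 ms
"""

rtts = parse_rtts(sample)
penalty = first_packet_penalty(rtts)
print(f"first reply: {rtts[0]} ms")
print(f"median of later replies: {median(rtts[1:])} ms")
print(f"first reply is roughly {penalty:.0f}x slower")
```

Running the same comparison on pings taken after different idle gaps (1 s,
5 s, 10 s) should show whether the penalty really only returns once the path
has been quiet for a few seconds.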


On Tue, Oct 15, 2013 at 2:41 PM, Jay Pipes <jaypipes at gmail.com> wrote:

> Hi Jacob,
>
> What you are witnessing, I believe, is OVS "learning" the flows and MAC
> addresses of the various compute nodes involved in the communication path
> between the source and target interfaces.
>
> If you repeat the pings, do you see the same latency on the first ping?
>
> Best,
> -jay
>
>
> On 10/15/2013 10:37 AM, Jacob Godin wrote:
>
>> Hi folks,
>>
>> I'm experiencing a weird issue with OpenStack Networking + OpenvSwitch.
>> My setup consists of several compute nodes, and a networking node (l3,
>> OVS, dhcp, etc.). These are connected via a Gigabit switch, and it is
>> nowhere near capacity.
>>
>> It seems that the first packet being sent through a quantum router is
>> delayed by several hundred milliseconds. Here is some sample ping output:
>>
>> _VM(comp node 1) -> VM(comp node 2)_
>>
>>
>>     # ping 10.199.0.7
>>     PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
>>     64 bytes from 10.199.0.7: icmp_seq=1 ttl=64 time=3.45 ms
>>     64 bytes from 10.199.0.7: icmp_seq=2 ttl=64 time=0.792 ms
>>     64 bytes from 10.199.0.7: icmp_seq=3 ttl=64 time=0.837 ms
>>     64 bytes from 10.199.0.7: icmp_seq=4 ttl=64 time=0.864 ms
>>
>> _VM -> qrouter_
>>
>>
>>     # ping 10.199.0.1
>>     PING 10.199.0.1 (10.199.0.1) 56(84) bytes of data.
>>     64 bytes from 10.199.0.1: icmp_seq=1 ttl=64 time=248 ms
>>     64 bytes from 10.199.0.1: icmp_seq=2 ttl=64 time=0.512 ms
>>     64 bytes from 10.199.0.1: icmp_seq=3 ttl=64 time=0.553 ms
>>     64 bytes from 10.199.0.1: icmp_seq=4 ttl=64 time=0.533 ms
>>     64 bytes from 10.199.0.1: icmp_seq=5 ttl=64 time=0.679 ms
>>
>> _qrouter -> VM_
>>
>>
>>     # ip netns exec qrouter-XXXXX ping 10.199.0.7
>>     PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
>>     64 bytes from 10.199.0.7: icmp_req=1 ttl=64 time=576 ms
>>     64 bytes from 10.199.0.7: icmp_req=2 ttl=64 time=0.530 ms
>>     64 bytes from 10.199.0.7: icmp_req=3 ttl=64 time=0.597 ms
>>     64 bytes from 10.199.0.7: icmp_req=4 ttl=64 time=0.723 ms
>>     64 bytes from 10.199.0.7: icmp_req=5 ttl=64 time=0.677 ms
>>
>> _qrouter -> Internet_
>>
>>
>>     # ip netns exec qrouter-XXXXX ping 8.8.8.8
>>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>>     64 bytes from 8.8.8.8: icmp_req=1 ttl=43 time=267 ms
>>     64 bytes from 8.8.8.8: icmp_req=2 ttl=43 time=37.0 ms
>>     64 bytes from 8.8.8.8: icmp_req=3 ttl=43 time=37.2 ms
>>     64 bytes from 8.8.8.8: icmp_req=4 ttl=43 time=37.3 ms
>>
>>
>>
>> Here's a tcpdump on the qrouter of a ping from a vm on that network. It
>> doesn't appear to show the large delay:
>> 14:33:38.024040 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 1, length 64
>> 14:33:38.024089 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 1, length 64
>> 14:33:38.526725 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 2, length 64
>> 14:33:38.526781 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 2, length 64
>> 14:33:39.526943 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 3, length 64
>> 14:33:39.527000 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 3, length 64
>> 14:33:39.665664 fa:16:3e:61:ef:25 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.199.0.7 tell 10.199.0.9, length 28
>> 14:33:40.526963 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 4, length 64
>> 14:33:40.527021 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 4, length 64
>>
>> And a dump from the VM performing the ping:
>> 14:34:59.897783 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 1, length 64
>> 14:35:00.897569 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 2, length 64
>> 14:35:01.260201 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 1, length 64
>> 14:35:01.260229 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 2, length 64
>>
>> So the router sees a sub-millisecond delay, while the VM sees a
>> significant delay (about 1.36 seconds for seq 1 in the capture above).
>> This only happens on the first packet; later responses take under 1 ms.
>>
>> It appears to be an issue with the router, as delays are seen with both
>> internal and external traffic on the router itself. Any thoughts are
>> greatly appreciated!
>>
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>
>>
>