[Openstack-operators] OpenvSwitch Latency issues

Narayan Desai narayan.desai at gmail.com
Tue Oct 15 19:53:23 UTC 2013


Hm, I'm not sure what is going on here. It looks like the problem is on
qrouter ingress/egress. I'm still not sure if that is necessarily OVS,
since the vm1 to vm2 path will have OVS in the path as well, right?

We saw some weird stuff which was caused by a network link flapping
quickly, but I'm not sure that is what is going on here.
 -nld
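
To put numbers on the two captures quoted below, the request-to-reply gap
can be computed straight from the tcpdump timestamps. This is a minimal
sketch in Python, not part of the original thread: the helper name gap_ms
is made up for illustration, and the timestamps are copied by hand from the
quoted output (seq 1 and seq 2 of each capture). Note that the two captures
come from different ping runs (id 29953 on the router, id 38145 on the VM),
so they are comparable only in shape, not packet for packet.

```python
from datetime import datetime


def gap_ms(request_ts: str, reply_ts: str) -> float:
    """Milliseconds between two tcpdump-style HH:MM:SS.ffffff timestamps.

    Assumes both timestamps fall on the same day (no midnight wrap).
    """
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(reply_ts, fmt) - datetime.strptime(request_ts, fmt)
    return delta.total_seconds() * 1000.0


# (request, reply) timestamps copied from the captures quoted in this thread.
router_side = {
    1: ("14:33:38.024040", "14:33:38.024089"),
    2: ("14:33:38.526725", "14:33:38.526781"),
}
vm_side = {
    1: ("14:34:59.897783", "14:35:01.260201"),
    2: ("14:35:00.897569", "14:35:01.260229"),
}

for name, capture in (("router", router_side), ("vm", vm_side)):
    for seq, (req, rep) in sorted(capture.items()):
        print(f"{name} seq={seq}: {gap_ms(req, rep):.3f} ms")
```

On the quoted data this gives roughly 0.05 ms per reply on the router side,
against about 1362 ms (seq 1) and 363 ms (seq 2) on the VM side. Both VM-side
replies arrive within 30 microseconds of each other, which would be
consistent with packets being held somewhere and then released together.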


On Tue, Oct 15, 2013 at 2:37 PM, Jacob Godin <jacobgodin at gmail.com> wrote:

> The tenant router has the initial delay to/from both external and internal.
>
>
> On Tue, Oct 15, 2013 at 4:29 PM, Narayan Desai <narayan.desai at gmail.com> wrote:
>
>> This sounds like a timer in the qrouter path, since you can get to the
>> tenant router with predictable, low latency, right?
>>
>> This is one of the big problems with network datapaths implemented fully
>> in software.
>>  -nld
>>
>>
>> On Tue, Oct 15, 2013 at 1:06 PM, Jacob Godin <jacobgodin at gmail.com> wrote:
>>
>>> Hi Jay,
>>>
>>> If I stop and repeat immediately, ping times are fine. However, if I
>>> wait 5+ secs, they spike up during the first packet again.
>>>
>>> I'm running OVS 1.4.0; someone recommended upgrading to 1.9.x from the
>>> Havana repo.
>>>
>>>
>>> On Tue, Oct 15, 2013 at 2:41 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>>>
>>>> Hi Jacob,
>>>>
>>>> What you are witnessing, I believe, is OVS "learning" the flows and MAC
>>>> addresses of the various compute nodes involved in the communication path
>>>> between the source and target interfaces.
>>>>
>>>> If you repeat the pings, do you see the same latency on the first ping?
>>>>
>>>> Best,
>>>> -jay
>>>>
>>>>
>>>> On 10/15/2013 10:37 AM, Jacob Godin wrote:
>>>>
>>>>> Hi folks,
>>>>>
>>>>> I'm experiencing a weird issue with OpenStack Networking + OpenvSwitch.
>>>>> My setup consists of several compute nodes, and a networking node (l3,
>>>>> OVS, dhcp, etc.). These are connected via a Gigabit switch, and it is
>>>>> nowhere near capacity.
>>>>>
>>>>> It seems that the first packet being sent through a quantum router is
>>>>> delayed by several hundred milliseconds. Here is some sample ping
>>>>> output:
>>>>>
>>>>> _VM(comp node 1) -> VM(comp node 2)_
>>>>>
>>>>>
>>>>>     # ping 10.199.0.7
>>>>>     PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
>>>>>     64 bytes from 10.199.0.7: icmp_seq=1 ttl=64 time=3.45 ms
>>>>>     64 bytes from 10.199.0.7: icmp_seq=2 ttl=64 time=0.792 ms
>>>>>     64 bytes from 10.199.0.7: icmp_seq=3 ttl=64 time=0.837 ms
>>>>>     64 bytes from 10.199.0.7: icmp_seq=4 ttl=64 time=0.864 ms
>>>>>
>>>>> _VM -> qrouter_
>>>>>
>>>>>
>>>>>     # ping 10.199.0.1
>>>>>     PING 10.199.0.1 (10.199.0.1) 56(84) bytes of data.
>>>>>     64 bytes from 10.199.0.1: icmp_seq=1 ttl=64 time=248 ms
>>>>>     64 bytes from 10.199.0.1: icmp_seq=2 ttl=64 time=0.512 ms
>>>>>     64 bytes from 10.199.0.1: icmp_seq=3 ttl=64 time=0.553 ms
>>>>>     64 bytes from 10.199.0.1: icmp_seq=4 ttl=64 time=0.533 ms
>>>>>     64 bytes from 10.199.0.1: icmp_seq=5 ttl=64 time=0.679 ms
>>>>>
>>>>> _qrouter -> VM_
>>>>>
>>>>>
>>>>>     # ip netns exec qrouter-XXXXX ping 10.199.0.7
>>>>>     PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
>>>>>     64 bytes from 10.199.0.7: icmp_req=1 ttl=64 time=576 ms
>>>>>     64 bytes from 10.199.0.7: icmp_req=2 ttl=64 time=0.530 ms
>>>>>     64 bytes from 10.199.0.7: icmp_req=3 ttl=64 time=0.597 ms
>>>>>     64 bytes from 10.199.0.7: icmp_req=4 ttl=64 time=0.723 ms
>>>>>     64 bytes from 10.199.0.7: icmp_req=5 ttl=64 time=0.677 ms
>>>>>
>>>>> _qrouter -> Internet_
>>>>>
>>>>>
>>>>>     # ip netns exec qrouter-XXXXX ping 8.8.8.8
>>>>>     PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
>>>>>     64 bytes from 8.8.8.8: icmp_req=1 ttl=43 time=267 ms
>>>>>     64 bytes from 8.8.8.8: icmp_req=2 ttl=43 time=37.0 ms
>>>>>     64 bytes from 8.8.8.8: icmp_req=3 ttl=43 time=37.2 ms
>>>>>     64 bytes from 8.8.8.8: icmp_req=4 ttl=43 time=37.3 ms
>>>>>
>>>>>
>>>>>
>>>>> Here's a tcpdump on the qrouter of a ping from a vm on that network. It
>>>>> doesn't appear to show the large delay:
>>>>>
>>>>>     14:33:38.024040 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 1, length 64
>>>>>     14:33:38.024089 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 1, length 64
>>>>>     14:33:38.526725 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 2, length 64
>>>>>     14:33:38.526781 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 2, length 64
>>>>>     14:33:39.526943 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 3, length 64
>>>>>     14:33:39.527000 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 3, length 64
>>>>>     14:33:39.665664 fa:16:3e:61:ef:25 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.199.0.7 tell 10.199.0.9, length 28
>>>>>     14:33:40.526963 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 4, length 64
>>>>>     14:33:40.527021 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 4, length 64
>>>>>
>>>>> And a dump from the VM performing the ping:
>>>>>
>>>>>     14:34:59.897783 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 1, length 64
>>>>>     14:35:00.897569 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 2, length 64
>>>>>     14:35:01.260201 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 1, length 64
>>>>>     14:35:01.260229 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 2, length 64
>>>>>
>>>>> So the router sees a sub-millisecond delay, while the VM sees a
>>>>> significant delay (almost a second). This only happens during the first
>>>>> packet, and then responses are sub 1ms.
>>>>>
>>>>> It appears to be an issue with the router, as delays are seen with both
>>>>> internal and external traffic on the router itself. Any thoughts are
>>>>> greatly appreciated!
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> OpenStack-operators mailing list
>>>>> OpenStack-operators at lists.openstack.org
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>
>