[Openstack-operators] OpenvSwitch Latency issues

Jacob Godin jacobgodin at gmail.com
Tue Oct 15 14:37:59 UTC 2013


Hi folks,

I'm experiencing a weird issue with OpenStack Networking + Open vSwitch. My
setup consists of several compute nodes and a network node (L3, OVS, DHCP,
etc.). These are connected via a Gigabit switch, which is nowhere near
capacity.

It seems that the first packet sent through a Quantum router is delayed by
several hundred milliseconds. Here is some sample ping output:

*VM(comp node 1) -> VM(comp node 2)*

# ping 10.199.0.7
PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
64 bytes from 10.199.0.7: icmp_seq=1 ttl=64 time=3.45 ms
64 bytes from 10.199.0.7: icmp_seq=2 ttl=64 time=0.792 ms
64 bytes from 10.199.0.7: icmp_seq=3 ttl=64 time=0.837 ms
64 bytes from 10.199.0.7: icmp_seq=4 ttl=64 time=0.864 ms

*VM -> qrouter*

# ping 10.199.0.1
PING 10.199.0.1 (10.199.0.1) 56(84) bytes of data.
64 bytes from 10.199.0.1: icmp_seq=1 ttl=64 time=248 ms
64 bytes from 10.199.0.1: icmp_seq=2 ttl=64 time=0.512 ms
64 bytes from 10.199.0.1: icmp_seq=3 ttl=64 time=0.553 ms
64 bytes from 10.199.0.1: icmp_seq=4 ttl=64 time=0.533 ms
64 bytes from 10.199.0.1: icmp_seq=5 ttl=64 time=0.679 ms

*qrouter -> VM*

# ip netns exec qrouter-XXXXX ping 10.199.0.7
PING 10.199.0.7 (10.199.0.7) 56(84) bytes of data.
64 bytes from 10.199.0.7: icmp_req=1 ttl=64 time=576 ms
64 bytes from 10.199.0.7: icmp_req=2 ttl=64 time=0.530 ms
64 bytes from 10.199.0.7: icmp_req=3 ttl=64 time=0.597 ms
64 bytes from 10.199.0.7: icmp_req=4 ttl=64 time=0.723 ms
64 bytes from 10.199.0.7: icmp_req=5 ttl=64 time=0.677 ms

*qrouter -> Internet*

# ip netns exec qrouter-XXXXX ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
64 bytes from 8.8.8.8: icmp_req=1 ttl=43 time=267 ms
64 bytes from 8.8.8.8: icmp_req=2 ttl=43 time=37.0 ms
64 bytes from 8.8.8.8: icmp_req=3 ttl=43 time=37.2 ms
64 bytes from 8.8.8.8: icmp_req=4 ttl=43 time=37.3 ms


Here's a tcpdump on the qrouter of a ping from a VM on that network. It
doesn't show the large delay:
14:33:38.024040 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 1, length 64
14:33:38.024089 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 1, length 64
14:33:38.526725 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 2, length 64
14:33:38.526781 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 2, length 64
14:33:39.526943 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 3, length 64
14:33:39.527000 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 3, length 64
14:33:39.665664 fa:16:3e:61:ef:25 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), length 42: Request who-has 10.199.0.7 tell 10.199.0.9, length 28
14:33:40.526963 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 4, length 64
14:33:40.527021 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 4, length 64

And a dump from the VM performing the ping:
14:34:59.897783 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 1, length 64
14:35:00.897569 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 2, length 64
14:35:01.260201 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 1, length 64
14:35:01.260229 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 2, length 64

So the router replies in well under a millisecond, while the VM sees a
significant delay (over a second before the first reply arrives). This only
happens on the first packet; after that, responses come back in under 1 ms.
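The two captures can be compared more precisely by pairing each echo request with its reply by ICMP id and sequence number. The Python sketch below does that. It assumes each packet sits on a single line, as tcpdump prints it, and the `icmp_rtts` helper is purely illustrative. Note that the two captures come from different ping runs (id 29953 on the qrouter, id 38145 on the VM), so they are compared here side by side rather than packet for packet.

```python
import re
from datetime import datetime

# Matches one `tcpdump -e` line carrying an ICMP echo request or reply and
# captures the timestamp, the direction, the ICMP id and the sequence number.
PKT_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2}\.\d{6}) .*ICMP echo (request|reply), id (\d+), seq (\d+),"
)


def icmp_rtts(lines):
    """Return {(icmp_id, seq): seconds from echo request to echo reply}.

    Non-ICMP lines (e.g. ARP) are skipped. Timestamps are time-of-day only,
    so a capture that crosses midnight is not handled.
    """
    sent = {}
    rtts = {}
    for line in lines:
        m = PKT_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%H:%M:%S.%f")
        key = (int(m.group(3)), int(m.group(4)))
        if m.group(2) == "request":
            sent[key] = ts
        elif key in sent:
            rtts[key] = (ts - sent.pop(key)).total_seconds()
    return rtts


# First request/reply pair, plus the ARP line, from the qrouter capture.
router_dump = [
    "14:33:38.024040 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), "
    "length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 29953, seq 1, length 64",
    "14:33:38.024089 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), "
    "length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 29953, seq 1, length 64",
    "14:33:39.665664 fa:16:3e:61:ef:25 > ff:ff:ff:ff:ff:ff, ethertype ARP (0x0806), "
    "length 42: Request who-has 10.199.0.7 tell 10.199.0.9, length 28",
]

# The full capture from the VM.
vm_dump = [
    "14:34:59.897783 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), "
    "length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 1, length 64",
    "14:35:00.897569 fa:16:3e:36:8e:f2 > fa:16:3e:99:85:5d, ethertype IPv4 (0x0800), "
    "length 98: 10.199.0.4 > 10.199.0.1: ICMP echo request, id 38145, seq 2, length 64",
    "14:35:01.260201 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), "
    "length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 1, length 64",
    "14:35:01.260229 fa:16:3e:99:85:5d > fa:16:3e:36:8e:f2, ethertype IPv4 (0x0800), "
    "length 98: 10.199.0.1 > 10.199.0.4: ICMP echo reply, id 38145, seq 2, length 64",
]

router_rtts = icmp_rtts(router_dump)
vm_rtts = icmp_rtts(vm_dump)

for name, rtts in (("qrouter", router_rtts), ("VM", vm_rtts)):
    for (icmp_id, seq), rtt in sorted(rtts.items()):
        print(f"{name}: id={icmp_id} seq={seq} rtt={rtt * 1000:.3f} ms")
# qrouter: id=29953 seq=1 rtt=0.049 ms
# VM: id=38145 seq=1 rtt=1362.418 ms
# VM: id=38145 seq=2 rtt=362.660 ms
```

In the VM capture, both replies arrive within about 30 µs of each other, so the seq 2 reply is also held (about 0.36 s). On the qrouter, the reply to seq 1 leaves 49 µs after the request.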

It appears to be an issue with the router, as the delay is seen with both
internal and external traffic on the router itself. Any thoughts are
greatly appreciated!