[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Martinx - ジェームズ thiagocmartinsc at gmail.com
Thu Oct 24 07:58:07 UTC 2013


Hi Aaron,

Thanks for answering!     =)

Let's work...

---

TEST #1 - iperf between the Network Node and its uplink router (Data Center's
gateway to the "Internet") - OVS br-ex / eth2

# Tenant Namespace route table

root@net-node-1:~# ip netns exec
qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
default via 172.16.0.1 dev qg-50b615b7-c2
172.16.0.0/20 dev qg-50b615b7-c2  proto kernel  scope link  src 172.16.0.2
192.168.210.0/24 dev qr-a1376f61-05  proto kernel  scope link  src
192.168.210.1

# there is an "iperf -s" running at 172.16.0.1 (the "Internet"), testing it

root@net-node-1:~# ip netns exec
qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  5]  0.0-10.0 sec   668 MBytes   559 Mbits/sec
---

---

TEST #2 - iperf from one instance to the Namespace of the L3 agent + uplink
router

# iperf server running within Tenant's Namespace router

root@net-node-1:~# ip netns exec
qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s

-

# from instance-1

ubuntu@instance-1:~$ ip route
default via 192.168.210.1 dev eth0  metric 100
192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.2

# instance-1 performing tests against the net-node-1 Namespace above

ubuntu@instance-1:~$ iperf -c 192.168.210.1
------------------------------------------------------------
Client connecting to 192.168.210.1, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.210.2 port 43739 connected with 192.168.210.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   484 MBytes   406 Mbits/sec

# still on instance-1, now against "External IP" of its own Namespace /
Router

ubuntu@instance-1:~$ iperf -c 172.16.0.2
------------------------------------------------------------
Client connecting to 172.16.0.2, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.210.2 port 34703 connected with 172.16.0.2 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   520 MBytes   436 Mbits/sec

# still on instance-1, now against the Data Center UpLink Router

ubuntu@instance-1:~$ iperf -c 172.16.0.1
------------------------------------------------------------
Client connecting to 172.16.0.1, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.210.4 port 38401 connected with 172.16.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   324 MBytes   271 Mbits/sec
---

This last test shows only 271 Mbit/s! I think it should be at least
400~430 Mbit/s... Right?!
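As a back-of-envelope check (assuming each leg's iperf number roughly
approximates that leg's capacity), the instance-to-uplink path should sustain
about the slower of its two legs; a minimal sketch using the figures from
Tests #1 and #2:

```python
# Back-of-envelope sanity check for the instance -> uplink path.
# Assumption: each leg's iperf result approximates that leg's capacity.

vm_to_router = 406      # Mbit/s, Test #2: instance-1 -> qrouter namespace
router_to_uplink = 559  # Mbit/s, Test #1: qrouter namespace -> 172.16.0.1

# A store-and-forward path is bounded by its slowest leg.
expected_floor = min(vm_to_router, router_to_uplink)   # 406 Mbit/s

measured = 271          # Mbit/s, observed instance-1 -> 172.16.0.1
shortfall = 1 - measured / expected_floor
print(f"expected >= {expected_floor} Mbit/s, got {measured} "
      f"({shortfall:.0%} below the slower leg)")
```

By this estimate, the NAT hop in the qrouter namespace is costing roughly a
third of the available bandwidth, which matches the "should be at least
400~430" intuition above.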

---

TEST #3 - Two instances on the same hypervisor

# iperf server

ubuntu@instance-2:~$ ip route
default via 192.168.210.1 dev eth0  metric 100
192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.4

ubuntu@instance-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 45800
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec

# iperf client

ubuntu@instance-1:~$ iperf -c 192.168.210.4
------------------------------------------------------------
Client connecting to 192.168.210.4, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.210.2 port 45800 connected with 192.168.210.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
---

---

TEST #4 - Two instances on different hypervisors - over GRE

root@instance-2:~# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 34640
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec   237 MBytes   198 Mbits/sec


root@instance-1:~# iperf -c 192.168.210.4
------------------------------------------------------------
Client connecting to 192.168.210.4, TCP port 5001
TCP window size: 21.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
---

I just realized how slow my intra-cloud (VM-to-VM) communication is...   :-/

---

TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip

# Same path as "TEST #4", but testing the physical path (where the GRE
traffic flows)

root@hypervisor-2:~$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec

root@hypervisor-1:~# iperf -c 10.20.2.57
------------------------------------------------------------
Client connecting to 10.20.2.57, TCP port 5001
TCP window size: 22.9 KByte (default)
------------------------------------------------------------
[  3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
---

About Test #5: I don't know why the GRE traffic (Test #4) doesn't reach
1 Gbit/s (only ~200 Mbit/s?!), since its physical path is much faster
(Gigabit LAN). Plus, Test #3 shows a pretty fast speed when traffic flows
only within a single hypervisor (3.96 Gbit/s).
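One thing worth ruling out on the Test #4 path is MTU: GRE encapsulation
steals header bytes from every 1500-byte frame on the hypervisors' tunnel
LAN, so guests that keep a 1500-byte MTU can force fragmentation inside the
tunnel. A minimal sketch of the arithmetic (the header sizes are the standard
IPv4/GRE values; the 1500-byte physical MTU is an assumption based on the
Gigabit LAN in Test #5):

```python
# Usable guest MTU over a GRE tunnel whose physical path has a 1500-byte MTU.
# Assumption: plain IPv4 outer header + basic GRE header, no options.

PHYS_MTU = 1500   # MTU of the 10.20.2.x "GRE TUNNEL LAN" (assumed)
OUTER_IPV4 = 20   # outer IPv4 header added per tunneled packet
GRE_HEADER = 4    # basic GRE header (8 bytes if key/checksum are enabled)

tunnel_mtu = PHYS_MTU - OUTER_IPV4 - GRE_HEADER
print(f"guest eth0 MTU should be <= {tunnel_mtu}")  # 1476
```

If the guests' eth0 is still at 1500, full-size segments get fragmented (or
dropped, depending on DF) inside the tunnel, which could plausibly cap an
otherwise ~940 Mbit/s path at a few hundred Mbit/s; pushing a smaller MTU to
instances via the DHCP agent is a common workaround.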

Tomorrow, I'll redo these tests with netperf.

NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3 via
"dpkg-buildpackage" and installed the "Debian / Ubuntu way". If I downgrade
to 1.10.2 from the Havana Cloud Archive, I get the same results... I can keep
it downgraded, if you guys tell me to do so.

BTW, I'll install another "Region", based on Havana on Ubuntu 13.10, with
exactly the same configuration as my current Havana + Ubuntu 12.04.3, on top
of the same hardware, to see if the problem still persists.

Regards,
Thiago

On 23 October 2013 22:40, Aaron Rosen <arosen at nicira.com> wrote:

>
>
>
> On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ <
> thiagocmartinsc at gmail.com> wrote:
>
>> James,
>>
>> I think I'm hitting this problem.
>>
>> I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and
>> L3+DHCP Network Node.
>>
>> The connectivity from behind my Instances is very slow. It takes an
>> eternity to finish "apt-get update".
>>
>
>
> I'm curious if you can do the following tests to help pinpoint the
> bottleneck:
>
> Run iperf or netperf between:
> two instances on the same hypervisor - if performance is bad here, it
> points to a virtualization driver issue.
> two instances on different hypervisors.
> one instance to the namespace of the l3 agent.
>
>
>
>
>
>
>>
>> If I run "apt-get update" from within tenant's Namespace, it goes fine.
>>
>> If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I am
>> unable to start new Ubuntu Instances and log in to them... Look:
>>
>> --
>> cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
>> 2013-10-22 06:01:42,989 - util.py[WARNING]: '
>> http://169.254.169.254/2009-04-04/meta-data/instance-id' failed
>> [3/120s]: url error [[Errno 113] No route to host]
>> 2013-10-22 06:01:45,988 - util.py[WARNING]: '
>> http://169.254.169.254/2009-04-04/meta-data/instance-id' failed
>> [6/120s]: url error [[Errno 113] No route to host]
>> --
>>
>
>
> Do you see anything interesting in the neutron-metadata-agent log? Or does
> it look like your instance doesn't have a route to the default gw?
>
>
>>
>> Is this problem still around?!
>>
>> Should I stay away from GRE tunnels when with Havana + Ubuntu 12.04.3?
>>
>> Is it possible to re-enable Metadata when ovs_use_veth = true ?
>>
>> Thanks!
>> Thiago
>>
>>
>> On 3 October 2013 06:27, James Page <james.page at ubuntu.com> wrote:
>>
>>>
>>> On 02/10/13 22:49, James Page wrote:
>>> >> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
>>> >>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
>>> >>> (10.5.0.2), 30 hops max, 65000 byte packets 1  10.5.0.2  0.950
>>> >>> ms F=1500  0.598 ms  0.566 ms
>>> >>>
>>> >>> The PMTU from the l3 gateway to the instance looks OK to me.
>>> > I spent a bit more time debugging this; performance from within
>>> > the router netns on the L3 gateway node looks good in both
>>> > directions when accessing via the tenant network (10.5.0.2) over
>>> > the qr-XXXXX interface, but when accessing through the external
>>> > network from within the netns I see the same performance choke
>>> > upstream into the tenant network.
>>> >
>>> > Which would indicate that my problem lies somewhere around the
>>> > qg-XXXXX interface in the router netns - just trying to figure out
>>> > exactly what - maybe iptables is doing something wonky?
>>>
>>> OK - I found a fix but I'm not sure why this makes a difference;
>>> neither my l3-agent or dhcp-agent configuration had 'ovs_use_veth =
>>> True'; I switched this on, cleared everything down, rebooted, and now
>>> I see symmetric, good performance across all neutron routers.
>>>
>>> This would point to some sort of underlying bug when ovs_use_veth =
>>> False.
>>>
>>>
>>> - --
>>> James Page
>>> Ubuntu and Debian Developer
>>> james.page at ubuntu.com
>>> jamespage at debian.org
>>>
>>> _______________________________________________
>>> Mailing list:
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>> Post to     : openstack at lists.openstack.org
>>> Unsubscribe :
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>
>>
>>
>>
>>
>