[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Anne Gentle anne at openstack.org
Thu Oct 24 16:07:38 UTC 2013


On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ <thiagocmartinsc at gmail.com
> wrote:

> Precisely!
>
> The doc currently says to disable Namespaces when using GRE; I never did
> this before. Look:
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html
>
> But in this very same doc, they say to enable them... Who knows?!   =P
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
>
> I stick with Namespace enabled...
>
>
Just a reminder: /trunk/ links are works in progress. Thanks for bringing
the mismatch to our attention; we already have a doc bug filed:

https://bugs.launchpad.net/openstack-manuals/+bug/1241056

Review this patch: https://review.openstack.org/#/c/53380/

Anne




> Let me ask you something: when you enable ovs_use_veth, do Metadata and
> DHCP still work?!
>
> Cheers!
> Thiago
>
>
> On 24 October 2013 12:22, Speichert,Daniel <djs428 at drexel.edu> wrote:
>
>> Hello everyone,
>>
>> It seems we also ran into the same issue.
>>
>> We are running Ubuntu Saucy with OpenStack Havana from the Ubuntu Cloud
>> Archive (precise-updates).
>>
>> The download speed to the VMs increased from 5 Mbps to the maximum after
>> enabling ovs_use_veth. Upload speed from the VMs is still terrible (max
>> 1 Mbps, usually 0.04 Mbps).
>>
>> Here is the iperf run between the instance and the L3 agent (network node),
>> inside the namespace.
>>
>> root at cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a iperf -c 10.1.0.24 -r
>> ------------------------------------------------------------
>> Server listening on TCP port 5001
>> TCP window size: 85.3 KByte (default)
>> ------------------------------------------------------------
>> ------------------------------------------------------------
>> Client connecting to 10.1.0.24, TCP port 5001
>> TCP window size:  585 KByte (default)
>> ------------------------------------------------------------
>> [  7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  7]  0.0-10.0 sec   845 MBytes   708 Mbits/sec
>> [  6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
>> [  6]  0.0-31.4 sec   256 KBytes  66.7 Kbits/sec
>>
>> We are using Neutron OpenVSwitch with GRE and namespaces.
>>
>>
>> A side question: the documentation says to disable namespaces with GRE and
>> enable them with VLANs. It was always working well for us on Grizzly with
>> GRE and namespaces, and we could never get it to work without namespaces.
>> Is there any specific reason why the documentation advises disabling them?
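>>
>> (For reference, the setting the install guide is toggling appears to be
>> use_namespaces in the L3 and DHCP agent configs; a minimal sketch of what
>> we keep enabled, assuming the stock Havana file locations:
>>
>>   # /etc/neutron/l3_agent.ini
>>   use_namespaces = True
>>
>>   # /etc/neutron/dhcp_agent.ini
>>   use_namespaces = True
>>
>> followed by a restart of neutron-l3-agent and neutron-dhcp-agent.)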
>>
>> Regards,
>>
>> Daniel
>>
>>
>> From: Martinx - ジェームズ [mailto:thiagocmartinsc at gmail.com]
>> Sent: Thursday, October 24, 2013 3:58 AM
>> To: Aaron Rosen
>> Cc: openstack at lists.openstack.org
>> Subject: Re: [Openstack] Directional network performance issues with
>> Neutron + OpenvSwitch
>>
>> Hi Aaron,
>>
>> Thanks for answering!     =)
>>
>> Let's work...
>>
>> ---
>>
>> TEST #1 - iperf between the Network Node and its uplink router (the Data
>> Center's gateway, "Internet") - OVS br-ex / eth2
>>
>> # Tenant Namespace route table
>>
>> root at net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
>> default via 172.16.0.1 dev qg-50b615b7-c2
>> 172.16.0.0/20 dev qg-50b615b7-c2  proto kernel  scope link  src 172.16.0.2
>> 192.168.210.0/24 dev qr-a1376f61-05  proto kernel  scope link  src 192.168.210.1
>>
>> # there is an "iperf -s" running at 172.16.0.1 ("Internet"), testing it
>>
>> root at net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
>> ------------------------------------------------------------
>> Client connecting to 172.16.0.1, TCP port 5001
>> TCP window size: 22.9 KByte (default)
>> ------------------------------------------------------------
>> [  5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  5]  0.0-10.0 sec   668 MBytes   559 Mbits/sec
>> ---
>>
>> ---
>>
>> TEST #2 - iperf from one instance to the Namespace of the L3 agent + uplink
>> router
>>
>> # iperf server running within the Tenant's Namespace router
>>
>> root at net-node-1:~# ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s
>>
>> -
>>
>> # from instance-1
>>
>> ubuntu at instance-1:~$ ip route
>> default via 192.168.210.1 dev eth0  metric 100
>> 192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.2
>>
>> # instance-1 performing tests against the net-node-1 Namespace above
>>
>> ubuntu at instance-1:~$ iperf -c 192.168.210.1
>> ------------------------------------------------------------
>> Client connecting to 192.168.210.1, TCP port 5001
>> TCP window size: 21.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 192.168.210.2 port 43739 connected with 192.168.210.1 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec   484 MBytes   406 Mbits/sec
>>
>> # still on instance-1, now against the "External IP" of its own Namespace /
>> Router
>>
>> ubuntu at instance-1:~$ iperf -c 172.16.0.2
>> ------------------------------------------------------------
>> Client connecting to 172.16.0.2, TCP port 5001
>> TCP window size: 21.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 192.168.210.2 port 34703 connected with 172.16.0.2 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec   520 MBytes   436 Mbits/sec
>>
>> # still on instance-1, now against the Data Center uplink router
>>
>> ubuntu at instance-1:~$ iperf -c 172.16.0.1
>> ------------------------------------------------------------
>> Client connecting to 172.16.0.1, TCP port 5001
>> TCP window size: 21.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 192.168.210.4 port 38401 connected with 172.16.0.1 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec   324 MBytes   271 Mbits/sec
>> ---
>>
>> This latest test shows only 271 Mbits/sec! I think it should be at least
>> 400~430 Mbits/sec... Right?!
>>
>> ---
>>
>> TEST #3 - Two instances on the same hypervisor
>>
>> # iperf server
>>
>> ubuntu at instance-2:~$ ip route
>> default via 192.168.210.1 dev eth0  metric 100
>> 192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.4
>>
>> ubuntu at instance-2:~$ iperf -s
>> ------------------------------------------------------------
>> Server listening on TCP port 5001
>> TCP window size: 85.3 KByte (default)
>> ------------------------------------------------------------
>> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 45800
>> [ ID] Interval       Transfer     Bandwidth
>> [  4]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
>>
>> # iperf client
>>
>> ubuntu at instance-1:~$ iperf -c 192.168.210.4
>> ------------------------------------------------------------
>> Client connecting to 192.168.210.4, TCP port 5001
>> TCP window size: 21.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 192.168.210.2 port 45800 connected with 192.168.210.4 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
>> ---
>>
>> ---
>>
>> TEST #4 - Two instances on different hypervisors - over GRE
>>
>> root at instance-2:~# iperf -s
>> ------------------------------------------------------------
>> Server listening on TCP port 5001
>> TCP window size: 85.3 KByte (default)
>> ------------------------------------------------------------
>> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 34640
>> [ ID] Interval       Transfer     Bandwidth
>> [  4]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
>>
>> root at instance-1:~# iperf -c 192.168.210.4
>> ------------------------------------------------------------
>> Client connecting to 192.168.210.4, TCP port 5001
>> TCP window size: 21.0 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
>> ---
>>
>> I just realized how slow my intra-cloud (VM-to-VM) communication is... :-/
>>
>> ---
>>
>> TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip
>>
>> # Same path as "TEST #4", but testing the physical GRE path (where the GRE
>> traffic flows)
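>>
>> (By "OVS local_ip / remote_ip" I mean the hypervisors' tunnel endpoint
>> addresses; a minimal sketch of the relevant bits, assuming the stock Havana
>> OVS plugin file and that 10.20.2.53 / 10.20.2.57 are the tunnel NICs of
>> hypervisor-1 / hypervisor-2:
>>
>>   # /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini  (hypervisor-1)
>>   [ovs]
>>   tenant_network_type = gre
>>   enable_tunneling = True
>>   tunnel_id_ranges = 1:1000
>>   local_ip = 10.20.2.53    # 10.20.2.57 on hypervisor-2
>>
>> The iperf below runs directly between those two addresses, outside any GRE
>> tunnel.)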
>>
>> root at hypervisor-2:~$ iperf -s
>> ------------------------------------------------------------
>> Server listening on TCP port 5001
>> TCP window size: 85.3 KByte (default)
>> ------------------------------------------------------------
>> [  4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
>> [ ID] Interval       Transfer     Bandwidth
>> [  4]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
>>
>> root at hypervisor-1:~# iperf -c 10.20.2.57
>> ------------------------------------------------------------
>> Client connecting to 10.20.2.57, TCP port 5001
>> TCP window size: 22.9 KByte (default)
>> ------------------------------------------------------------
>> [  3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
>> [ ID] Interval       Transfer     Bandwidth
>> [  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
>> ---
>>
>> About Test #5: I don't know why the GRE traffic (Test #4) doesn't reach
>> 1 Gbit/sec (only ~200 Mbit/sec?), since its physical path is much faster
>> (Gigabit LAN). Plus, Test #3 shows a pretty fast speed when traffic flows
>> only within a hypervisor (3.96 Gbit/sec).
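>>
>> (One thing I still want to rule out on the hypervisors is the NIC offload
>> state on the GRE path; a quick check, assuming eth1 is the tunnel NIC here:
>>
>>   root at hypervisor-1:~# ethtool -k eth1    # look at the gro/gso/tso settings
>>   root at hypervisor-1:~# ovs-vsctl show     # confirm the gre ports / remote_ip
>>
>> I'm not saying offloads are the culprit, just that they are cheap to check.)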
>>
>> Tomorrow, I'll do these tests with netperf.
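>>
>> Probably something along these lines, assuming netperf/netserver are
>> installed on both ends (netserver listening on instance-2):
>>
>>   # upload direction (instance-1 -> instance-2)
>>   ubuntu at instance-1:~$ netperf -H 192.168.210.4 -t TCP_STREAM
>>
>>   # download direction (instance-2 -> instance-1), without swapping hosts
>>   ubuntu at instance-1:~$ netperf -H 192.168.210.4 -t TCP_MAERTS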
>>
>> NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3 via
>> "dpkg-buildpackage" and installed the Debian/Ubuntu way. If I downgrade to
>> 1.10.2 from the Havana Cloud Archive, I get the same results... I can
>> downgrade it, if you guys tell me to do so.
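>>
>> (Roughly how I built it, in case it matters; a sketch, assuming the 1.11.0
>> tarball from openvswitch.org and the usual build dependencies:
>>
>>   $ tar xzf openvswitch-1.11.0.tar.gz && cd openvswitch-1.11.0
>>   $ sudo apt-get install build-essential fakeroot debhelper autoconf
>>   $ dpkg-buildpackage -b -uc -us
>>   $ sudo dpkg -i ../openvswitch-common_1.11.0*.deb \
>>                  ../openvswitch-switch_1.11.0*.deb
>>
>> plus the datapath/DKMS package for the kernel module.)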
>>
>> BTW, I'll install another "Region" based on Havana on Ubuntu 13.10, with
>> exactly the same configuration as my current Havana + Ubuntu 12.04.3, on
>> top of the same hardware, to see if the problem still persists.
>>
>> Regards,
>>
>> Thiago
>>
>> On 23 October 2013 22:40, Aaron Rosen <arosen at nicira.com> wrote:
>>
>>
>> On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ
>> <thiagocmartinsc at gmail.com> wrote:
>>
>> James,
>>
>> I think I'm hitting this problem.
>>
>> I'm using "Per-Tenant Routers with Private Networks", GRE tunnels, and an
>> L3+DHCP Network Node.
>>
>> The connectivity from behind my Instances is very slow. It takes an
>> eternity to finish "apt-get update".
>>
>>
>> I'm curious if you can do the following tests to help pinpoint the
>> bottleneck:
>>
>> Run iperf or netperf between:
>>
>> - two instances on the same hypervisor - this will determine whether it's a
>>   virtualization driver issue if the performance is bad.
>> - two instances on different hypervisors.
>> - one instance to the namespace of the l3 agent (a minimal sketch of this
>>   last run follows below).
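>>
>> For that last one, something like this on the network node should do,
>> assuming the qrouter-* name is taken from "ip netns" (yours will differ):
>>
>>   # on the network node: find the router namespace, start a server inside it
>>   root at net-node:~# ip netns
>>   root at net-node:~# ip netns exec qrouter-<uuid> iperf -s
>>
>>   # from the instance, test both directions in one run
>>   ubuntu at instance:~$ iperf -c <router-ip-on-tenant-net> -r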
>>
>> If I run "apt-get update" from within the tenant's Namespace, it goes fine.
>>
>> If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I am
>> unable to start new Ubuntu Instances and log in to them... Look:
>>
>> --
>> cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
>> 2013-10-22 06:01:42,989 - util.py[WARNING]:
>> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]:
>> url error [[Errno 113] No route to host]
>> 2013-10-22 06:01:45,988 - util.py[WARNING]:
>> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]:
>> url error [[Errno 113] No route to host]
>> --
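>>
>> (A quick way to reproduce this outside cloud-init, from inside a booted
>> instance, assuming curl is available:
>>
>>   ubuntu at instance-1:~$ ip route    # is the default gateway there at all?
>>   ubuntu at instance-1:~$ curl -v http://169.254.169.254/2009-04-04/meta-data/instance-id
>>
>> which should return the instance id when Metadata is healthy.)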
>>
>> Do you see anything interesting in the neutron-metadata-agent log? Or does
>> it look like your instance doesn't have a route to the default gateway?
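>>
>> A couple of things worth checking in the router namespace too, assuming a
>> standard Havana L3 setup where the metadata proxy listens on port 9697:
>>
>>   # is the 169.254.169.254 -> local:9697 redirect installed?
>>   ip netns exec qrouter-<uuid> iptables -t nat -S | grep 169.254.169.254
>>
>>   # is a neutron-ns-metadata-proxy process running for that router?
>>   ps aux | grep ns-metadata-proxy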
>>
>> Is this problem still around?!
>>
>> Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3?
>>
>> Is it possible to re-enable Metadata when ovs_use_veth = true?
>>
>> Thanks!
>>
>> Thiago
>>
>>
>> On 3 October 2013 06:27, James Page <james.page at ubuntu.com> wrote:
>>
>>
>> On 02/10/13 22:49, James Page wrote:
>> >> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
>> >>>    traceroute -n 10.5.0.2 -p 44444 --mtu
>> >>> traceroute to 10.5.0.2 (10.5.0.2), 30 hops max, 65000 byte packets
>> >>>  1  10.5.0.2  0.950 ms F=1500  0.598 ms  0.566 ms
>> >>>
>> >>> The PMTU from the l3 gateway to the instance looks OK to me.
>> > I spent a bit more time debugging this; performance from within
>> > the router netns on the L3 gateway node looks good in both
>> > directions when accessing via the tenant network (10.5.0.2) over
>> > the qr-XXXXX interface, but when accessing through the external
>> > network from within the netns I see the same performance choke
>> > upstream into the tenant network.
>> >
>> > Which would indicate that my problem lies somewhere around the
>> > qg-XXXXX interface in the router netns - just trying to figure out
>> > exactly what - maybe iptables is doing something wonky?
>>
>> OK - I found a fix, but I'm not sure why it makes a difference: neither my
>> l3-agent nor dhcp-agent configuration had 'ovs_use_veth = True'. I switched
>> this on, cleared everything down, rebooted, and now I seem to get good,
>> symmetric performance across all neutron routers.
>>
>> This would point to some sort of underlying bug when ovs_use_veth = False.
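>>
>> For completeness, the setting itself is just this, in both agent configs on
>> the network node (stock Ubuntu paths assumed), followed by an agent restart:
>>
>>   # /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
>>   ovs_use_veth = True
>>
>>   $ sudo service neutron-l3-agent restart
>>   $ sudo service neutron-dhcp-agent restart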
>>
>>
>>
>> --
>> James Page
>> Ubuntu and Debian Developer
>> james.page at ubuntu.com
>> jamespage at debian.org
>>
>

