[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Martinx - ジェームズ thiagocmartinsc at gmail.com
Fri Oct 25 03:58:57 UTC 2013


Hi Daniel,

I followed that page; my Instances' MTU is lowered by the DHCP Agent, but
the result is the same: poor network performance (both internally between
Instances and when trying to reach the Internet).

No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf"
plus "dhcp-option-force=26,1400" for my Neutron DHCP agent, or not (i.e.
MTU = 1500), the result is almost the same.
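
For reference, this is roughly what's in place on my DHCP agent node (a
sketch; file locations are the usual ones on Ubuntu, adjust to your
install), plus how I'm checking the result inside a freshly booted instance:

# /etc/neutron/dhcp_agent.ini
[DEFAULT]
dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf

# /etc/neutron/dnsmasq-neutron.conf
dhcp-option-force=26,1400

# then restart the agent and renew the instance's lease:
service neutron-dhcp-agent restart

# inside the instance, "ip link show eth0" should report "mtu 1400"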

I'll try VXLAN (or just VLANs) this weekend to see if I can get better
results...

Thanks!
Thiago




On 24 October 2013 17:38, Speichert,Daniel <djs428 at drexel.edu> wrote:

> We managed to bring the upload speed back to maximum on the instances
> through the use of this guide:
>
> http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html
>
> Basically, the MTU needs to be lowered for GRE tunnels. It can be done
> with DHCP as explained in the new trunk manual.
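>
> The arithmetic behind it, roughly (assuming IPv4 transport and the GRE
> tunnel key that Neutron sets on its OVS tunnels):
>
>   1500  physical MTU
>   - 20  outer IPv4 header
>   -  8  GRE header (4 bytes base + 4 bytes key)
>   ----
>   1472  largest inner packet that avoids fragmentation
>
> so anything at or below ~1472 would fit; 1400 is simply a comfortably
> conservative value.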
>
> Regards,
> Daniel
>
> From: annegentle at justwriteclick.com [mailto:annegentle at justwriteclick.com]
> On Behalf Of Anne Gentle
> Sent: Thursday, October 24, 2013 12:08 PM
> To: Martinx - ジェームズ
> Cc: Speichert,Daniel; openstack at lists.openstack.org
> Subject: Re: [Openstack] Directional network performance issues with
> Neutron + OpenvSwitch
>
> On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ <
> thiagocmartinsc at gmail.com> wrote:
>
> Precisely!
>
> The doc currently says to disable Namespaces when using GRE; I never did
> this before. Look:
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html
>
> But in this very same doc, they say to enable it... Who knows?!   =P
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
>
> I'm sticking with Namespaces enabled...
>
> Just a reminder: /trunk/ links are works in progress. Thanks for bringing
> the mismatch to our attention; we already have a doc bug filed:
>
> https://bugs.launchpad.net/openstack-manuals/+bug/1241056
>
> Review this patch: https://review.openstack.org/#/c/53380/
>
> Anne
>
> Let me ask you something: when you enable ovs_use_veth, do Metadata and
> DHCP still work?!
>
> Cheers!
> Thiago
>
> On 24 October 2013 12:22, Speichert,Daniel <djs428 at drexel.edu> wrote:
>
> Hello everyone,
>
> It seems we also ran into the same issue.
>
> We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud
> archives (precise-updates).
>
> The download speed to the VMs increased from 5 Mbps to maximum after
> enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1
> Mbps, usually 0.04 Mbps).
>
> Here is the iperf run between the instance and the L3 agent (network
> node), inside the namespace.
>
> root@cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a
> iperf -c 10.1.0.24 -r
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> ------------------------------------------------------------
> Client connecting to 10.1.0.24, TCP port 5001
> TCP window size:  585 KByte (default)
> ------------------------------------------------------------
> [  7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  7]  0.0-10.0 sec   845 MBytes   708 Mbits/sec
> [  6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
> [  6]  0.0-31.4 sec   256 KBytes  66.7 Kbits/sec
>
> We are using Neutron OpenVSwitch with GRE and namespaces.
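>
> In case it helps, this is how we double-check that the lowered MTU actually
> took effect on both ends (a rough sketch; the namespace ID is the one from
> the iperf run above, and interface names are from our setup):
>
> # inside the VM: expect "mtu 1400" if the DHCP option was applied
> ip link show eth0
>
> # on the network node: check the MTUs of the qr-/qg- ports in the router namespace
> ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a ip link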
>
>
> A side question: the documentation says to disable namespaces with GRE and
> enable them with VLANs. It was always working well for us on Grizzly with
> GRE and namespaces and we could never get it to work without namespaces. Is
> there any specific reason why the documentation is advising to disable it?
> ****
>
>  ****
>
> Regards,****
>
> Daniel****
>
>  ****
>
> *From:* Martinx - ジェームズ [mailto:thiagocmartinsc at gmail.com]
> *Sent:* Thursday, October 24, 2013 3:58 AM
> *To:* Aaron Rosen
> *Cc:* openstack at lists.openstack.org****
>
>
> *Subject:* Re: [Openstack] Directional network performance issues with
> Neutron + OpenvSwitch****
>
>  ****
>
> Hi Aaron,****
>
>  ****
>
> Thanks for answering!     =)****
>
>  ****
>
> Lets work...****
>
>  ****
>
> ---****
>
>  ****
>
> TEST #1 - iperf between Network Node and its Uplink router (Data Center's
> gateway "Internet") - OVS br-ex / eth2****
>
> # Tenant Namespace route table
>
> root@net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
> default via 172.16.0.1 dev qg-50b615b7-c2
> 172.16.0.0/20 dev qg-50b615b7-c2  proto kernel  scope link  src 172.16.0.2
> 192.168.210.0/24 dev qr-a1376f61-05  proto kernel  scope link  src 192.168.210.1
>
> # there is an "iperf -s" running at 172.16.0.1 "Internet", testing it
>
> root@net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
> ------------------------------------------------------------
> Client connecting to 172.16.0.1, TCP port 5001
> TCP window size: 22.9 KByte (default)
> ------------------------------------------------------------
> [  5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  5]  0.0-10.0 sec   668 MBytes   559 Mbits/sec
> ---
>
> ---
>
> TEST #2 - iperf from one instance to the Namespace of the L3 agent + uplink
> router
>
> # iperf server running within the Tenant's Namespace router
>
> root@net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s
>
> -
>
> # from instance-1
>
> ubuntu@instance-1:~$ ip route
> default via 192.168.210.1 dev eth0  metric 100
> 192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.2
>
> # instance-1 performing tests against the net-node-1 Namespace above
>
> ubuntu@instance-1:~$ iperf -c 192.168.210.1
> ------------------------------------------------------------
> Client connecting to 192.168.210.1, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 43739 connected with 192.168.210.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   484 MBytes   406 Mbits/sec
>
> # still on instance-1, now against the "External IP" of its own Namespace /
> Router
>
> ubuntu@instance-1:~$ iperf -c 172.16.0.2
> ------------------------------------------------------------
> Client connecting to 172.16.0.2, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 34703 connected with 172.16.0.2 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   520 MBytes   436 Mbits/sec
>
> # still on instance-1, now against the Data Center UpLink Router
>
> ubuntu@instance-1:~$ iperf -c 172.16.0.1
> ------------------------------------------------------------
> Client connecting to 172.16.0.1, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.4 port 38401 connected with 172.16.0.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   324 MBytes   271 Mbits/sec
> ---
>
> This last test shows only 271 Mbits/sec! I think it should be at least
> 400~430 Mbits/sec... Right?!
>
> ---
>
> TEST #3 - Two instances on the same hypervisor
>
> # iperf server
>
> ubuntu@instance-2:~$ ip route
> default via 192.168.210.1 dev eth0  metric 100
> 192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.4
>
> ubuntu@instance-2:~$ iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 45800
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
>
> # iperf client
>
> ubuntu@instance-1:~$ iperf -c 192.168.210.4
> ------------------------------------------------------------
> Client connecting to 192.168.210.4, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 45800 connected with 192.168.210.4 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
> ---
>
> ---
>
> TEST #4 - Two instances on different hypervisors - over GRE
>
> root@instance-2:~# iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 34640
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
>
> root@instance-1:~# iperf -c 192.168.210.4
> ------------------------------------------------------------
> Client connecting to 192.168.210.4, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
> ---
>
> I just realized how slow my intra-cloud (VM-to-VM) communication is... :-/
>
> ---
>
> TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip
>
> # Same path as "TEST #4", but testing the physical GRE path (where the GRE
> traffic flows)
>
> root@hypervisor-2:~$ iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [  4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
>
> root@hypervisor-1:~# iperf -c 10.20.2.57
> ------------------------------------------------------------
> Client connecting to 10.20.2.57, TCP port 5001
> TCP window size: 22.9 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
> ---
>
> About Test #5: I don't know why the GRE traffic (Test #4) doesn't reach
> 1 Gbit/sec (only ~200 Mbit/sec?), since its physical path is much faster
> (Gigabit LAN). Also, Test #3 shows a pretty fast speed when the traffic
> stays within a single hypervisor (3.96 Gbit/sec).
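>
> One thing I plan to check is whether the encapsulated packets are being
> fragmented or dropped on the tunnel path. A rough sketch of the checks
> (addresses from the tests above; the -s values assume a 1500-byte physical
> MTU and a 1400-byte instance MTU, i.e. payload + 28 bytes of ICMP/IP
> headers):
>
> # from hypervisor-1: confirm the physical path carries full frames (1472 + 28 = 1500)
> ping -M do -s 1472 -c 3 10.20.2.57
>
> # from instance-1: 1400-byte packets (1372 + 28) should pass without fragmentation
> ping -M do -s 1372 -c 3 192.168.210.4
>
> # on hypervisor-1: watch for fragmented GRE packets on the physical NIC
> # (replace eth1 with whatever carries the 10.20.2.0/24 tunnel network; 47 = GRE)
> tcpdump -ni eth1 'ip proto 47 and ip[6:2] & 0x3fff != 0'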
>
> Tomorrow, I'll do these tests with netperf.
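>
> Probably something along these lines (netperf/netserver from the Ubuntu
> "netperf" package; hostnames/IPs as in the tests above):
>
> # on instance-2
> netserver
>
> # on instance-1: forward and reverse TCP stream tests
> netperf -H 192.168.210.4 -t TCP_STREAM -l 30
> netperf -H 192.168.210.4 -t TCP_MAERTS -l 30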
>
> NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3 via
> "dpkg-buildpackage" and installed the "Debian / Ubuntu way". If I downgrade
> to 1.10.2 from the Havana Cloud Archive, I get the same results... I can
> downgrade it, if you guys tell me to do so.
>
> BTW, I'll install another "Region", based on Havana on Ubuntu 13.10, with
> exactly the same configuration as my current Havana + Ubuntu 12.04.3, on
> top of the same hardware, to see if the problem still persists.
>
> Regards,
> Thiago
>
> On 23 October 2013 22:40, Aaron Rosen <arosen at nicira.com> wrote:
>
> On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ <
> thiagocmartinsc at gmail.com> wrote:
>
> James,
>
> I think I'm hitting this problem.
>
> I'm using "Per-Tenant Routers with Private Networks", GRE tunnels, and an
> L3+DHCP Network Node.
>
> The connectivity from behind my Instances is very slow. It takes an
> eternity to finish "apt-get update".
>
> I'm curious if you can do the following tests to help pinpoint the
> bottleneck:
>
> Run iperf or netperf between:
> - two instances on the same hypervisor - this will determine if it's a
>   virtualization driver issue if the performance is bad.
> - two instances on different hypervisors.
> - one instance to the namespace of the l3 agent.
>
> If I run "apt-get update" from within the tenant's Namespace, it goes fine.
>
> If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I am
> unable to start new Ubuntu Instances and log into them... Look:
>
> --
> cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
> 2013-10-22 06:01:42,989 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]:
> url error [[Errno 113] No route to host]
> 2013-10-22 06:01:45,988 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]:
> url error [[Errno 113] No route to host]
> --
>
> Do you see anything interesting in the neutron-metadata-agent log? Or does
> it look like your instance doesn't have a route to the default gw?
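>
> One way to narrow that down from the network node (a sketch; replace
> <router-id> with your qrouter UUID - port 9697 is the default local port
> the l3-agent redirects 169.254.169.254:80 to for the namespace metadata
> proxy):
>
> # is the 169.254.169.254 redirect rule present in the router namespace?
> ip netns exec qrouter-<router-id> iptables -t nat -S | grep 169.254.169.254
>
> # is the neutron-ns-metadata-proxy actually listening there?
> ip netns exec qrouter-<router-id> netstat -lnpt | grep 9697
>
> # and inside the instance, check that the default route is present:
> ip route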
>
> Is this problem still around?!
>
> Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3?
>
> Is it possible to re-enable Metadata when ovs_use_veth = true?
>
> Thanks!
> Thiago
>
> On 3 October 2013 06:27, James Page <james.page at ubuntu.com> wrote:
>
> On 02/10/13 22:49, James Page wrote:
> >> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
> >>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
> >>> (10.5.0.2), 30 hops max, 65000 byte packets 1  10.5.0.2  0.950
> >>> ms F=1500  0.598 ms  0.566 ms
> >>>
> >>> The PMTU from the l3 gateway to the instance looks OK to me.
> > I spent a bit more time debugging this; performance from within
> > the router netns on the L3 gateway node looks good in both
> > directions when accessing via the tenant network (10.5.0.2) over
> > the qr-XXXXX interface, but when accessing through the external
> > network from within the netns I see the same performance choke
> > upstream into the tenant network.
> >
> > Which would indicate that my problem lies somewhere around the
> > qg-XXXXX interface in the router netns - just trying to figure out
> > exactly what - maybe iptables is doing something wonky?
>
> OK - I found a fix but I'm not sure why this makes a difference;
> neither my l3-agent nor dhcp-agent configuration had 'ovs_use_veth =
> True'; I switched this on, cleared everything down, rebooted, and now
> I seem to have symmetric, good performance across all neutron routers.
>
> This would point to some sort of underlying bug when ovs_use_veth = False.
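>
> For anyone else trying this, the change itself is just (file locations as
> on a stock Ubuntu / cloud archive install):
>
> # /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
> [DEFAULT]
> ovs_use_veth = True
>
> followed by clearing the existing qr-/qg-/tap ports down and restarting
> the agents - in my case I simply rebooted the node.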
>
> --
> James Page
> Ubuntu and Debian Developer
> james.page at ubuntu.com
> jamespage at debian.org