[Openstack] Directional network performance issues with Neutron + OpenvSwitch
Martinx - ジェームズ
thiagocmartinsc at gmail.com
Fri Oct 25 14:34:50 UTC 2013
Daniel,
Honestly, I think I have two problems. The first one is related to "instances
trying to reach the Internet", i.e. the traffic that passes through the Network
Node (L3 + Namespace), which is very, very slow. It is practically impossible
to run "apt-get update" from within an Instance, for example; it takes an
eternity to finish. No MTU problems were detected with tcpdump at the L3 node,
so it must be something else.
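
For reference, this is roughly the kind of check I ran inside the router
namespace to rule out MTU trouble (the router ID and qg- interface below are
the ones from my setup further down this thread; adjust them to yours):

# watch the external leg of the tenant router for ICMP "fragmentation needed"
ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 \
  tcpdump -ni qg-50b615b7-c2 'icmp[0] == 3 and icmp[1] == 4'
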
The second problem is related to the communication between two instances on
different hypervisors, which I only noticed after doing more tests.
Do you think that those two problems are, in fact, the same (or related)?
Thanks!
Thiago
On 25 October 2013 10:51, Speichert,Daniel <djs428 at drexel.edu> wrote:
> Thiago,
>
> It looks like you have a slightly different problem. I didn't have any
> slowdown in the connection between instances.
>
> You might want to try this:
> https://ask.openstack.org/en/question/6140/quantum-neutron-gre-slow-performance/?answer=6320#post-id-6320
>
> Regards,
>
> Daniel
>
> From: Martinx - ジェームズ [mailto:thiagocmartinsc at gmail.com]
> Sent: Thursday, October 24, 2013 11:59 PM
> To: Speichert,Daniel
> Cc: Anne Gentle; openstack at lists.openstack.org
>
> Subject: Re: [Openstack] Directional network performance issues with
> Neutron + OpenvSwitch
>
> Hi Daniel,
>
> I followed that page; my Instances' MTU is lowered by the DHCP Agent, but the
> result is the same: poor network performance (both between Instances and when
> trying to reach the Internet).
>
> Whether I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf" plus
> "dhcp-option-force=26,1400" for my Neutron DHCP agent or not (i.e. MTU =
> 1500), the result is almost the same.
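>
> For the record, the whole MTU change amounts to roughly this (paths as in my
> setup; adjust if yours differ), followed by a DHCP agent restart and an
> instance reboot so the new lease gets picked up:
>
> # /etc/neutron/dhcp_agent.ini
> dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
>
> # /etc/neutron/dnsmasq-neutron.conf
> dhcp-option-force=26,1400
>
> service neutron-dhcp-agent restart
>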
> I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> results...
>
> Thanks!
>
> Thiago
>
> On 24 October 2013 17:38, Speichert,Daniel <djs428 at drexel.edu> wrote:
>
> We managed to bring the upload speed back to maximum on the instances
> through the use of this guide:
>
> http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html
>
> Basically, the MTU needs to be lowered for GRE tunnels. It can be done
> with DHCP as explained in the new trunk manual.
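>
> To confirm the lowered MTU actually sticks, something along these lines from
> inside an instance should do (target host and sizes are only examples):
>
> # the DHCP-pushed MTU should show up on the instance's interface
> ip link show eth0 | grep mtu
>
> # probe with "don't fragment" set; 1372 bytes of ICMP payload + 28 bytes of
> # ICMP/IP headers = 1400, so this should pass while bigger probes fail
> ping -M do -s 1372 <some-external-host>
>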
> Regards,
>
> Daniel
>
> From: annegentle at justwriteclick.com [mailto:annegentle at justwriteclick.com]
> On Behalf Of Anne Gentle
> Sent: Thursday, October 24, 2013 12:08 PM
> To: Martinx - ジェームズ
> Cc: Speichert,Daniel; openstack at lists.openstack.org
>
> Subject: Re: [Openstack] Directional network performance issues with
> Neutron + OpenvSwitch
>
> On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ <
> thiagocmartinsc at gmail.com> wrote:
>
> Precisely!
>
> The doc currently says to disable Namespaces when using GRE; I never did
> this before. Look:
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html
>
> But in this very same doc, they say to enable it... Who knows?! =P
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
>
> I'm sticking with Namespaces enabled...
>
> Just a reminder: /trunk/ links are works in progress. Thanks for bringing
> the mismatch to our attention; we already have a doc bug filed:
>
> https://bugs.launchpad.net/openstack-manuals/+bug/1241056
>
> Review this patch: https://review.openstack.org/#/c/53380/
>
> Anne
>
> Let me ask you something: when you enable ovs_use_veth, do the Metadata and
> DHCP agents still work?!
>
> Cheers!
>
> Thiago
>
> On 24 October 2013 12:22, Speichert,Daniel <djs428 at drexel.edu> wrote:
>
> Hello everyone,
>
> It seems we also ran into the same issue.
>
> We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud
> archives (precise-updates).
>
> The download speed to the VMs increased from 5 Mbps to maximum after
> enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1
> Mbps, usually 0.04 Mbps).
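>
> For reference, the change itself is just this in both agent configs (a sketch
> of what we set; the agents need a restart afterwards):
>
> # /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
> ovs_use_veth = True
>
> service neutron-l3-agent restart
> service neutron-dhcp-agent restart
>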
> Here is the iperf run between the instance and the L3 agent (network node),
> inside the namespace:
>
> root at cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a
> iperf -c 10.1.0.24 -r
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> ------------------------------------------------------------
> Client connecting to 10.1.0.24, TCP port 5001
> TCP window size: 585 KByte (default)
> ------------------------------------------------------------
> [  7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  7]  0.0-10.0 sec   845 MBytes   708 Mbits/sec
> [  6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
> [  6]  0.0-31.4 sec   256 KBytes  66.7 Kbits/sec
>
> We are using Neutron OpenVSwitch with GRE and namespaces.
>
> A side question: the documentation says to disable namespaces with GRE and
> enable them with VLANs. It always worked well for us on Grizzly with GRE and
> namespaces, and we could never get it to work without namespaces. Is there
> any specific reason why the documentation advises disabling them?
>
> Regards,
>
> Daniel
>
> From: Martinx - ジェームズ [mailto:thiagocmartinsc at gmail.com]
> Sent: Thursday, October 24, 2013 3:58 AM
> To: Aaron Rosen
> Cc: openstack at lists.openstack.org
>
> Subject: Re: [Openstack] Directional network performance issues with
> Neutron + OpenvSwitch
>
> Hi Aaron,
>
> Thanks for answering! =)
>
> Let's work...
>
> ---
>
> TEST #1 - iperf between Network Node and its Uplink router (Data Center's
> gateway "Internet") - OVS br-ex / eth2
>
> # Tenant Namespace route table
>
> root at net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
> default via 172.16.0.1 dev qg-50b615b7-c2
> 172.16.0.0/20 dev qg-50b615b7-c2 proto kernel scope link src 172.16.0.2
> 192.168.210.0/24 dev qr-a1376f61-05 proto kernel scope link src 192.168.210.1
>
> # there is an "iperf -s" running at 172.16.0.1 "Internet", testing it
>
> root at net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
> ------------------------------------------------------------
> Client connecting to 172.16.0.1, TCP port 5001
> TCP window size: 22.9 KByte (default)
> ------------------------------------------------------------
> [  5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  5]  0.0-10.0 sec   668 MBytes   559 Mbits/sec
> ---
>
> ---
>
> TEST #2 - iperf on one instance to the Namespace of the L3 agent + uplink
> router
>
> # iperf server running within Tenant's Namespace router
>
> root at net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s
>
> -
>
> # from instance-1
>
> ubuntu at instance-1:~$ ip route
> default via 192.168.210.1 dev eth0 metric 100
> 192.168.210.0/24 dev eth0 proto kernel scope link src 192.168.210.2
>
> # instance-1 performing tests against net-node-1 Namespace above
>
> ubuntu at instance-1:~$ iperf -c 192.168.210.1
> ------------------------------------------------------------
> Client connecting to 192.168.210.1, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 43739 connected with 192.168.210.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   484 MBytes   406 Mbits/sec
>
> # still on instance-1, now against "External IP" of its own Namespace / Router
>
> ubuntu at instance-1:~$ iperf -c 172.16.0.2
> ------------------------------------------------------------
> Client connecting to 172.16.0.2, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 34703 connected with 172.16.0.2 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   520 MBytes   436 Mbits/sec
>
> # still on instance-1, now against the Data Center UpLink Router
>
> ubuntu at instance-1:~$ iperf -c 172.16.0.1
> ------------------------------------------------------------
> Client connecting to 172.16.0.1, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.4 port 38401 connected with 172.16.0.1 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   324 MBytes   271 Mbits/sec
> ---
>
> This latest test shows only 271 Mbits/s! I think it should be at least
> 400~430 Mbits/s... Right?!
>
> ---
>
> TEST #3 - Two instances on the same hypervisor
>
> # iperf server
>
> ubuntu at instance-2:~$ ip route
> default via 192.168.210.1 dev eth0 metric 100
> 192.168.210.0/24 dev eth0 proto kernel scope link src 192.168.210.4
>
> ubuntu at instance-2:~$ iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 45800
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
>
> # iperf client
>
> ubuntu at instance-1:~$ iperf -c 192.168.210.4
> ------------------------------------------------------------
> Client connecting to 192.168.210.4, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 45800 connected with 192.168.210.4 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
> ---
>
> ---
>
> TEST #4 - Two instances on different hypervisors - over GRE
>
> root at instance-2:~# iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 34640
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
>
> root at instance-1:~# iperf -c 192.168.210.4
> ------------------------------------------------------------
> Client connecting to 192.168.210.4, TCP port 5001
> TCP window size: 21.0 KByte (default)
> ------------------------------------------------------------
> [  3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
> ---
>
> I just realized how slow my intra-cloud (VM-to-VM) communication is... :-/
>
> ---
>
> TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip
>
> # Same path as "TEST #4", but testing the physical GRE path (where the GRE
> traffic actually flows)
>
> root at hypervisor-2:~$ iperf -s
> ------------------------------------------------------------
> Server listening on TCP port 5001
> TCP window size: 85.3 KByte (default)
> ------------------------------------------------------------
> [  4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
> [ ID] Interval       Transfer     Bandwidth
> [  4]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
>
> root at hypervisor-1:~# iperf -c 10.20.2.57
> ------------------------------------------------------------
> Client connecting to 10.20.2.57, TCP port 5001
> TCP window size: 22.9 KByte (default)
> ------------------------------------------------------------
> [  3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
> [ ID] Interval       Transfer     Bandwidth
> [  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
> ---
>
> About Test #5: I don't know why the GRE traffic (Test #4) doesn't reach
> 1 Gbit/sec (only ~200 Mbit/s?!), since its physical path is much faster
> (Gigabit LAN). Plus, Test #3 shows a pretty fast speed when traffic flows
> only within a hypervisor (3.96 Gbit/sec).
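>
> One thing I still want to try is repeating Test #4 with a smaller TCP MSS, to
> see whether GRE encapsulation overhead / fragmentation is what kills it (the
> numbers below are only a rough guess):
>
> # GRE over a 1500-byte LAN loses roughly 24 bytes (20 outer IP + 4 GRE) per
> # packet, so an inner MSS around 1400 should comfortably avoid fragmentation
> root at instance-1:~# iperf -c 192.168.210.4 -M 1400 -m
>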
> Tomorrow I'll redo these tests with netperf.
>
> NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3 via
> "dpkg-buildpackage" and installed the "Debian / Ubuntu way". If I downgrade
> to 1.10.2 from the Havana Cloud Archive, I get the same results... I can
> downgrade it if you guys tell me to do so.
>
> BTW, I'll install another "Region" based on Havana on Ubuntu 13.10, with
> exactly the same configuration as my current Havana + Ubuntu 12.04.3, on top
> of the same hardware, to see if the problem still persists.
>
> Regards,
>
> Thiago
>
> On 23 October 2013 22:40, Aaron Rosen <arosen at nicira.com> wrote:
>
> On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ <
> thiagocmartinsc at gmail.com> wrote:
>
> James,
>
> I think I'm hitting this problem.
>
> I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and an
> L3+DHCP Network Node.
>
> The connectivity from behind my Instances is very slow. It takes an
> eternity to finish "apt-get update".
>
>
> I'm curious if you can run the following tests to help pinpoint the
> bottleneck:
>
> Run iperf or netperf between:
>
> - two instances on the same hypervisor - if performance is bad here, it
>   points to a virtualization driver issue;
> - two instances on different hypervisors;
> - one instance and the namespace of the l3 agent.
>
>
> If I run "apt-get update" from within the tenant's Namespace, it goes fine.
>
> If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I am
> unable to start new Ubuntu Instances and log into them... Look:
>
> --
> cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
> 2013-10-22 06:01:42,989 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]:
> url error [[Errno 113] No route to host]
> 2013-10-22 06:01:45,988 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]:
> url error [[Errno 113] No route to host]
> --
>
> Do you see anything interesting in the neutron-metadata-agent log? Or does
> it look like your instance doesn't have a route to the default gw?
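>
> A quick way to check both, roughly (substitute your router's namespace ID):
>
> # on the network node: is the metadata proxy redirect present inside the
> # router namespace?
> ip netns exec qrouter-<router-id> iptables -t nat -L -n | grep 169.254
>
> # inside the instance: is 169.254.169.254 reachable at all?
> ip route get 169.254.169.254
> curl -s http://169.254.169.254/2009-04-04/meta-data/instance-id
>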
> Is this problem still around?!
>
> Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3?
>
> Is it possible to re-enable Metadata when ovs_use_veth = true?
>
> Thanks!
>
> Thiago
>
> On 3 October 2013 06:27, James Page <james.page at ubuntu.com> wrote:
>
>
> On 02/10/13 22:49, James Page wrote:
> >>> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
> >>> traceroute -n 10.5.0.2 -p 44444 --mtu
> >>> traceroute to 10.5.0.2 (10.5.0.2), 30 hops max, 65000 byte packets
> >>>  1  10.5.0.2  0.950 ms F=1500  0.598 ms  0.566 ms
> >>>
> >>> The PMTU from the l3 gateway to the instance looks OK to me.
> > I spent a bit more time debugging this; performance from within
> > the router netns on the L3 gateway node looks good in both
> > directions when accessing via the tenant network (10.5.0.2) over
> > the qr-XXXXX interface, but when accessing through the external
> > network from within the netns I see the same performance choke
> > upstream into the tenant network.
> >
> > Which would indicate that my problem lies somewhere around the
> > qg-XXXXX interface in the router netns - just trying to figure out
> > exactly what - maybe iptables is doing something wonky?
>
> OK - I found a fix, but I'm not sure why this makes a difference;
> neither my l3-agent nor dhcp-agent configuration had 'ovs_use_veth =
> True'. I switched this on, cleared everything down, rebooted, and now
> I see symmetric, good performance across all neutron routers.
>
> This would point to some sort of underlying bug when ovs_use_veth = False.
>
>
>
> --
> James Page
> Ubuntu and Debian Developer
> james.page at ubuntu.com
> jamespage at debian.org