[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Robert Collins robertc at robertcollins.net
Thu Oct 24 20:29:28 UTC 2013


OK, so that says that PMTUd is failing, probably due to a
bug/limitation in openvswitch. Can we please make sure a bug is filed
- both against Neutron and against the upstream component - as soon
as someone tracks it down? Manual MTU lowering is only needed when a
network component fails to correctly report failed delivery of DF
packets.
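
A quick sanity check from outside the cloud (just a sketch - substitute a
floating IP of one of the affected instances) is a DF-flagged ping at the
full Ethernet payload size:

  # 1472 bytes of ICMP payload + 28 bytes of ICMP/IP headers = 1500
  ping -M do -s 1472 -c 3 <instance-floating-ip>

If packets of that size vanish while smaller ones get through, and no ICMP
"fragmentation needed" ever comes back, that confirms PMTUd is being broken
somewhere along the path.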

-Rob

On 25 October 2013 08:38, Speichert,Daniel <djs428 at drexel.edu> wrote:
> We managed to bring the upload speed on the instances back to maximum by
> following this guide:
>
> http://docs.openstack.org/trunk/openstack-network/admin/content/openvswitch_plugin.html
>
>
>
> Basically, the MTU needs to be lowered for GRE tunnels. It can be done with
> DHCP as explained in the new trunk manual.
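>
> For anyone else hitting this, the relevant settings are roughly the ones
> from that guide (the 1400-byte value and file paths are the guide's;
> adjust as needed):
>
> # /etc/neutron/dhcp_agent.ini
> dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
>
> # /etc/neutron/dnsmasq-neutron.conf
> # DHCP option 26 = interface MTU; 1400 leaves headroom for the GRE overhead
> dhcp-option-force=26,1400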
>
>
>
> Regards,
>
> Daniel
>
>
>
> From: annegentle at justwriteclick.com [mailto:annegentle at justwriteclick.com]
> On Behalf Of Anne Gentle
> Sent: Thursday, October 24, 2013 12:08 PM
> To: Martinx - ジェームズ
> Cc: Speichert,Daniel; openstack at lists.openstack.org
>
>
> Subject: Re: [Openstack] Directional network performance issues with Neutron
> + OpenvSwitch
>
>
>
>
>
>
>
> On Thu, Oct 24, 2013 at 10:37 AM, Martinx - ジェームズ
> <thiagocmartinsc at gmail.com> wrote:
>
> Precisely!
>
>
>
> The doc currently says to disable Namespaces when using GRE; I never did
> this before. Look:
>
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/install-neutron.install-plugin.ovs.gre.html
>
>
>
> But in this very same doc, they say to enable it... Who knows?!   =P
>
>
>
> http://docs.openstack.org/trunk/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
>
>
>
> I'll stick with Namespaces enabled...
>
>
>
>
>
> Just a reminder: /trunk/ links are works in progress. Thanks for bringing
> the mismatch to our attention - we already have a doc bug filed:
>
>
>
> https://bugs.launchpad.net/openstack-manuals/+bug/1241056
>
>
>
> Review this patch: https://review.openstack.org/#/c/53380/
>
>
>
> Anne
>
>
>
>
>
>
>
> Let me ask you something: when you enable ovs_use_veth, do Metadata and
> DHCP still work?!
>
>
>
> Cheers!
>
> Thiago
>
>
>
> On 24 October 2013 12:22, Speichert,Daniel <djs428 at drexel.edu> wrote:
>
> Hello everyone,
>
>
>
> It seems we also ran into the same issue.
>
>
>
> We are running Ubuntu Saucy with OpenStack Havana from Ubuntu Cloud archives
> (precise-updates).
>
>
>
> The download speed to the VMs increased from 5 Mbps to maximum after
> enabling ovs_use_veth. Upload speed from the VMs is still terrible (max 1
> Mbps, usually 0.04 Mbps).
>
>
>
> Here is the iperf between the instance and L3 agent (network node) inside
> namespace.
>
>
>
> root at cloud:~# ip netns exec qrouter-a29e0200-d390-40d1-8cf7-7ac1cef5863a
> iperf -c 10.1.0.24 -r
>
> ------------------------------------------------------------
>
> Server listening on TCP port 5001
>
> TCP window size: 85.3 KByte (default)
>
> ------------------------------------------------------------
>
> ------------------------------------------------------------
>
> Client connecting to 10.1.0.24, TCP port 5001
>
> TCP window size:  585 KByte (default)
>
> ------------------------------------------------------------
>
> [  7] local 10.1.0.1 port 37520 connected with 10.1.0.24 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  7]  0.0-10.0 sec   845 MBytes   708 Mbits/sec
>
> [  6] local 10.1.0.1 port 5001 connected with 10.1.0.24 port 53006
>
> [  6]  0.0-31.4 sec   256 KBytes  66.7 Kbits/sec
>
>
>
> We are using Neutron OpenVSwitch with GRE and namespaces.
>
>
> A side question: the documentation says to disable namespaces with GRE and
> enable them with VLANs. It always worked well for us on Grizzly with GRE
> and namespaces, and we could never get it to work without namespaces. Is
> there any specific reason why the documentation advises disabling them?
>
>
>
> Regards,
>
> Daniel
>
>
>
> From: Martinx - ジェームズ [mailto:thiagocmartinsc at gmail.com]
> Sent: Thursday, October 24, 2013 3:58 AM
> To: Aaron Rosen
> Cc: openstack at lists.openstack.org
>
>
> Subject: Re: [Openstack] Directional network performance issues with Neutron
> + OpenvSwitch
>
>
>
> Hi Aaron,
>
>
>
> Thanks for answering!     =)
>
>
>
> Let's work...
>
>
>
> ---
>
>
>
> TEST #1 - iperf between Network Node and its Uplink router (Data Center's
> gateway "Internet") - OVS br-ex / eth2
>
>
>
> # Tenant Namespace route table
>
>
>
> root at net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 ip route
>
> default via 172.16.0.1 dev qg-50b615b7-c2
>
> 172.16.0.0/20 dev qg-50b615b7-c2  proto kernel  scope link  src 172.16.0.2
>
> 192.168.210.0/24 dev qr-a1376f61-05  proto kernel  scope link  src
> 192.168.210.1
>
>
>
> # there is an "iperf -s" running at 172.16.0.1 ("Internet"); testing it
>
>
>
> root at net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -c 172.16.0.1
>
> ------------------------------------------------------------
>
> Client connecting to 172.16.0.1, TCP port 5001
>
> TCP window size: 22.9 KByte (default)
>
> ------------------------------------------------------------
>
> [  5] local 172.16.0.2 port 58342 connected with 172.16.0.1 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  5]  0.0-10.0 sec   668 MBytes   559 Mbits/sec
>
> ---
>
>
>
> ---
>
>
>
> TEST #2 - iperf on one instance to the Namespace of the L3 agent + uplink
> router
>
>
>
> # iperf server running within Tenant's Namespace router
>
>
>
> root at net-node-1:~# ip netns exec
> qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iperf -s
>
>
>
> -
>
>
>
> # from instance-1
>
>
>
> ubuntu at instance-1:~$ ip route
>
> default via 192.168.210.1 dev eth0  metric 100
>
> 192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.2
>
>
>
> # instance-1 performing tests against net-node-1 Namespace above
>
>
>
> ubuntu at instance-1:~$ iperf -c 192.168.210.1
>
> ------------------------------------------------------------
>
> Client connecting to 192.168.210.1, TCP port 5001
>
> TCP window size: 21.0 KByte (default)
>
> ------------------------------------------------------------
>
> [  3] local 192.168.210.2 port 43739 connected with 192.168.210.1 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  3]  0.0-10.0 sec   484 MBytes   406 Mbits/sec
>
>
>
> # still on instance-1, now against "External IP" of its own Namespace /
> Router
>
>
>
> ubuntu at instance-1:~$ iperf -c 172.16.0.2
>
> ------------------------------------------------------------
>
> Client connecting to 172.16.0.2, TCP port 5001
>
> TCP window size: 21.0 KByte (default)
>
> ------------------------------------------------------------
>
> [  3] local 192.168.210.2 port 34703 connected with 172.16.0.2 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  3]  0.0-10.0 sec   520 MBytes   436 Mbits/sec
>
>
>
> # still on instance-1, now against the Data Center UpLink Router
>
>
>
> ubuntu at instance-1:~$ iperf -c 172.16.0.1
>
> ------------------------------------------------------------
>
> Client connecting to 172.16.0.1, TCP port 5001
>
> TCP window size: 21.0 KByte (default)
>
> ------------------------------------------------------------
>
> [  3] local 192.168.210.4 port 38401 connected with 172.16.0.1 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  3]  0.0-10.0 sec   324 MBytes   271 Mbits/sec
>
> ---
>
>
>
> This latest test shows only 271 Mbits/s! I think it should be at least
> 400~430 Mbits/s... Right?!
>
>
>
> ---
>
>
>
> TEST #3 - Two instances on the same hypervisor
>
>
>
> # iperf server
>
>
>
> ubuntu at instance-2:~$ ip route
>
> default via 192.168.210.1 dev eth0  metric 100
>
> 192.168.210.0/24 dev eth0  proto kernel  scope link  src 192.168.210.4
>
>
>
> ubuntu at instance-2:~$ iperf -s
>
> ------------------------------------------------------------
>
> Server listening on TCP port 5001
>
> TCP window size: 85.3 KByte (default)
>
> ------------------------------------------------------------
>
> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 45800
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  4]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
>
>
>
> # iperf client
>
>
>
> ubuntu at instance-1:~$ iperf -c 192.168.210.4
>
> ------------------------------------------------------------
>
> Client connecting to 192.168.210.4, TCP port 5001
>
> TCP window size: 21.0 KByte (default)
>
> ------------------------------------------------------------
>
> [  3] local 192.168.210.2 port 45800 connected with 192.168.210.4 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  3]  0.0-10.0 sec  4.61 GBytes  3.96 Gbits/sec
>
> ---
>
>
>
> ---
>
>
>
> TEST #4 - Two instances on different hypervisors - over GRE
>
>
>
> root at instance-2:~# iperf -s
>
> ------------------------------------------------------------
>
> Server listening on TCP port 5001
>
> TCP window size: 85.3 KByte (default)
>
> ------------------------------------------------------------
>
> [  4] local 192.168.210.4 port 5001 connected with 192.168.210.2 port 34640
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  4]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
>
>
>
>
>
> root at instance-1:~# iperf -c 192.168.210.4
>
> ------------------------------------------------------------
>
> Client connecting to 192.168.210.4, TCP port 5001
>
> TCP window size: 21.0 KByte (default)
>
> ------------------------------------------------------------
>
> [  3] local 192.168.210.2 port 34640 connected with 192.168.210.4 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  3]  0.0-10.0 sec   237 MBytes   198 Mbits/sec
>
> ---
>
>
>
> I just realized how slow my intra-cloud (VM-to-VM) communication is...   :-/
>
>
>
> ---
>
>
>
> TEST #5 - Two hypervisors - "GRE TUNNEL LAN" - OVS local_ip / remote_ip
>
>
>
> # Same path as "TEST #4", but testing the physical GRE path (where the GRE
> traffic flows)
>
>
>
> root at hypervisor-2:~$ iperf -s
>
> ------------------------------------------------------------
>
> Server listening on TCP port 5001
>
> TCP window size: 85.3 KByte (default)
>
> ------------------------------------------------------------
>
> [  4] local 10.20.2.57 port 5001 connected with 10.20.2.53 port 51694
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  4]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
>
>
>
> root at hypervisor-1:~# iperf -c 10.20.2.57
>
> ------------------------------------------------------------
>
> Client connecting to 10.20.2.57, TCP port 5001
>
> TCP window size: 22.9 KByte (default)
>
> ------------------------------------------------------------
>
> [  3] local 10.20.2.53 port 51694 connected with 10.20.2.57 port 5001
>
> [ ID] Interval       Transfer     Bandwidth
>
> [  3]  0.0-10.0 sec  1.09 GBytes   939 Mbits/sec
>
> ---
>
>
>
> About Test #5, I don't know why the GRE traffic (Test #4) doesn't reach
> 1 Gbit/sec (only ~200 Mbit/sec?), since its physical path is much faster
> (Gigabit LAN). Plus, Test #3 shows a pretty fast speed when traffic flows
> only within a hypervisor (3.96 Gbit/sec).
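>
> (One thing I still want to rule out on that path: whether the GRE
> encapsulation is pushing full-size inner frames over the physical MTU.
> Something like this from instance-1 should tell - a full-size DF ping that
> fails while a smaller one succeeds would point at fragmentation on the
> tunnel:)
>
> ubuntu at instance-1:~$ ping -M do -s 1472 -c 3 192.168.210.4
> ubuntu at instance-1:~$ ping -M do -s 1300 -c 3 192.168.210.4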
>
>
>
> Tomorrow, I'll do these tests with netperf.
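>
> (Probably something along these lines - netperf server on instance-2,
> client on instance-1, testing both directions:)
>
> ubuntu at instance-2:~$ netserver -p 12865
> ubuntu at instance-1:~$ netperf -H 192.168.210.4 -p 12865 -t TCP_STREAM
> ubuntu at instance-1:~$ netperf -H 192.168.210.4 -p 12865 -t TCP_MAERTS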
>
>
>
> NOTE: I'm using Open vSwitch 1.11.0, compiled for Ubuntu 12.04.3 via
> "dpkg-buildpackage" and installed the "Debian / Ubuntu way". If I downgrade
> to 1.10.2 from the Havana Cloud Archive, I get the same results... I can
> downgrade it if you guys tell me to do so.
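>
> (The build itself was nothing exotic - roughly the usual Debian packaging
> workflow, something like this, assuming deb-src entries are enabled:)
>
> apt-get build-dep openvswitch
> tar xzf openvswitch-1.11.0.tar.gz && cd openvswitch-1.11.0
> dpkg-buildpackage -b -uc -us
> dpkg -i ../openvswitch-*.deb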
>
>
>
> BTW, I'll install another "Region", based on Havana on Ubuntu 13.10, with
> exactly the same configuration as my current Havana + Ubuntu 12.04.3, on
> top of the same hardware, to see if the problem still persists.
>
>
>
> Regards,
>
> Thiago
>
>
>
> On 23 October 2013 22:40, Aaron Rosen <arosen at nicira.com> wrote:
>
>
>
>
>
> On Mon, Oct 21, 2013 at 11:52 PM, Martinx - ジェームズ
> <thiagocmartinsc at gmail.com> wrote:
>
> James,
>
>
>
> I think I'm hitting this problem.
>
>
>
> I'm using "Per-Tenant Routers with Private Networks", GRE tunnels and
> L3+DHCP Network Node.
>
>
>
> The connectivity from my instances is very slow. It takes an eternity to
> finish "apt-get update".
>
>
>
>
>
> I'm curious if you can do the following tests to help pinpoint the
> bottleneck:
>
>
>
> Run iperf or netperf between:
>
> two instances on the same hypervisor - if the performance is bad here,
> that points to a virtualization driver issue.
>
> two instances on different hypervisors.
>
> one instance to the namespace of the l3 agent.
>
>
>
>
>
>
>
>
>
>
>
>
>
> If I run "apt-get update" from within the tenant's Namespace, it goes fine.
>
>
>
> If I enable "ovs_use_veth", Metadata (and/or DHCP) stops working and I am
> unable to start new Ubuntu Instances and log in to them... Look:
>
>
>
> --
>
> cloud-init start running: Tue, 22 Oct 2013 05:57:39 +0000. up 4.01 seconds
>
> 2013-10-22 06:01:42,989 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [3/120s]:
> url error [[Errno 113] No route to host]
>
> 2013-10-22 06:01:45,988 - util.py[WARNING]:
> 'http://169.254.169.254/2009-04-04/meta-data/instance-id' failed [6/120s]:
> url error [[Errno 113] No route to host]
>
> --
>
>
>
>
>
> Do you see anything interesting in the neutron-metadata-agent log? Or does
> it look like your instance doesn't have a route to the default gw?
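>
> Also, while an instance is booting, it may be worth checking inside the
> router namespace (just a sketch, reusing the qrouter ID from your earlier
> tests) that the metadata REDIRECT rule and the namespace metadata proxy
> survived the ovs_use_veth change:
>
> ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 iptables -t nat -S | grep 169.254.169.254
> ip netns exec qrouter-46cb8f7a-a3c5-4da7-ad69-4de63f7c34f1 netstat -lntp | grep 9697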
>
>
>
>
>
> Is this problem still around?!
>
>
>
> Should I stay away from GRE tunnels with Havana + Ubuntu 12.04.3?
>
>
>
> Is it possible to re-enable Metadata when ovs_use_veth = true ?
>
>
>
> Thanks!
>
> Thiago
>
>
>
> On 3 October 2013 06:27, James Page <james.page at ubuntu.com> wrote:
>
>
> On 02/10/13 22:49, James Page wrote:
>>> sudo ip netns exec qrouter-d3baf1b1-55ee-42cb-a3f6-9629288e3221
>>>> traceroute -n 10.5.0.2 -p 44444 --mtu traceroute to 10.5.0.2
>>>> (10.5.0.2), 30 hops max, 65000 byte packets 1  10.5.0.2  0.950
>>>> ms F=1500  0.598 ms  0.566 ms
>>>>
>>>> The PMTU from the l3 gateway to the instance looks OK to me.
>> I spent a bit more time debugging this; performance from within
>> the router netns on the L3 gateway node looks good in both
>> directions when accessing via the tenant network (10.5.0.2) over
>> the qr-XXXXX interface, but when accessing through the external
>> network from within the netns I see the same performance choke
>> upstream into the tenant network.
>>
>> Which would indicate that my problem lies somewhere around the
>> qg-XXXXX interface in the router netns - just trying to figure out
>> exactly what - maybe iptables is doing something wonky?
>
> OK - I found a fix, but I'm not sure why this makes a difference;
> neither my l3-agent nor my dhcp-agent configuration had 'ovs_use_veth =
> True'. I switched this on, cleared everything down, rebooted, and now
> I see symmetric good performance across all neutron routers.
>
> This would point to some sort of underlying bug when ovs_use_veth = False.
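>
> For anyone wanting to try the same thing, the flag lives in both agent
> configs, roughly:
>
> # /etc/neutron/l3_agent.ini and /etc/neutron/dhcp_agent.ini
> ovs_use_veth = True
>
> followed by a restart of neutron-l3-agent and neutron-dhcp-agent.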
>
>
>
> - --
> James Page
> Ubuntu and Debian Developer
> james.page at ubuntu.com
> jamespage at debian.org
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>



-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud



