[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Martinx - ジェームズ thiagocmartinsc at gmail.com
Wed Nov 6 11:20:06 UTC 2013


Hello Stackers!

Sorry for not getting back to this topic last week, too many things to do...

So, instead of trying this and that, replying to this, replying again... I
made a video about this problem. I hope it helps more than the e-mails that
I've been writing!    =P

Honestly, I don't know the source of this problem, whether it is OpenStack
/ Neutron or "Linux / Namespace / OVS"... It would be great to test the
latter alone, Ubuntu Linux + Namespace + OVS (without Neutron), to see if
the problem persists, but I have no idea how to set everything up the way
Neutron does. Maybe I just need to reproduce the "Namespace and OVS
bridges / ports / VXLAN - as is", without Neutron?! I can try that...
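
A minimal sketch of what such a Neutron-free reproduction could look like,
assuming a single OVS bridge with one internal port moved into a network
namespace. The names br-test, tap-test, ns-test and vxlan-test, the address
10.0.0.1/24, and the single-bridge layout are all hypothetical placeholders;
Neutron itself uses separate integration and tunnel bridges, so this is a
simplification, not a copy of what Neutron builds. The script only prints
the commands, so they can be reviewed before being run as root:

```python
import shlex


def build_repro_commands(bridge="br-test", port="tap-test",
                         namespace="ns-test", cidr="10.0.0.1/24",
                         remote_ip=None):
    """Return argv lists that wire one namespace to one OVS bridge.

    All default names and addresses are illustrative assumptions.
    """
    in_ns = ["ip", "netns", "exec", namespace]
    commands = [
        # Create the bridge and an internal port on it.
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        ["ovs-vsctl", "--may-exist", "add-port", bridge, port,
         "--", "set", "Interface", port, "type=internal"],
        # Create the namespace and move the internal port into it.
        ["ip", "netns", "add", namespace],
        ["ip", "link", "set", port, "netns", namespace],
        # Configure the port from inside the namespace.
        in_ns + ["ip", "addr", "add", cidr, "dev", port],
        in_ns + ["ip", "link", "set", port, "up"],
        in_ns + ["ip", "link", "set", "lo", "up"],
    ]
    if remote_ip is not None:
        # Optional VXLAN tunnel port towards another host.
        commands.append(
            ["ovs-vsctl", "--may-exist", "add-port", bridge, "vxlan-test",
             "--", "set", "Interface", "vxlan-test", "type=vxlan",
             "options:remote_ip=" + remote_ip])
    return commands


if __name__ == "__main__":
    for argv in build_repro_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Running the same throughput test from inside the namespace (through
"ip netns exec") would then show whether the slowdown appears without any
Neutron agent involved.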

Also, my Grizzly setup is gone, I deleted it... Sorry about that... I know
it worked, because this is the first time I'm seeing this problem... I used
Grizzly for ~5 months with only 1 problem (related to MTU 1400), but this
problem with Havana is totally different...


Video:

OpenStack Havana L3 Router problem - Ubuntu 12.04.3 LTS:
http://www.youtube.com/watch?v=jVjiphMuuzM


* After 5 minutes, I inserted a second video, showing how I "fixed" it by
running Squid within the Tenant router. You guys can see that, using the
default Tenant router (10:30), it would take about 1 hour to finish the
"apt-get download" and, with Squid (09:27), it goes down to about 3 minutes
(no, it is not cached, I clean the cache before each test).


Sorry about the size of the video, it is about 12 minutes long and high-res
(to show the screen details), but it is a serious problem and I think it is
worth watching...

NOTE: Sorry about my English! It is very hard to "speak" a non-native
language while holding an Android phone and typing on the keyboard...    :-)

Best!
Thiago



On 28 October 2013 07:00, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:

> Thiago,
>
> some more answers below.
>
> Btw: I saw the problem with a "qemu-nbd -c" process using all the cpu on
> the compute. It happened just once - must be a bug in it. You can disable
> libvirt injection if you don't want it by setting "libvirt_inject_partition
> = -2" in nova.conf.
>
>
> On Saturday, 26 October 2013, 16:58, Martinx - ジェームズ <
> thiagocmartinsc at gmail.com> wrote:
>
> Hi Darragh,
> >
> >
> >Yes, on the same net-node machine, Grizzly works, Havana doesn't... But,
> for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6.
>
>
> so we don't know if the problem is due to Neutron, the Ubuntu kernel or
> OVS. I suspect the kernel as it implements the routing/natting, interfaces
> and namespaces.  I don't think Neutron Havana changes how these things are
> set up too much.
>
> Can you try running Havana on a network node with the Linux 3.2 kernel?
>
>
> >
> >
> >If I replace the Havana net-node hardware entirely, the problem persists
> (i.e. it "follows" the Havana net-node), so I think it cannot be related to
> the hardware.
> >
> >
> >I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS
> 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).
> >
> >
> >My logs (including Open vSwitch) right after starting an Instance
> (nothing at OVS logs):
> >
> >
> >http://paste.openstack.org/show/49870/
> >
> >
> >
> >I tried everything, including installing the Network Node on top of a KVM
> virtual machine or directly on a dedicated server: same result, the problem
> follows the Havana node (virtual or physical). The Grizzly Network Node works
> both on a KVM VM and on a dedicated server.
> >
> >
> >Regards,
> >Thiago
> >
> >
> >
> >On 26 October 2013 06:28, Darragh OReilly wrote:
> >
> >Hi Thiago,
> >>
> >>so just to confirm - on the same netnode machine, with the same OS,
> kernel and OVS versions - Grizzly is ok and Havana is not?
> >>
> >>Also, on the network node, are there any errors in the neutron logs, the
> syslog, or /var/log/openvswitch/* ?
> >>
> >>
> >>
> >>Re, Darragh.
> >>
> >>
> >>
> >>
> >>On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <
> thiagocmartinsc at gmail.com> wrote:
> >>
> >>LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron!
> =P
> >>>
> >>>
> >>>
> >>>I'll ignore the problems related to the "performance between two
> instances on different hypervisors" for now. My priority is the
> connectivity issue with the External networks... At least, internal is slow
> but it works.
> >>>
> >>>
> >>>I'm about to remove the L3 Agent / Namespaces entirely from my
> topology... It is a shame because it is pretty cool! With Grizzly I had no
> problems at all. Plus, I need to put Havana into production ASAP!    :-/
> >>>
> >>>
> >>>Why am I giving up on L3 / NS for now? Because I tried:
> >>>
> >>>
> >>>The option "tenant_network_type" with gre, vxlan and vlan (range
> physnet1:206:256 configured at the 3Com switch as tagged).
> >>>
> >>>
> >>>From the instances, the connection to the External network is always
> slow, no matter whether I choose GRE, VXLAN or VLAN for the Tenants.
> >>>
> >>>
> >>>For example, right now, I'm using VLAN, same problem.
> >>>
> >>>
> >>>Don't you guys think that this can be a problem with the bridge "br-ex"
> and its internals? I swapped the "Tenant Network Type" 3 times with the same
> result... But I still have not removed br-ex from the scene.
> >>>
> >>>
> >>>If someone wants to debug it, I can give the root password, no problem,
> it is just a lab...   =)
> >>>
> >>>
> >>>Thanks!
> >>>Thiago
> >>>
> >>>
> >>>On 25 October 2013 19:45, Rick Jones <rick.jones2 at hp.com> wrote:
> >>>
> >>>On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
> >>>>
> >>>>WOW!! Thank you for your time Rick! Awesome answer!!    =D
> >>>>>
> >>>>>I'll do these tests (with ethtool GRO / CKO) tonight but, do you think
> >>>>>that this is the main root of the problem?!
> >>>>>
> >>>>>
> >>>>>I mean, I'm seeing two distinct problems here:
> >>>>>
> >>>>>1- Slow connectivity to the External network plus SSH lags all over
> the
> >>>>>cloud (everything that passes through L3 / Namespace is problematic),
> and;
> >>>>>
> >>>>>2- Communication between two Instances on different hypervisors (i.e.
> >>>>>maybe it is related to this GRO / CKO thing).
> >>>>>
> >>>>>
> >>>>>So, two different problems, right?!
> >>>>>
> >>>>
> One or two problems I cannot say.    Certainly if one got the benefit of
> stateless offloads in one direction and not the other, one could see
> different performance limits in each direction.
> >>>>
> >>>>All I can really say is I liked it better when we were called Quantum,
> because then I could refer to it as "Spooky networking at a distance."
>  Sadly, describing Neutron as "Networking with no inherent charge" doesn't
> work as well :)
> >>>>
> >>>>rick jones
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >
> >
> >
>