[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Martinx - ジェームズ thiagocmartinsc at gmail.com
Sun Nov 10 00:09:28 UTC 2013


Guys,

This problem is kind of a "deal breaker"... I was counting on OpenStack
Havana (on Ubuntu) for my first public cloud, which I'm (was) about to
announce / launch, but this problem changed everything.

I cannot put Havana on Ubuntu LTS into production because of this
network issue. This is a very serious problem for me... All sites, and
even SSH connections, that pass through the "Floating IPs" into the
tenants' subnets are very slow, and every connection freezes for several
seconds, every minute.
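To put numbers on the "freezes for seconds, every minute" symptom, here is a minimal sketch. It is an assumption and not part of the original report. It times repeated TCP connections to an instance behind a Floating IP and lists the slow samples. The host, port, sample count, interval and 1-second threshold are hypothetical placeholders.

```python
import socket
import time


def connect_time(host: str, port: int, timeout: float = 10.0) -> float:
    """Return the seconds taken to open (and close) one TCP connection."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start


def find_stalls(samples, threshold):
    """Return (index, seconds) for every sample slower than `threshold`."""
    return [(i, s) for i, s in enumerate(samples) if s > threshold]


def probe(host, port, count=120, interval=1.0, threshold=1.0):
    """Sample connect latency about once per `interval` seconds; report stalls."""
    samples = []
    for _ in range(count):
        try:
            samples.append(connect_time(host, port))
        except OSError:
            # Timeouts and refused connections are counted as stalls.
            samples.append(float("inf"))
        time.sleep(interval)
    return find_stalls(samples, threshold)
```

For example, probe("203.0.113.10", 22) would sample SSH connection setup to a hypothetical Floating IP roughly once per second for two minutes. Stalls that recur about 60 samples apart would match the "every minute" pattern.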

Again, I see no way to put Havana into production (using
Per-Tenant Routers with Private Networks) *because the Network Node is
broken*, at least on Ubuntu... I'll try it with Debian 7 or CentOS
(which I don't like), just to see if the problem persists, but I have
preferred the Ubuntu distro since Warty Warthog...    :-/

So, what is being done to fix it? I have already tried everything I could,
without any success...

Also, I followed this doc (triple-checking my environment again and again):
http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html
but it does not work as expected.

BTW, I can give you guys full access to my environment, no problem...
I can build a lab from scratch following your instructions, and I can also
give root access to OpenStack experts... Just let me know...    =)

Thanks!
Thiago

On 6 November 2013 09:20, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:

> Hello Stackers!
>
> Sorry for not getting back to this topic last week, too many things to do...
>
> So, instead of trying this and that and replying again and again, I made a
> video about this problem. I hope it helps more than the e-mails I've been
> writing!    =P
>
> Honestly, I don't know the source of this problem: whether it is OpenStack
> / Neutron, or "Linux / Namespace / OVS"... It would be great to test the
> latter alone, Ubuntu Linux + Namespace + OVS (without Neutron), to see if the
> problem persists, but I have no idea how to set everything up the way
> Neutron does. Maybe I just need to reproduce the "Namespace and OVS
> bridges / ports / VXLAN, as is" without Neutron?! I can try that...
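One way to try the "Namespace + OVS without Neutron" test is to script the plumbing by hand. The sketch below only builds a command list (a dry run) that roughly approximates a router namespace as the L3 agent sets it up: an internal OVS port on br-int, an external one on br-ex, IP forwarding, a default route and an SNAT rule. All names and addresses (qrouter-test, qr-test, qg-test, 10.0.0.1/24, 203.0.113.5/24, 203.0.113.1) are hypothetical. This is not a complete replica of Neutron: there is no DHCP, no GRE/VXLAN tunnelling and no Floating IP DNAT.

```python
import ipaddress
import shlex
import subprocess


def router_namespace_commands(ns="qrouter-test",
                              int_port="qr-test", int_cidr="10.0.0.1/24",
                              ext_port="qg-test", ext_cidr="203.0.113.5/24",
                              ext_gw="203.0.113.1"):
    """Return shell commands mimicking an L3-agent router namespace (hypothetical names)."""
    tenant_net = ipaddress.ip_interface(int_cidr).network  # e.g. 10.0.0.0/24
    ext_ip = ipaddress.ip_interface(ext_cidr).ip
    in_ns = f"ip netns exec {ns}"
    cmds = [
        f"ip netns add {ns}",
        "ovs-vsctl --may-exist add-br br-int",
        "ovs-vsctl --may-exist add-br br-ex",
    ]
    for bridge, port, cidr in (("br-int", int_port, int_cidr),
                               ("br-ex", ext_port, ext_cidr)):
        cmds += [
            # Create an OVS internal port, then move it into the namespace.
            f"ovs-vsctl add-port {bridge} {port} -- set Interface {port} type=internal",
            f"ip link set {port} netns {ns}",
            f"{in_ns} ip addr add {cidr} dev {port}",
            f"{in_ns} ip link set {port} up",
        ]
    cmds += [
        f"{in_ns} ip link set lo up",
        f"{in_ns} ip route add default via {ext_gw}",
        f"{in_ns} sysctl -w net.ipv4.ip_forward=1",
        # Source-NAT tenant traffic leaving through the external port.
        f"{in_ns} iptables -t nat -A POSTROUTING -s {tenant_net} "
        f"-o {ext_port} -j SNAT --to-source {ext_ip}",
    ]
    return cmds


def run(cmds, dry_run=True):
    """Print each command; execute it (requires root) only if dry_run is False."""
    for cmd in cmds:
        print(cmd)
        if not dry_run:
            subprocess.run(shlex.split(cmd), check=True)
```

Calling run(router_namespace_commands()) only prints the commands. Passing dry_run=False executes them and needs root. The physical external NIC would still have to be added to br-ex (ovs-vsctl add-port br-ex <nic>), which the sketch leaves out.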
>
> Also, my Grizzly setup is gone; I deleted it... Sorry about that... I know
> it worked, because this is the first time I'm seeing this problem... I
> used Grizzly for ~5 months with only 1 problem (related to MTU 1400), but
> this problem with Havana is totally different...
>
>
> Video:
>
> OpenStack Havana L3 Router problem - Ubuntu 12.04.3 LTS:
> http://www.youtube.com/watch?v=jVjiphMuuzM
>
>
> * After the first 5 minutes, I inserted a second video showing how I "fixed"
> it by running Squid within the Tenant router. You can see that, using the
> default Tenant router (10:30), the "apt-get download" takes about 1 hour to
> finish, while with Squid (09:27) it goes down to about 3 minutes
> (no, it is not cached; I clean the cache before each test).
>
>
> Sorry about the size of the video; it is about 12 minutes long and high-res
> (to show the screen details), but this is a serious problem and I think it
> is worth watching...
>
> NOTE: Sorry about my English! It is very hard to "speak" a non-native
> language while handling an Android phone and typing on the keyboard...    :-)
>
> Best!
> Thiago
>
>
>
> On 28 October 2013 07:00, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>
>> Thiago,
>>
>> some more answers below.
>>
>> Btw: I saw the problem with a "qemu-nbd -c" process using all the CPU on
>> the compute node. It happened just once; it must be a bug in it. You can
>> disable libvirt injection, if you don't need it, by setting
>> "libvirt_inject_partition = -2" in nova.conf.
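A small sketch for confirming that the setting Darragh mentions is actually in place. It assumes the Havana-era option name libvirt_inject_partition in the [DEFAULT] section of nova.conf. Newer releases may use a different section or name, so treat that as an assumption. The helper name is hypothetical.

```python
import configparser


def injection_disabled(conf_text: str) -> bool:
    """True if the nova.conf text sets libvirt_inject_partition = -2 in [DEFAULT]."""
    # interpolation=None: nova.conf values may contain literal '%' characters.
    # strict=False: tolerate duplicate options/sections in hand-edited files.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    value = parser.get("DEFAULT", "libvirt_inject_partition", fallback=None)
    return value is not None and value.strip() == "-2"
```

Usage: injection_disabled(open("/etc/nova/nova.conf").read()).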
>>
>>
>> On Saturday, 26 October 2013, 16:58, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
>>
>> >Hi Darragh,
>> >
>> >
>> >Yes, on the same net-node machine, Grizzly works and Havana doesn't... But
>> >for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6.
>>
>>
>> So we don't know whether the problem is due to Neutron, the Ubuntu kernel or
>> OVS. I suspect the kernel, as it implements the routing/NATing, interfaces
>> and namespaces. I don't think Neutron Havana changes how these things are
>> set up very much.
>>
>> Can you try running Havana on a network node with the Linux 3.2 kernel?
>>
>>
>> >
>> >
>> >If I replace the Havana net-node hardware entirely, the problem persists
>> >(i.e. it "follows" the Havana net-node), so I think it cannot be related to
>> >the hardware.
>> >
>> >
>> >I tried Havana with both OVS 1.10.2 (from the Cloud Archive) and OVS
>> >1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).
>> >
>> >
>> >My logs (including Open vSwitch) right after starting an Instance
>> >(nothing in the OVS logs):
>> >
>> >
>> >http://paste.openstack.org/show/49870/
>> >
>> >
>> >
>> >I tried everything, including installing the Network Node on top of a
>> >KVM virtual machine or directly on a dedicated server; same result, the
>> >problem follows the Havana node (virtual or physical). The Grizzly Network
>> >Node works both on a KVM VM and on a dedicated server.
>> >
>> >
>> >Regards,
>> >Thiago
>> >
>> >
>> >
>> >On 26 October 2013 06:28, Darragh O'Reilly wrote:
>> >
>> >>Hi Thiago,
>> >>
>> >>So just to confirm: on the same netnode machine, with the same OS,
>> >>kernel and OVS versions, Grizzly is OK and Havana is not?
>> >>
>> >>Also, on the network node, are there any errors in the neutron logs,
>> >>the syslog, or /var/log/openvswitch/* ?
>> >>
>> >>
>> >>
>> >>Re, Darragh.
>> >>
>> >>
>> >>
>> >>
>> >>On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
>> >>
>> >>>LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron!    =P
>> >>>
>> >>>
>> >>>
>> >>>I'll ignore the problems related to the "performance between two
>> >>>instances on different hypervisors" for now. My priority is the
>> >>>connectivity issue with the External networks... At least the internal
>> >>>traffic works, even if it is slow.
>> >>>
>> >>>
>> >>>I'm about to remove the L3 Agent / Namespaces entirely from my
>> >>>topology... It is a shame, because it is pretty cool! With Grizzly I had no
>> >>>problems at all. Plus, I need to put Havana into production ASAP!    :-/
>> >>>
>> >>>
>> >>>Why am I giving up (on L3 / NS) for now? Because I tried:
>> >>>
>> >>>
>> >>>The option "tenant_network_type" with gre, vxlan and vlan (range
>> >>>physnet1:206:256, configured as tagged on the 3Com switch).
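The VLAN range above uses Neutron's physical_network:vlan_min:vlan_max format, as in the OVS plugin's network_vlan_ranges option. Here is a small sketch that expands such an entry into the VLAN IDs that must be tagged on the switch trunk. The helper name is hypothetical.

```python
from typing import List, Tuple


def parse_vlan_range(entry: str) -> Tuple[str, List[int]]:
    """Split 'physnet:min:max' into the physnet name and its VLAN IDs (inclusive)."""
    parts = entry.strip().split(":")
    if len(parts) == 1:
        # A physnet listed without a range has no tenant VLAN pool.
        return parts[0], []
    if len(parts) != 3:
        raise ValueError(f"expected physnet[:min:max], got {entry!r}")
    name, lo, hi = parts[0], int(parts[1]), int(parts[2])
    if not 1 <= lo <= hi <= 4094:
        raise ValueError(f"invalid VLAN range {lo}-{hi}")
    return name, list(range(lo, hi + 1))
```

For physnet1:206:256 this gives 51 VLAN IDs (206 through 256), all of which have to be tagged on the switch ports facing the network and compute nodes.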
>> >>>
>> >>>
>> >>>From the instances, the connection to the External network is always
>> >>>slow, no matter whether I choose GRE, VXLAN or VLAN for the Tenants.
>> >>>
>> >>>
>> >>>For example, right now, I'm using VLAN, same problem.
>> >>>
>> >>>
>> >>>Don't you guys think this could be a problem with the bridge
>> >>>"br-ex" and its internals? I swapped the "Tenant Network Type" 3
>> >>>times with the same result... But I have not yet removed br-ex from the picture.
>> >>>
>> >>>
>> >>>If someone wants to debug it, I can give out the root password, no
>> >>>problem; it is just a lab...   =)
>> >>>
>> >>>
>> >>>Thanks!
>> >>>Thiago
>> >>>
>> >>>
>> >>>On 25 October 2013 19:45, Rick Jones <rick.jones2 at hp.com> wrote:
>> >>>
>> >>>>On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>> >>>>
>> >>>>>WOW!! Thank you for your time, Rick! Awesome answer!!    =D
>> >>>>>
>> >>>>>I'll do these tests (with ethtool GRO / CKO) tonight, but do you think
>> >>>>>this is the root of the problem?!
>> >>>>>
>> >>>>>
>> >>>>>I mean, I'm seeing two distinct problems here:
>> >>>>>
>> >>>>>1- Slow connectivity to the External network, plus SSH lags all over the
>> >>>>>cloud (everything that passes through L3 / Namespace is problematic), and
>> >>>>>
>> >>>>>2- Slow communication between two Instances on different hypervisors
>> >>>>>(maybe this one is related to the GRO / CKO thing).
>> >>>>>
>> >>>>>
>> >>>>>So, two different problems, right?!
>> >>>>>
>> >>>>
>> >>>>Whether it is one problem or two, I cannot say. Certainly, if one got the
>> >>>>benefit of stateless offloads in one direction and not the other, one could
>> >>>>see different performance limits in each direction.
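To compare stateless offloads on each side, as Rick suggests, you can record ethtool -k output for every hop: the physical NICs and the interfaces inside the router namespace. Below is a minimal parser sketch. It assumes the usual "feature-name: on|off" layout of ethtool -k output. Exact feature names vary with the ethtool version and the driver. An offload can then be toggled for a test with, for example, ethtool -K eth0 gro off. The helper names and the WATCHED list are hypothetical.

```python
import subprocess
from typing import Dict

# Features relevant to the GRO / checksum-offload ("CKO") discussion.
WATCHED = ("rx-checksumming", "tx-checksumming",
           "generic-receive-offload", "large-receive-offload",
           "tcp-segmentation-offload", "generic-segmentation-offload")


def parse_offloads(text: str) -> Dict[str, bool]:
    """Map feature name -> enabled, from `ethtool -k` output."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        if not sep or not value.strip():
            continue  # skip the "Features for eth0:" header and blank lines
        features[name] = value.split()[0] == "on"  # ignore a "[fixed]" suffix
    return features


def offloads(iface: str, netns: str = "") -> Dict[str, bool]:
    """Run `ethtool -k` (optionally inside a namespace); keep WATCHED features."""
    cmd = ["ethtool", "-k", iface]
    if netns:
        cmd = ["ip", "netns", "exec", netns] + cmd
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    parsed = parse_offloads(out)
    return {k: parsed[k] for k in WATCHED if k in parsed}
```

Comparing offloads("eth0") on the network node with offloads("qg-...", netns="qrouter-...") for the relevant router shows whether one direction benefits from offloads and the other does not. Fill in the qg-/qrouter- IDs from your own deployment.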
>> >>>>
>> >>>>All I can really say is I liked it better when we were called
>> >>>>Quantum, because then I could refer to it as "Spooky networking at a
>> >>>>distance."  Sadly, describing Neutron as "Networking with no inherent
>> >>>>charge" doesn't work as well :)
>> >>>>
>> >>>>rick jones
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>>
>> >
>> >
>> >
>>
>
>