[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Martinx - ジェームズ thiagocmartinsc at gmail.com
Sun Oct 27 22:01:31 UTC 2013


Stackers,

I have a small report from my latest tests.


Tests:

* Namespace (br-ex) *<->* Internet - OK

* Namespace (vxlan,gre,vlan) *<->* Tenant - OK

* Tenant *<->* Namespace *<->* Internet - *NOT-OK* (Very slow / Unstable /
Intermittent)
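
The per-segment checks above can be repeated from inside the router namespace on the Network Node. A minimal sketch, assuming the usual Neutron L3 agent naming of `qrouter-<router-uuid>` for the namespace. The router ID and the target address are placeholders, not values from this setup, and `netns_command` is a hypothetical helper. It only builds the `ip netns exec` command line; actually running it needs root on the Network Node.

```python
import shlex


def netns_command(router_id, argv):
    """Build an `ip netns exec` command that runs argv inside a router namespace."""
    namespace = "qrouter-" + router_id  # assumed Neutron L3 agent naming
    return ["ip", "netns", "exec", namespace] + list(argv)


# Placeholder router ID and documentation-range target; substitute real values.
cmd = netns_command("ROUTER_UUID", ["ping", "-c", "3", "192.0.2.1"])
print(shlex.join(cmd))
```

Running the same probe once from the Tenant and once from the Namespace is what lets the slow segment be isolated.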


Since connectivity from the Tenant to its Namespace is fine, AND connectivity
from its Namespace to the Internet is also fine, it came to my mind: hey, why
not run Squid WITHIN the Tenant Namespace as a workaround?!

And... Voilà! There I "Fixed" It!    =P


New Test:

Tenant *<->* *Namespace with Squid* *<->* Internet - OK!
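
On the Tenant side, the workaround amounts to pointing HTTP clients at the Squid instance running in the Namespace. A rough sketch using only the Python standard library, assuming Squid listens on its default port 3128 on the router's internal address. The 10.0.0.1 address is hypothetical and not taken from this setup, and `proxy_opener` is an illustrative helper name.

```python
import urllib.request


def proxy_opener(proxy_host, proxy_port=3128):
    """Return (opener, proxies) that route HTTP and HTTPS through the given proxy."""
    proxy_url = "http://%s:%d" % (proxy_host, proxy_port)
    proxies = {"http": proxy_url, "https": proxy_url}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    return opener, proxies


# Hypothetical gateway address of the Tenant router namespace.
opener, proxies = proxy_opener("10.0.0.1")
# opener.open("http://example.org/") would then be fetched through Squid.
```

This only covers proxied HTTP/HTTPS traffic; anything else (SSH, for instance) still takes the slow routed path.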


*NOTE:* I'm sure that the entire ethernet path (without L3, Namespace, OVS,
VXLANs, GREs, or Linux bridges, just plain Linux + IPs), *from the
hypervisor to the Internet*, *passing through the same Network Node hardware
/ path*, is working smoothly. I mean, I tested the entire path BEFORE
installing OpenStack Havana... So, it cannot be an "infrastructure /
hardware" issue; it must be something else, located at the software layer
running within the Network Node itself.

I'm about to send more info about this problem.

Thanks!
Thiago

On 26 October 2013 13:57, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:

> Hi Darragh,
>
> Yes, on the same net-node machine, Grizzly works, Havana doesn't... But, for
> Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6.
>
> If I replace the Havana net-node hardware entirely, the problem persists
> (i.e. it "follows" the Havana net-node), so, I think, it cannot be related to
> the hardware.
>
> I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS
> 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).
>
> My logs (including Open vSwitch) right after starting an Instance (nothing
> in the OVS logs):
>
> http://paste.openstack.org/show/49870/
>
> I tried everything, including installing the Network Node on top of a KVM
> virtual machine or directly on a dedicated server, same result: the problem
> follows the Havana node (virtual or physical). The Grizzly Network Node works
> both on a KVM VM and on a dedicated server.
>
> Regards,
> Thiago
>
>
> On 26 October 2013 06:28, Darragh OReilly <darragh.oreilly at yahoo.com> wrote:
>
>> Hi Thiago,
>>
>> so just to confirm - on the same netnode machine, with the same OS,
>> kernel and OVS versions - Grizzly is ok and Havana is not?
>>
>> Also, on the network node, are there any errors in the neutron logs, the
>> syslog, or /var/log/openvswitch/* ?
>>
>> Re, Darragh.
>>
>>
>>   On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <
>> thiagocmartinsc at gmail.com> wrote:
>>
>> LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron!     =P
>>
>> I'll ignore the problems related to the "performance between two
>> instances on different hypervisors" for now. My priority is the
>> connectivity issue with the External networks... At least, internal is slow
>> but it works.
>>
>> I'm about to remove the L3 Agent / Namespaces entirely from my
>> topology... It is a shame because it is pretty cool! With Grizzly I had no
>> problems at all. Plus, I need to put Havana into production ASAP!    :-/
>>
>> Why am I giving up on L3 / NS for now? Because I tried:
>>
>> The option "tenant_network_type" with gre, vxlan and vlan (range
>> physnet1:206:256 configured at the 3Com switch as tagged).
>>
>> From the instances, the connection with External network *is always slow*,
>> no matter whether I choose GRE, VXLAN or VLAN for Tenants.
>>
>> For example, right now, I'm using VLAN, same problem.
>>
>> Don't you guys think that this can be a problem with the bridge "br-ex"
>> and its internals? Since I swapped the "Tenant Network Type" 3 times, same
>> result... But I still have not removed the br-ex from the scene.
>>
>> If someone wants to debug it, I can give the root password, no problem,
>> it is just a lab...   =)
>>
>> Thanks!
>> Thiago
>>
>> On 25 October 2013 19:45, Rick Jones <rick.jones2 at hp.com> wrote:
>>
>> On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:
>>
>> WOW!! Thank you for your time Rick! Awesome answer!!    =D
>>
>> I'll do these tests (with ethtool GRO / CKO) tonight but, do you think
>> that this is the main root of the problem?!
>>
>>
>> I mean, I'm seeing two distinct problems here:
>>
>> 1- Slow connectivity to the External network plus SSH lags all over the
>> cloud (everything that passes through L3 / Namespace is problematic), and;
>>
>> 2- Communication between two Instances on different hypervisors (i.e.
>> maybe it is related to this GRO / CKO thing).
>>
>>
>> So, two different problems, right?!
>>
>>
>> One or two problems I cannot say. Certainly if one got the benefit of
>> stateless offloads in one direction and not the other, one could see
>> different performance limits in each direction.
>>
>> All I can really say is I liked it better when we were called Quantum,
>> because then I could refer to it as "Spooky networking at a distance."
>>  Sadly, describing Neutron as "Networking with no inherent charge" doesn't
>> work as well :)
>>
>> rick jones
>>
>>
>>
>>
>>
>