[Openstack] Directional network performance issues with Neutron + OpenvSwitch

Darragh O'Reilly dara2002-openstack at yahoo.com
Fri Oct 25 17:28:25 UTC 2013



ok, the tunnels look fine. One thing that looks funny on the network node is these untagged tap* devices. I guess you switched to using veths and then switched back to not using them. I don't know if they matter, but you should clean them up by stopping everything, running neutron-ovs-cleanup (check that the bridges are empty) and rebooting.

    Bridge br-int
        Port "tapa1376f61-05"
            Interface "tapa1376f61-05"
                ...
        Port "qr-a1376f61-05"
            tag: 1
            Interface "qr-a1376f61-05"
                type: internal
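
For reference, a minimal sketch of that cleanup sequence (service names assume the stock Ubuntu Havana packages; adjust to whatever agents you actually run):

    # stop the agents that own ports on the OVS bridges
    service neutron-plugin-openvswitch-agent stop
    service neutron-l3-agent stop
    service neutron-dhcp-agent stop
    # remove the ports Neutron created on br-int/br-tun
    neutron-ovs-cleanup
    # verify the bridges are now empty of tap*/qr-*/qg-* ports
    ovs-vsctl show
    reboot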

Re, Darragh.




On Friday, 25 October 2013, 17:28, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
 
Here we go:
>
>
>---
>root at net-node-1:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini 
>local_ip = 10.20.2.52
>
>
>root at net-node-1:~# ip r | grep 10.\20
>10.20.2.0/24 dev eth1  proto kernel  scope link  src 10.20.2.52 
>---
>
>
>---
>root at hypervisor-1:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
>local_ip = 10.20.2.53
>
>
>root at hypervisor-1:~# ip r | grep 10.\20
>10.20.2.0/24 dev eth1  proto kernel  scope link  src 10.20.2.53 
>---
>
>
>---
>root at hypervisor-2:~# grep local_ip /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
>local_ip = 10.20.2.57
>
>
>root at hypervisor-2:~# ip r | grep 10.\20
>10.20.2.0/24 dev eth1  proto kernel  scope link  src 10.20.2.57
>---
>
>
>Each "ovs-vsctl show":
>
>
>net-node-1: http://paste.openstack.org/show/49727/
>
>
>hypervisor-1: http://paste.openstack.org/show/49728/
>
>
>hypervisor-2: http://paste.openstack.org/show/49729/
>
>
>
>
>
>Best,
>Thiago
>
>
>
>On 25 October 2013 14:11, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>
>
>>
>>the uneven ssh performance is strange - maybe learning on the tunnel mesh is not stabilizing. It is easy to mess it up by giving a wrong local_ip in the ovs-plugin config file. Check the tunnel ports on br-tun with 'ovs-vsctl show'. Is each one using the correct IPs? br-tun should have N-1 gre-x ports - no more! Maybe you can put 'ovs-vsctl show' from the nodes on paste.openstack if there are not too many?
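>>
>>For example, with three nodes, each br-tun should have exactly two GRE ports, one per peer, roughly like this (port names and layout are illustrative; the remote_ip of each port should match the local_ip configured on the corresponding peer):
>>
>>    Bridge br-tun
>>        Port "gre-1"
>>            Interface "gre-1"
>>                type: gre
>>                options: {in_key=flow, local_ip="10.20.2.52", out_key=flow, remote_ip="10.20.2.53"}
>>        Port "gre-2"
>>            Interface "gre-2"
>>                type: gre
>>                options: {in_key=flow, local_ip="10.20.2.52", out_key=flow, remote_ip="10.20.2.57"}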
>>
>>
>>Re, Darragh.
>>
>>
>>
>>
>>On Friday, 25 October 2013, 16:20, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
>> 
>>I think I can say... "YAY!!"    :-D
>>>
>>>
>>>With "LibvirtOpenVswitchDriver" my internal communication is the double now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to 400Mbit/s (with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path limit) but, more acceptable now.
>>>
>>>
>>>The command "ethtool -K eth1 gro off" still makes no difference.
>>>
>>>
>>>So, there is only one remaining problem: when traffic passes through L3 / the namespace, it is still useless. Even the SSH connections into my instances, via their Floating IPs, are slow as hell; sometimes a session just stops responding for a few seconds and then comes back online again "out of nothing"...
>>>
>>>
>>>I just detected a weird "behavior": when I run "apt-get update" from instance-1, it is slow as I said, plus its SSH connection (where I'm running apt-get update) stops responding right after I run "apt-get update" AND all my other SSH connections stop working too, for a few seconds... This means that when I run "apt-get update" from within instance-1, the SSH session of instance-2 is affected too!! There is something pretty bad going on at L3 / the namespace.
>>>
>>>
>>>BTW, do you think that ~400 Mbit/s of intra-VM communication (over the GRE tunnel) on top of 1 Gbit Ethernet is acceptable?! It is still less than half...
>>>
>>>
>>>Thank you!
>>>Thiago
>>>
>>>
>>>On 25 October 2013 12:28, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>>>
>>>Hi Thiago,
>>>>
>>>>
>>>>for the VIF error: you will need to change qemu.conf as described here:
>>>>http://openvswitch.org/openstack/documentation/
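>>>>
>>>>If I remember right, the change on that page is along these lines in /etc/libvirt/qemu.conf (this is only a sketch - the linked doc is authoritative): make sure the cgroup device ACL includes /dev/net/tun, so qemu can open the tap devices that OVS attaches:
>>>>
>>>>    cgroup_device_acl = [
>>>>        "/dev/null", "/dev/full", "/dev/zero",
>>>>        "/dev/random", "/dev/urandom",
>>>>        "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
>>>>        "/dev/rtc", "/dev/hpet", "/dev/net/tun",
>>>>    ]
>>>>
>>>>and restart libvirt afterwards ('service libvirt-bin restart' on Ubuntu).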
>>>>
>>>>
>>>>Re, Darragh.
>>>>
>>>>
>>>>
>>>>
>>>>On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
>>>> 
>>>>Hi Darragh,
>>>>>
>>>>>
>>>>>Yes, Instances are getting MTU 1400.
>>>>>
>>>>>
>>>>>I'm using LibvirtHybridOVSBridgeDriver at my Compute Nodes. I'll check bug 1223267 right now!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>The LibvirtOpenVswitchDriver doesn't work, look:
>>>>>
>>>>>
>>>>>http://paste.openstack.org/show/49709/
>>>>>
>>>>>
>>>>>
>>>>>http://paste.openstack.org/show/49710/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet"; the hypervisors' motherboards are MSI-890FXA-GD70.
>>>>>
>>>>>
>>>>>The command "ethtool -K eth1 gro off" did not have any effect on the communication between instances on different hypervisors; it is still poor, around 248 Mbit/s, while its physical path (where the GRE tunnel is built) reaches 1 Gbit/s.
>>>>>
>>>>>
>>>>>My Linux version is "Linux hypervisor-1 3.8.0-32-generic #47~precise1-Ubuntu", with the same kernel on the Network Node and the other nodes too (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>>>>>
>>>>>
>>>>>The only difference I can see right now between my two hypervisors is that the second one is just a spare machine with a slow CPU, but I don't think that has a negative impact on network throughput, since I have only 1 instance running on it (plus a qemu-nbd process eating 90% of its CPU). I'll replace this CPU tomorrow and redo these tests, but I don't think it is the source of my problem. The MOBOs of the two hypervisors are identical, with one 3Com (manageable) switch connecting the two.
>>>>>
>>>>>
>>>>>Thanks!
>>>>>Thiago
>>>>>
>>>>>
>>>>>
>>>>>On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>>>>>
>>>>>Hi Thiago,
>>>>>>
>>>>>>you have configured DHCP to push out an MTU of 1400. Can you confirm that the 1400 MTU is actually reaching the instances by running 'ip link' on them?
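>>>>>>
>>>>>>i.e. something like this from inside an instance (illustrative output - the "mtu 1400" is the part to check):
>>>>>>
>>>>>>    $ ip link show eth0
>>>>>>    2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast state UP qlen 1000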
>>>>>>
>>>>>>There is an open problem where the veth used to connect the OVS and Linux bridges causes a performance drop on some kernels - https://bugs.launchpad.net/nova-project/+bug/1223267 . If you are using the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to LibvirtOpenVswitchDriver and repeating the iperf test between instances on different compute nodes?
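>>>>>>
>>>>>>If I recall correctly, on Havana that is a one-line change in nova.conf on each compute node, followed by a nova-compute restart:
>>>>>>
>>>>>>    libvirt_vif_driver = nova.virt.libvirt.vif.LibvirtOpenVswitchDriver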
>>>>>>
>>>>>>What NICs (make+model) are you using? You could try disabling any offload functionality - 'ethtool -k <iface-used-for-gre>' lists the current settings, and 'ethtool -K <iface> <feature> off' turns one off.
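>>>>>>
>>>>>>For example, on the interface carrying the GRE traffic (eth1 in your setup, I believe):
>>>>>>
>>>>>>    # list the current offload settings
>>>>>>    ethtool -k eth1
>>>>>>    # then try switching the likely culprits off one at a time
>>>>>>    ethtool -K eth1 gro off
>>>>>>    ethtool -K eth1 tso off gso off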
>>>>>>
>>>>>>What kernel are you using: 'uname -a'?
>>>>>>
>>>>>>Re, Darragh.
>>>>>>
>>>>>>
>>>>>>> Hi Daniel,
>>>>>>
>>>>>>>
>>>>>>> I followed that page; my instances' MTU is lowered by the DHCP agent but,
>>>>>>> same result: poor network performance (both internally between instances
>>>>>>> and when trying to reach the Internet).
>>>>>>>
>>>>>>> No matter whether I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf"
>>>>>>> + "dhcp-option-force=26,1400" for my Neutron DHCP agent, or not (i.e. MTU =
>>>>>>> 1500), the result is almost the same.
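>>>>>>>
>>>>>>> For reference, that MTU setup is two small pieces of config (paths as in
>>>>>>> my deployment):
>>>>>>>
>>>>>>>     # /etc/neutron/dhcp_agent.ini
>>>>>>>     dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
>>>>>>>
>>>>>>>     # /etc/neutron/dnsmasq-neutron.conf
>>>>>>>     # DHCP option 26 = interface MTU; 1400 leaves headroom for GRE overhead
>>>>>>>     dhcp-option-force=26,1400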
>>>>>>>
>>>>>>> I'll try VXLAN (or just VLANs) this weekend to see if I can get better
>>>>>>> results...
>>>>>>>
>>>>>>> Thanks!
>>>>>>> Thiago
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>>>
>>>
>
>
>