[Openstack] Directional network performance issues with Neutron + OpenvSwitch
Martinx - ジェームズ
thiagocmartinsc at gmail.com
Fri Oct 25 18:44:36 UTC 2013
Okay, cool!
tap* devices removed, neutron-ovs-cleanup ok, bridges empty, all nodes rebooted.
BUT, still poor performance when reaching the "External" network from within an
Instance (plus SSH lags)...
I'll install a new Network Node, on different hardware, to test this further...
Weird thing is, my Grizzly Network Node works perfectly on this very same
hardware (same OpenStack Network topology, of course)...
Hardware of my current "net-node-1":
* Grizzly - Okay
* Havana - Fails... ;-(
Best,
Thiago
On 25 October 2013 15:28, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>
> ok, the tunnels look fine. One thing that looks funny on the network node
> is these untagged tap* devices. I guess you switched to using veths and
> then switched back to not using them. I don't know if they matter, but you
> should clean them up by stopping everything, running neutron-ovs-cleanup
> (check that the bridges are empty) and rebooting.
>
> Bridge br-int
>     Port "tapa1376f61-05"
>         Interface "tapa1376f61-05"
>     ...
>     Port "qr-a1376f61-05"
>         tag: 1
>         Interface "qr-a1376f61-05"
>             type: internal
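>
> A minimal sketch of that cleanup sequence (service names assume the stock
> Ubuntu Havana packages - adjust to your init system):
>
>     # on the network node, as root
>     service neutron-l3-agent stop
>     service neutron-dhcp-agent stop
>     service neutron-plugin-openvswitch-agent stop
>     neutron-ovs-cleanup
>     ovs-vsctl list-ports br-int    # should come back (nearly) empty
>     reboot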
>
> Re, Darragh.
>
>
>
> On Friday, 25 October 2013, 17:28, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
>
> Here we go:
>
> ---
> root at net-node-1:~# grep local_ip
> /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
> local_ip = 10.20.2.52
>
> root at net-node-1:~# ip r | grep '10\.20'
> 10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.52
> ---
>
> ---
> root at hypervisor-1:~# grep local_ip
> /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
> local_ip = 10.20.2.53
>
> root at hypervisor-1:~# ip r | grep '10\.20'
> 10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.53
> ---
>
> ---
> root at hypervisor-2:~# grep local_ip
> /etc/neutron/plugins/openvswitch/ovs_neutron_plugin.ini
> local_ip = 10.20.2.57
>
> root at hypervisor-2:~# ip r | grep '10\.20'
> 10.20.2.0/24 dev eth1 proto kernel scope link src 10.20.2.57
> ---
>
> Each "ovs-vsctl show":
>
> net-node-1: http://paste.openstack.org/show/49727/
>
> hypervisor-1: http://paste.openstack.org/show/49728/
>
> hypervisor-2: http://paste.openstack.org/show/49729/
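>
> A quick way to check the N-1 rule Darragh mentioned (a sketch; with three
> nodes, each br-tun should have exactly two gre ports):
>
>     ovs-vsctl list-ports br-tun | grep -c '^gre-'    # expect 2 on each node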
>
>
> Best,
> Thiago
>
>
> On 25 October 2013 14:11, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>
>
> the uneven ssh performance is strange - maybe learning on the tunnel mesh
> is not stabilizing. It is easy to mess it up by giving a wrong local_ip in
> the ovs-plugin config file. Check the tunnel ports on br-tun with
> 'ovs-vsctl show'. Is each one using the correct IPs? br-tun should have N-1
> gre-x ports - no more! Maybe you can put the 'ovs-vsctl show' output from
> the nodes on paste.openstack if there are not too many?
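>
> For example, on a correctly meshed node the br-tun ports would look roughly
> like this (a sketch using the local_ip values quoted in this thread; port
> names and key options vary by setup):
>
>     Bridge br-tun
>         Port "gre-2"
>             Interface "gre-2"
>                 type: gre
>                 options: {in_key=flow, local_ip="10.20.2.52", out_key=flow, remote_ip="10.20.2.53"}
>         Port "gre-3"
>             Interface "gre-3"
>                 type: gre
>                 options: {in_key=flow, local_ip="10.20.2.52", out_key=flow, remote_ip="10.20.2.57"}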
>
> Re, Darragh.
>
>
> On Friday, 25 October 2013, 16:20, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
>
> I think I can say... "YAY!!" :-D
>
> With "LibvirtOpenVswitchDriver" my internal communication is the double
> now! It goes from ~200 (with LibvirtHybridOVSBridgeDriver) to *400Mbit/s*(with LibvirtOpenVswitchDriver)! Still far from 1Gbit/s (my physical path
> limit) but, more acceptable now.
>
> The command "ethtool -K eth1 gro off" still makes no difference.
>
> So, there is only one remaining problem: when traffic passes through the
> L3 agent / namespace, it is still useless. Even the SSH connection into my
> Instances, via their Floating IPs, is slow as hell; sometimes it just stops
> responding for a few seconds and comes back online again out of nothing...
>
> I just detected a weird behavior: when I run "apt-get update" from
> instance-1, it is slow as I said, plus its SSH connection (where I'm
> running apt-get update) stops responding right after I run "apt-get
> update" AND *all my other SSH connections stop working too!* For
> a few seconds... This means that when I run "apt-get update" from within
> instance-1, the SSH session of instance-2 is affected too!! There is
> something pretty bad going on at the L3 / Namespace layer.
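>
> One way to narrow down whether the router namespace itself is the choke
> point (a sketch; <router-uuid> is a placeholder for your router's ID, and
> 8.8.8.8 is just an example external target):
>
>     # on the network node, as root
>     ip netns list
>     ip netns exec qrouter-<router-uuid> ip link    # check the qr-/qg- MTUs
>     ip netns exec qrouter-<router-uuid> ping -c 3 8.8.8.8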
>
> BTW, do you think that ~400 Mbit/s intra-VM communication (GRE tunnel)
> on top of 1 Gbit ethernet is acceptable?! It is still less than half...
>
> Thank you!
> Thiago
>
> On 25 October 2013 12:28, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>
> Hi Thiago,
>
> for the VIF error: you will need to change qemu.conf as described here:
> http://openvswitch.org/openstack/documentation/
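>
> For reference, the usual /etc/libvirt/qemu.conf edit for that error is
> sketched below - double-check it against the linked page:
>
>     # /etc/libvirt/qemu.conf
>     cgroup_device_acl = [
>         "/dev/null", "/dev/full", "/dev/zero",
>         "/dev/random", "/dev/urandom",
>         "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
>         "/dev/rtc", "/dev/hpet", "/dev/net/tun",
>     ]
>
>     # then restart libvirt:
>     service libvirt-bin restart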
>
> Re, Darragh.
>
>
> On Friday, 25 October 2013, 15:14, Martinx - ジェームズ <thiagocmartinsc at gmail.com> wrote:
>
> Hi Darragh,
>
> Yes, Instances are getting MTU 1400.
>
> I'm using LibvirtHybridOVSBridgeDriver on my Compute Nodes. I'll check bug
> 1223267 right now!
>
>
> The LibvirtOpenVswitchDriver doesn't work, look:
>
> http://paste.openstack.org/show/49709/
>
> http://paste.openstack.org/show/49710/
>
>
> My NICs are "RTL8111/8168/8411 PCI Express Gigabit Ethernet"; the
> Hypervisors' motherboards are MSI-890FXA-GD70.
>
> The command "ethtool -K eth1 gro off" did not had any effect on the
> communication between instances on different hypervisors, still poor,
> around 248Mbit/sec, when its physical path reach 1Gbit/s (where GRE is
> built).
>
> My Linux version is "Linux hypervisor-1 3.8.0-32-generic
> #47~precise1-Ubuntu", same kernel on the Network Node and the other nodes
> too (Ubuntu 12.04.3 installed from scratch for this Havana deployment).
>
> The only difference I can see right now between my two hypervisors is
> that the second is just a spare machine with a slow CPU, but I don't think
> that will have a negative impact on network throughput, since I have only
> one Instance running on it (plus a qemu-nbd process eating 90% of its CPU).
> I'll replace this CPU tomorrow and redo these tests, but I don't think
> this is the source of my problem. The motherboards of the two hypervisors
> are identical, with one 3Com (managed) switch connecting the two.
>
> Thanks!
> Thiago
>
>
> On 25 October 2013 07:15, Darragh O'Reilly <dara2002-openstack at yahoo.com> wrote:
>
> Hi Thiago,
>
> you have configured DHCP to push out an MTU of 1400. Can you confirm that
> the 1400 MTU is actually getting out to the instances by running 'ip link'
> on them?
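>
> e.g. inside an instance you would want to see something like (illustrative
> output):
>
>     $ ip link show eth0
>     2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1400 qdisc pfifo_fast ...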
>
> There is an open problem where the veth used to connect the OVS and Linux
> bridges causes a performance drop on some kernels -
> https://bugs.launchpad.net/nova-project/+bug/1223267 . If you are using
> the LibvirtHybridOVSBridgeDriver VIF driver, can you try changing to
> LibvirtOpenVswitchDriver and repeating the iperf test between instances on
> different compute-nodes?
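>
> A sketch of that test (instance addresses are placeholders):
>
>     instance-1$ iperf -s
>     instance-2$ iperf -c <instance-1-ip> -t 30 -i 5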
>
> What NICs (maker+model) are you using? You could try disabling any
> off-load functionality - 'ethtool -k <iface-used-for-gre>' lists the
> current settings.
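>
> For example (a sketch - available feature names vary by driver):
>
>     ethtool -k eth1                              # list current settings
>     ethtool -K eth1 gro off gso off tso off      # disable a few candidates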
>
> What kernel are you using: 'uname -a'?
>
> Re, Darragh.
>
> > Hi Daniel,
>
> >
> > I followed that page; my Instances' MTU is lowered by the DHCP Agent but,
> > same result: poor network performance (internally between Instances and
> > when trying to reach the Internet).
> >
> > No matter if I use "dnsmasq_config_file=/etc/neutron/dnsmasq-neutron.conf"
> > + "dhcp-option-force=26,1400" for my Neutron DHCP agent, or not (i.e.
> > MTU = 1500), the result is almost the same.
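> >
> > For reference, that setup in config form (paths as quoted above; option 26
> > is the interface-MTU DHCP option):
> >
> >     # /etc/neutron/dhcp_agent.ini
> >     dnsmasq_config_file = /etc/neutron/dnsmasq-neutron.conf
> >
> >     # /etc/neutron/dnsmasq-neutron.conf
> >     dhcp-option-force=26,1400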
> >
> > I'll try VXLAN (or just VLANs) this weekend to see if I can get better
> > results...
> >
> > Thanks!
> > Thiago
>