<div dir="ltr">Guys,<div><br></div><div>This problem is a "deal breaker"... I was counting on OpenStack Havana (on Ubuntu) for my first public cloud, which I was about to announce and launch, but this problem changed everything.</div>
<div><br></div><div>I cannot put Havana with Ubuntu LTS into production because of this network issue. This is a very serious problem for me: all sites, and even SSH connections, that pass through the "Floating IPs" into the tenants' subnets are very slow, and all connections freeze for seconds at a time, every minute.</div>
<div><br></div><div>Again, I see no way to put Havana into production (using Per-Tenant Routers with Private Networks), <u>because the Network Node is broken</u>. At least with Ubuntu... I'll try it with Debian 7, or CentOS (I don't like it), just to see if the problem persists, but I have preferred Ubuntu since Warty Warthog... :-/</div>
<div><br></div><div>So, what is being done to fix it? I already tried everything I could, without any kind of success...</div>
<div><br></div><div>Also, I followed this doc (to triple * triple re-check my env): <a href="http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html">http://docs.openstack.org/havana/install-guide/install/apt/content/section_networking-routers-with-private-networks.html</a> but, it does not work as expected.</div>
<div><br></div><div>BTW, I can give you guys full access to my environment, no problem... I can build a lab from scratch, following your instructions, and I can also give root access to OpenStack experts... Just let me know... =)</div>
<div><br></div><div>Thanks!</div><div>Thiago</div><div class="gmail_extra"><br><div class="gmail_quote">On 6 November 2013 09:20, Martinx - ジェームズ <span dir="ltr"><<a href="mailto:thiagocmartinsc@gmail.com" target="_blank">thiagocmartinsc@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr">Hello Stackers!<div><br></div><div>Sorry for not getting back to this topic last week, too many things to do...</div>
<div><br></div><div>So, instead of trying this and that, reply this, reply again... I made a video about this problem, I hope that helps more than those e-mails that I'm writing! =P</div>
<div><br></div><div>Honestly, I don't know the source of this problem, whether it is OpenStack / Neutron or "Linux / Namespace / OVS"... It would be great to test Ubuntu Linux + Namespace + OVS alone (without Neutron) to see if the problem persists, but I have no idea how to set everything up just like Neutron does. Maybe I just need to reproduce the "Namespace and OVS bridges / ports / VXLAN - as is", without Neutron?! I can try that...</div>
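<div><br></div><div>A minimal sketch of what such a Neutron-free test could look like, assuming a single namespace attached to one OVS internal port. The names "test-router", "br-test" and "tap-test" and the address 10.0.0.1/24 are hypothetical placeholders, not values from this environment, and VXLAN tunnels are left out. The script only builds and prints the commands; running them needs root on the network node.</div>
<pre>
```python
import shlex


def standalone_router_commands(ns="test-router", bridge="br-test",
                               port="tap-test", cidr="10.0.0.1/24"):
    """Return argv lists that build a namespace plus an OVS internal port.

    All names and the address are illustrative assumptions.
    """
    in_ns = ["ip", "netns", "exec", ns]
    return [
        # Create the network namespace that plays the role of the router.
        ["ip", "netns", "add", ns],
        # Create the OVS bridge (no error if it already exists).
        ["ovs-vsctl", "--may-exist", "add-br", bridge],
        # Add an internal port so the kernel sees it as a normal interface.
        ["ovs-vsctl", "--may-exist", "add-port", bridge, port,
         "--", "set", "Interface", port, "type=internal"],
        # Move the port into the namespace, then address it and bring it up.
        ["ip", "link", "set", port, "netns", ns],
        in_ns + ["ip", "addr", "add", cidr, "dev", port],
        in_ns + ["ip", "link", "set", port, "up"],
    ]


if __name__ == "__main__":
    # Dry run: print the commands instead of executing them.
    for argv in standalone_router_commands():
        print(" ".join(shlex.quote(arg) for arg in argv))
```
</pre>
<div>If the slowness shows up with only these pieces in place, that would point away from Neutron itself; if it does not, the Neutron-managed flows and iptables rules become the next thing to compare.</div>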
<div><br></div><div>Also, my Grizzly setup is gone, I deleted it... Sorry about that... I know it works because it is the first time I'm seeing this problem... I had used Grizzly for ~5 months with only 1 problem (related to MTU 1400) but, this problem with Havana is totally different...</div>
<div><br></div><div><br></div><div>Video:</div><div><br></div><div><div>OpenStack Havana L3 Router problem - Ubuntu 12.04.3 LTS: <a href="http://www.youtube.com/watch?v=jVjiphMuuzM" target="_blank">http://www.youtube.com/watch?v=jVjiphMuuzM</a></div>
</div><div><br></div><div><br></div><div>* After 5 minutes, I inserted a new video, showing how I "fixed" it, by running Squid within the Tenant router. You guys can see that, using the default Tenant router (10:30), it will take about 1 hour to finish the "apt-get download" and, with Squid (09:27), it goes down to about 3 minutes (no, it is still not cached, I clean it for each test).</div>
<div><br></div><div><br></div><div>Sorry about the size of the video, it is about 12 minutes and high-res (to see the screen details), but it is a serious problem and I think it is worth watching...</div><div><br></div><div>
NOTE: Sorry about my English! It is very hard to "speak" a non-native language while handling an Android phone and typing on the keyboard... :-)</div><div><br></div><div>Best!</div><div>Thiago</div><div><br></div></div>
<div><div>
<div class="gmail_extra"><br><br><div class="gmail_quote">On 28 October 2013 07:00, Darragh O'Reilly <span dir="ltr"><<a href="mailto:dara2002-openstack@yahoo.com" target="_blank">dara2002-openstack@yahoo.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">Thiago,<br>
<br>
some more answers below.<br>
<br>
Btw: I saw the problem with a "qemu-nbd -c" process using all the CPU on the compute node. It happened just once - must be a bug in it. You can disable libvirt injection if you don't want it by setting "libvirt_inject_partition = -2" in nova.conf.<br>
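<div><br></div><div>A small sketch of applying that setting programmatically, assuming the option lives in the [DEFAULT] section of nova.conf (the section is an assumption; the option name and value are the ones quoted above). It works on the file contents as a string, so it can be tried on a copy before touching the real file.</div>
<pre>
```python
import configparser
import io


def disable_injection(conf_text):
    """Return conf_text with libvirt_inject_partition = -2 under [DEFAULT].

    Assumption: the option belongs in [DEFAULT].
    """
    # interpolation=None keeps literal percent signs in other options intact;
    # strict=False tolerates duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    parser["DEFAULT"]["libvirt_inject_partition"] = "-2"
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = "[DEFAULT]\nverbose = True\n"
    print(disable_injection(sample))
```
</pre>
<div>Note that configparser drops comments and lowercases option names when it rewrites the file, so editing nova.conf by hand is often simpler. nova-compute has to be restarted for the change to take effect.</div>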
<div><br>
<br>
On Saturday, 26 October 2013, 16:58, Martinx - ジェームズ <<a href="mailto:thiagocmartinsc@gmail.com" target="_blank">thiagocmartinsc@gmail.com</a>> wrote:<br>
<br>
Hi Darragh,<br>
><br>
><br>
>Yes, on the same net-node machine, Grizzly works, Havana doesn't... But, for Grizzly, I have Ubuntu 12.04 with Linux 3.2 and OVS 1.4.0-1ubuntu1.6.<br>
<br>
<br>
</div>so we don't know if the problem is due to Neutron, the Ubuntu kernel or OVS. I suspect the kernel as it implements the routing/natting, interfaces and namespaces. I don't think Neutron Havana changes how these things are set up very much.<br>
<br>
Can you try running Havana on a network node with the Linux 3.2 kernel?<br>
<div><br>
<br>
><br>
><br>
>If I replace the Havana net-node hardware entirely, the problem persists (i.e. it "follows" the Havana net-node), so I think it cannot be related to the hardware.<br>
><br>
><br>
>I tried Havana with both OVS 1.10.2 (from Cloud Archive) and with OVS 1.11.0 (compiled and installed by myself using dpkg-buildpackage / dpkg).<br>
><br>
><br>
>My logs (including Open vSwitch) right after starting an Instance (nothing at OVS logs):<br>
><br>
><br>
><a href="http://paste.openstack.org/show/49870/" target="_blank">http://paste.openstack.org/show/49870/</a><br>
><br>
><br>
><br>
>I tried everything, including installing the Network Node on top of a KVM virtual machine or directly on a dedicated server, same result: the problem follows the Havana node (virtual or physical). The Grizzly Network Node works both on a KVM VM and on a dedicated server.<br>
><br>
><br>
>Regards,<br>
>Thiago<br>
><br>
><br>
><br>
</div><div><div>>On 26 October 2013 06:28, Darragh OReilly wrote:<br>
><br>
>Hi Thiago,<br>
>><br>
>>so just to confirm - on the same netnode machine, with the same OS, kernel and OVS versions - Grizzly is ok and Havana is not?<br>
>><br>
>>Also, on the network node, are there any errors in the neutron logs, the syslog, or /var/log/openvswitch/* ?<br>
>><br>
>><br>
>><br>
>>Re, Darragh.<br>
>><br>
>><br>
>><br>
>><br>
>>On Saturday, 26 October 2013, 5:25, Martinx - ジェームズ <<a href="mailto:thiagocmartinsc@gmail.com" target="_blank">thiagocmartinsc@gmail.com</a>> wrote:<br>
>><br>
>>LOL... One day, Internet via "Quantum Entanglement"! Oops, Neutron! =P<br>
>>><br>
>>><br>
>>><br>
>>>I'll ignore the problems related to the "performance between two instances on different hypervisors" for now. My priority is the connectivity issue with the External networks... At least, internal is slow but it works.<br>
>>><br>
>>><br>
>>>I'm about to remove the L3 Agent / Namespaces entirely from my topology... It is a shame because it is pretty cool! With Grizzly I had no problems at all. Plus, I need to put Havana into production ASAP! :-/<br>
>>><br>
>>><br>
>>>Why am I giving up on it (L3 / NS) for now? Because I tried:<br>
>>><br>
>>><br>
>>>The option "tenant_network_type" with gre, vxlan and vlan (range physnet1:206:256 configured at the 3Com switch as tagged).<br>
>>><br>
>>><br>
>>>From the instances, the connection to the External network is always slow, no matter whether I choose GRE, VXLAN or VLAN for the Tenants.<br>
>>><br>
>>><br>
>>>For example, right now, I'm using VLAN, same problem.<br>
>>><br>
>>><br>
>>>Don't you guys think that this could be a problem with the bridge "br-ex" and its internals? I swapped the "Tenant Network Type" 3 times, with the same result... But I still have not removed br-ex from the scene.<br>
>>><br>
>>><br>
>>>If someone wants to debug it, I can give the root password, no problem, it is just a lab... =)<br>
>>><br>
>>><br>
>>>Thanks!<br>
>>>Thiago<br>
>>><br>
>>><br>
>>>On 25 October 2013 19:45, Rick Jones <<a href="mailto:rick.jones2@hp.com" target="_blank">rick.jones2@hp.com</a>> wrote:<br>
>>><br>
>>>On 10/25/2013 02:37 PM, Martinx - ジェームズ wrote:<br>
>>>><br>
>>>>WOW!! Thank you for your time Rick! Awesome answer!! =D<br>
>>>>><br>
>>>>>I'll do these tests (with ethtool GRO / CKO) tonight but, do you think<br>
>>>>>that this is the root cause of the problem?!<br>
>>>>><br>
>>>>><br>
>>>>>I mean, I'm seeing two distinct problems here:<br>
>>>>><br>
>>>>>1- Slow connectivity to the External network plus SSH lags all over the<br>
>>>>>cloud (everything that passes through L3 / Namespace is problematic), and;<br>
>>>>><br>
>>>>>2- Communication between two Instances on different hypervisors (i.e.<br>
>>>>>maybe it is related to this GRO / CKO thing).<br>
>>>>><br>
>>>>><br>
>>>>>So, two different problems, right?!<br>
>>>>><br>
>>>><br>
One or two problems I cannot say. Certainly if one got the benefit of stateless offloads in one direction and not the other, one could see different performance limits in each direction.<br>
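<div><br></div><div>A rough sketch of how the offload tests mentioned above could be scripted, assuming "GRO / CKO" refers to generic receive offload and RX/TX checksum offload, and that the interface is called "eth0" (a hypothetical name). It only prints the ethtool commands: "-k" shows the current offload settings and "-K" changes one feature.</div>
<pre>
```python
import shlex


def offload_toggle_commands(iface, features=("gro", "rx", "tx"), state="off"):
    """Return ethtool argv lists: show offloads, then switch each feature.

    "gro" is generic receive offload; "rx" and "tx" are checksum offload.
    """
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    # First command displays the current settings for comparison.
    commands = [["ethtool", "-k", iface]]
    for feature in features:
        commands.append(["ethtool", "-K", iface, feature, state])
    return commands


if __name__ == "__main__":
    # "eth0" is a placeholder; use the real interface on each node.
    for argv in offload_toggle_commands("eth0"):
        print(" ".join(shlex.quote(arg) for arg in argv))
```
</pre>
<div>Switching one feature at a time and re-running the same transfer in each direction makes it easier to tell whether an asymmetric offload explains the difference.</div>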
>>>><br>
>>>>All I can really say is I liked it better when we were called Quantum, because then I could refer to it as "Spooky networking at a distance." Sadly, describing Neutron as "Networking with no inherent charge" doesn't work as well :)<br>
>>>><br>
>>>>rick jones<br>
>>>><br>
>>>><br>
>>><br>
>>><br>
>>><br>
><br>
><br>
><br>
</div></div></blockquote></div><br></div>
</div></div></blockquote></div><br></div></div>