[openstack-dev] [nova][neutron][upgrade] Grenade multinode partial upgrade

Ihar Hrachyshka ihrachys at redhat.com
Wed Feb 10 17:38:24 UTC 2016


Sean M. Collins <sean at coreitpro.com> wrote:

> Ihar Hrachyshka wrote:
>> UPD: it seems that enforcing an instance MTU of 1400 does let us get as
>> far as tempest:
>>
>> http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html
>>
>> And there are only three failures there:
>>
>> http://logs.openstack.org/59/265759/3/experimental/gate-grenade-dsvm-neutron-multinode/a167a59/console.html#_2016-01-11_11_58_47_945
>>
>> I also don’t see any RPC versioning-related traces in the service logs,
>> which is a good sign.
>
> Just an update - we are still stuck on those three tempest tests.
>
> I was able to dig a bit and it looks like it's still an MTU issue.
>
>
> http://logs.openstack.org/35/187235/14/experimental/gate-grenade-dsvm-neutron-multinode/c5eda62/logs/tempest.txt.gz#_2016-02-09_20_37_40_044
>
> "SSHException: Error reading SSH protocol banner[Errno 104] Connection
> reset by peer"

Note that this time the connection is reset immediately instead of hanging
until the timeout.

>
> I tried pushing down a patch to cram network_device_mtu down to 1450 in
> the hopes that it would do the trick - but sadly that didn't fix it. I’m

Actually, the job has already had network_device_mtu set to 1450 since:

https://review.openstack.org/#/c/267847/4/devstack-vm-gate.sh

Also, I added an interface state dump to worlddump, and here is what the
main node networking setup looks like:

http://logs.openstack.org/59/265759/20/experimental/gate-grenade-dsvm-neutron-multinode/d64a6e6/logs/worlddump-2016-01-30-164508.txt.gz

br-ex: mtu = 1450
inside router: qg mtu = 1450, qr = 1450

So the MTUs should be fine in this regard. I also ran devstack locally with
network_device_mtu enforced, and 1450-byte packets seem to pass through. So
the failure is probably in whatever tunnels packets to the subnode, not in
the local router-to-tap bits.
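
As a rough sanity check of that hypothesis, the sketch below computes how large the outer packet becomes once a full-size instance packet is encapsulated. It assumes VXLAN over IPv4 (50 bytes of overhead) or GRE with a key over IPv4 (42 bytes); the thread does not say which tunnel type the job uses, and the underlay MTU values are made-up examples, not values read from the gate nodes.

```python
# Assumed per-packet encapsulation overhead in bytes for an IPv4 underlay.
# These are the commonly cited figures, not values taken from the job logs.
TUNNEL_OVERHEAD = {
    "vxlan": 50,  # outer IP 20 + UDP 8 + VXLAN 8 + inner Ethernet 14
    "gre": 42,    # outer IP 20 + GRE with key 8 + inner Ethernet 14
}


def outer_packet_size(inner_mtu, tunnel_type):
    """Return the size of the outer IP packet carrying a full-size inner packet."""
    return inner_mtu + TUNNEL_OVERHEAD[tunnel_type]


def fits_underlay(inner_mtu, tunnel_type, underlay_mtu):
    """True if a full-size inner packet can cross the underlay unfragmented."""
    return outer_packet_size(inner_mtu, tunnel_type) <= underlay_mtu


if __name__ == "__main__":
    for underlay in (1500, 1450):    # hypothetical underlay MTUs
        for inner in (1450, 1400):   # instance MTUs mentioned in the thread
            ok = fits_underlay(inner, "vxlan", underlay)
            print(f"underlay={underlay} inner={inner} vxlan fits: {ok}")
```

Under these assumptions an instance MTU of 1450 only fits if the link between the nodes really carries 1500 bytes; if that link is itself limited to 1450, only 1400 would fit.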

I also see br-tun with mtu = 1500. Is that a problem? Probably not, but I
admit there is a lot I still don’t know about this topic.

I also see a qg-2c68fb65-21 device in the global namespace in the worlddump
output above. The device has mtu = 1500. Which router does it belong to?
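
Spotting such odd devices by eye in a long dump is error-prone, so here is a minimal sketch that flags every device whose MTU differs from the expected value. It assumes the dump contains lines in the one-line `ip -o link` format; the helper name find_mtu_mismatches and the sample text are hypothetical, and only the device names and MTU values in the sample come from this thread.

```python
import re

# Matches the "<index>: <name>: <flags> mtu <n>" prefix of `ip -o link` lines.
LINK_RE = re.compile(
    r"^\d+:\s+(?P<name>[^:@\s]+)(?:@\S+)?:\s.*?\bmtu\s+(?P<mtu>\d+)"
)


def find_mtu_mismatches(dump_text, expected_mtu, ignore=("lo",)):
    """Return {device: mtu} for devices whose MTU differs from expected_mtu."""
    mismatches = {}
    for line in dump_text.splitlines():
        match = LINK_RE.match(line.strip())
        if not match:
            continue  # not an interface line
        name, mtu = match.group("name"), int(match.group("mtu"))
        if name not in ignore and mtu != expected_mtu:
            mismatches[name] = mtu
    return mismatches


# Hypothetical sample; flags and states are invented for illustration.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
5: br-ex: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
6: br-tun: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
9: qg-2c68fb65-21: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN
"""

if __name__ == "__main__":
    print(find_mtu_mismatches(SAMPLE, expected_mtu=1450))
```

Devices inside router namespaces would need the same check run against the per-namespace part of the dump.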


> going to have to keep digging. I am almost certain it's something that
> Matt K (Sam-I-Am) has already made note of in his research.

Actually, I don’t think Matt ran any tests with an MTU reduced below the
‘standard’ 1500 size. It would be interesting to see how it goes in his lab
with the limited MTU size we use in the gate.
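
For anyone repeating the check with a reduced MTU, the sketch below works out the largest ICMP echo payload that fits in a single unfragmented IPv4 packet (MTU minus 20 bytes of IP header and 8 bytes of ICMP header) and builds the matching Linux iputils ping command. The target address is a documentation placeholder, and the function names are illustrative only.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo header


def ping_payload_for_mtu(mtu):
    """Largest ICMP echo payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def ping_command(target, mtu, count=3):
    """Build an iputils ping command that forbids fragmentation (-M do)."""
    size = ping_payload_for_mtu(mtu)
    return ["ping", "-M", "do", "-s", str(size), "-c", str(count), target]


if __name__ == "__main__":
    # 192.0.2.10 is a placeholder from the documentation address range.
    for mtu in (1500, 1450, 1400):
        print(" ".join(ping_command("192.0.2.10", mtu)))
```

If the command for 1450 fails between instances on different nodes while the one for 1400 succeeds, that would point at the tunnel path rather than the local bridges.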

Ihar


