[Openstack] [QUANTUM] (Bug ?) L3 routing not correctly fragmenting packets ?

Sylvain Bauza sylvain.bauza at digimind.com
Mon Mar 11 13:09:32 UTC 2013


Okay. I think I found the reason why it's not working with OVS/GRE,
unlike FlatDHCP nova-network.
According to
http://www.cisco.com/en/US/tech/tk827/tk369/technologies_white_paper09186a00800d6979.shtml
the GRE encapsulation protocol can add up to 34 bytes to the IP datagram
(meaning the TCP segment is only 1456 bytes if the MTU is set to 1500).
When the packet is about 1500 bytes, it should be fragmented so that the
reply stays within 1500 bytes once the GRE encapsulation is included.
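
As a rough sketch of that arithmetic: the usable MTU inside the tunnel is
the physical MTU minus the sum of the encapsulation headers. The header
sizes below are the standard minimum sizes; which of them are really
present in a given OVS/GRE setup is an assumption here, not something
confirmed in this thread, and the name inner_mtu is only illustrative.

```python
# Standard minimum header sizes in bytes. Which headers are really present
# on the tunnel path is an assumption, not something measured in this thread.
OUTER_IPV4 = 20      # outer IPv4 header without options
GRE_BASE = 4         # GRE base header (RFC 2784)
GRE_KEY = 4          # optional GRE key field (RFC 2890)
INNER_ETHERNET = 14  # encapsulated Ethernet header of the VM frame
VLAN_TAG = 4         # optional 802.1Q tag


def inner_mtu(physical_mtu, *overheads):
    """Return the largest inner IP packet that fits without fragmentation."""
    return physical_mtu - sum(overheads)


# Plain IP-in-GRE without a key: 24 bytes of overhead.
ip_in_gre = inner_mtu(1500, OUTER_IPV4, GRE_BASE)

# Ethernet-in-GRE with a key (hypothetical breakdown for an OVS tunnel).
eth_in_gre = inner_mtu(1500, OUTER_IPV4, GRE_BASE, GRE_KEY, INNER_ETHERNET)

# Same breakdown plus one VLAN tag.
eth_in_gre_vlan = inner_mtu(
    1500, OUTER_IPV4, GRE_BASE, GRE_KEY, INNER_ETHERNET, VLAN_TAG
)

print(ip_in_gre, eth_in_gre, eth_in_gre_vlan)
```

The last breakdown gives 1454, the value seen in the ICMP message, but
that may be a coincidence; a capture on the physical interface would show
which headers are actually there.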

Unfortunately, for security reasons, the ICMP "type 3/code 4" packet
(fragmentation needed) can't reach the X.X.X.X backend, because this
backend denies all ICMP requests (firewall).
As a consequence, Path MTU discovery fails and the packets keep being
retransmitted at 1500 bytes again and again...
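
The failure mode described above can be sketched as a toy simulation. It
is a deliberately simplified, hypothetical model (one bottleneck hop, a
sender that only shrinks its packets when the ICMP message reaches it),
not a description of what any real TCP stack does in detail; the function
name and parameters are made up for illustration.

```python
def send_with_pmtud(packet_size, path_mtu, icmp_reaches_sender, max_tries=5):
    """Simulate a sender with DF set; return (sizes tried, delivered).

    Hypothetical model: one bottleneck hop, and the sender lowers its
    packet size only when the ICMP "frag needed" message reaches it.
    """
    attempts = []
    size = packet_size
    for _ in range(max_tries):
        attempts.append(size)
        if size <= path_mtu:
            return attempts, True  # packet fits and is delivered
        # The router drops the packet and emits ICMP type 3 / code 4
        # carrying its own MTU.
        if icmp_reaches_sender:
            size = path_mtu  # sender adapts to the advertised MTU
        # Otherwise the sender learns nothing and retries at the same size.
    return attempts, False


# ICMP gets through: the second attempt fits.
print(send_with_pmtud(1500, 1454, icmp_reaches_sender=True))

# ICMP is filtered by a firewall: every retry is still 1500 bytes.
print(send_with_pmtud(1500, 1454, icmp_reaches_sender=False))
```

With the ICMP message filtered, the model never converges, which matches
the endless 1500-byte retransmissions seen here.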

As I said in my first post, the only workaround I found is to set the MTU
to 1454 on *all* my VMs (I don't know why there is a 2-byte difference
from the 1456 bytes I mentioned above), including my Windows VMs, which
is not fun (modify a registry key and reboot the VM. Yes,
you aren't dreaming. This is how Windows-based machines
change their MTU...)

Do you know of any neat way to avoid modifying the VMs and only change
things on the network node?

I'm at the limits of my TCP/IP knowledge, so any idea is welcome.

Thanks,
-Sylvain


(BTW, maybe my explanation is completely wrong and GRE is not
responsible for the 36-byte overhead. If so, please accept my apologies;
any other clarification would be great.)



On 11/03/2013 10:07, Sylvain Bauza wrote:
> I also forgot to mention: I'm using a typical Openvswitch setup with
> GRE encapsulation.
> I can't prove it, but could it be that GRE doesn't work with PathMTU?
>
> -Sylvain
>
> On 11/03/2013 09:40, Sylvain Bauza wrote:
>> Hi Rick, reply inline.
>>
>> On 08/03/2013 20:27, Rick Jones wrote:
>>> On 03/08/2013 09:55 AM, Aaron Rosen wrote:
>>>> Hi Sylvain,
>>>>
>>>>
>>>> This seems very odd to me. The reason this should happen is if your
>>>> client is sending packets with the DF (don't fragment) bit set in the
>>>> TCP header of the packets you are sending. I'd confirm that your
>>>> version of 'curl' is doing this (which it should definitely not do!).
>>>
>>> Why shouldn't a TCP connection initiated by curl (or anything else) 
>>> have Path MTU discovery enabled? (ie the DF bit set in the IP 
>>> datagrams carrying the TCP segments)
>>>
>>
>> [SBA] Thanks for the explanation of the DF flag
>>>> What should happen is the router should fragment the packets for you
>>>> and if a fragment is lost TCP will just re-transmit the full packet
>>>> again and things should eventually work....
>>>
>>> Here I thought all the IETF demigods considered IP Fragmentation 'To 
>>> Be Avoided (tm)' - hence the creation of Path MTU discovery in the 
>>> first place. :)
>>>
>>> FWIW, in the IPv6 world, routers do not fragment.  That implies 
>>> either functioning PathMTU discovery, or lowest common MTU...
>>>
>>>>
>>>> Aaron
>>>>
>>>>
>>>> On Fri, Mar 8, 2013 at 9:08 AM, Sylvain Bauza
>>>> <sylvain.bauza at digimind.com> wrote:
>>>>> Hi,
>>>>>
>>>>> I recently observed a strange behaviour with L3 Quantum routing
>>>>> (Openvswitch setup with Provider Router). A simple curl to an
>>>>> external website is sometimes failing due to packet size:
>>>>>
>>>>>      192.168.10.3 > X.X.X.X: ICMP 192.168.10.3 unreachable - need to frag (mtu 1454), length 556
>>>>>      IP (tos 0x0, ttl 48, id 25918, offset 0, flags [DF], proto TCP (6), length 1500)
>>>
>>> Why is the ICMP Destination Unreachable datagram being sent back so 
>>> large?  I would have expected it to be rather smaller - an Ethernet, 
>>> IP and ICMP header, and then the original IP header and something 
>>> like 8 bytes or so of the original IP datagram's payload.
>>>
>>> I take it that ICMP is not getting back to the original sender? Or 
>>> is being ignored?
>>>
>>>
>>
>> [SBA] I take the point. That means that PathMTU is not working for my
>> Quantum installation. I also had a Nova-network (FlatDHCP mode) setup
>> and I didn't notice the issue there. So, I assume something is wrong
>> with my config.
>>
>>
>>>>>
>>>>> Only changing the VM MTU to 1454 does the trick ('ifconfig eth0 
>>>>> mtu 1454').
>>>>>
>>>>> For info, 192.168.10.3 is the floating IP bound to 10.0.0.4 
>>>>> (private IP).
>>>
>>> I suppose if 10.0.0.4 doesn't explicitly know about 192.168.10.3 it 
>>> might indeed ignore the ICMP message. Assuming it isn't getting 
>>> un-NATted on the way back.
>>>
>>
>> [SBA] This *is* un-NAT'd on the way back. By tcpdump'ing with the '-i 
>> any' interface, I can see the DNAT mapping on the way back :
>>
>>
>> Do you have any idea what I should fix (or at least work around) to
>> get PathMTU working?
>> By the way, I did check and both client (10.0.0.4) and server 
>> (X.X.X.X) have MTU set to 1500. I can't understand why the server is 
>> asking for a fragment size of 1454.
>>
>> Thanks,
>> -Sylvain
>>
>




