[openstack-dev] [neutron] Neutron router and nf_conntrack performance problems
Brian Haley
brian.haley at hp.com
Mon Aug 18 16:57:27 UTC 2014
Stuart,
I also can't say I've seen this, but I am curious now. I did have a few
questions for you though.
1. When you say you set nf_conntrack_max/nf_conntrack_hash to 256k, did you
really set the hash size that large? Typically the hash is 1/8 of the max,
meaning you'd have 8 entries per hashbucket.
2. Does /sys/module/nf_conntrack/parameters/hashsize look correct?
3. Are you seeing any messages such as "nf_conntrack: table full, dropping packet"?
4. How many entries are there in the conntrack table? 'sudo conntrack -C'
5. Have you been able to drill down any further into what's taking all the time
in nf_conntrack_tuple_taken()? I can't imagine you have a single bucket with
tons of entries and you're spinning looking at each one, but it could be that simple.
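For questions 1, 2 and 4, something like the following could pull the numbers together and turn them into an average chain length (entries per hash bucket). It's only a rough sketch: it assumes the usual Linux paths (the hashsize file from question 2, plus /proc/sys/net/netfilter/nf_conntrack_max and nf_conntrack_count), and the helper names are made up for illustration. Since the L3 agent runs routers in their own network namespaces, the count you get may depend on which namespace you run it in.

```python
"""Rough nf_conntrack hash-load check (a sketch, not an official tool)."""
from pathlib import Path

# Assumed standard sysfs/procfs locations on a Linux host with nf_conntrack loaded.
HASHSIZE_PATH = Path("/sys/module/nf_conntrack/parameters/hashsize")
MAX_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_max")
COUNT_PATH = Path("/proc/sys/net/netfilter/nf_conntrack_count")


def read_int(path):
    """Return the integer stored in a sysfs/procfs file, or None if it can't be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def avg_chain_length(entries, buckets):
    """Average number of conntrack entries per hash bucket (the load factor)."""
    if buckets <= 0:
        raise ValueError("bucket count must be positive")
    return entries / buckets


if __name__ == "__main__":
    buckets = read_int(HASHSIZE_PATH)
    limit = read_int(MAX_PATH)
    count = read_int(COUNT_PATH)
    print(f"hashsize={buckets} max={limit} count={count}")
    if buckets:
        if limit is not None:
            # Load factor if the table ever fills to nf_conntrack_max.
            print(f"at max: {avg_chain_length(limit, buckets):.1f} entries/bucket")
        if count is not None:
            # Load factor for the entries present right now.
            print(f"current: {avg_chain_length(count, buckets):.1f} entries/bucket")
```

With the usual 1/8 ratio, a 256k (262144) max over 32768 buckets averages 8 entries per bucket when full; if the hash really is 256k as well, the average is 1, so long chains would only show up if the hashing were badly skewed.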
Thanks,
-Brian
On 08/16/2014 12:12 PM, Stuart Fox wrote:
> Hey neutron dev!
>
> I'm having a serious problem with my neutron router getting spin-locked in
> nf_conntrack_tuple_taken.
> Has anybody else experienced it?
> "perf top" shows nf_conntrack_tuple_taken at 75%
> As the incoming request rate goes up, nf_conntrack_tuple_taken runs very hot
> on CPU0, causing ksoftirqd/0 to run at 100%. At that point internal pings on the
> GRE network go sky high and it's game over. Pinging from a VM to the subnet
> default gateway on the neutron router goes from 0.2ms to 11s! Pinging from the same VM
> to another VM in the same subnet stays constant at 0.2ms.
>
> This very much indicates to me that the neutron router is having serious problems.
> No other part of the system seems under pressure.
>
> ipv6 is disabled, and nf_conntrack_max/nf_conntrack_hash are set to 256k.
> We've tried the default 3.13 and the Utopic 3.16 kernel (3.16 has lots of work
> on removing spinlocks around nf_conntrack). 3.16 survives a little longer but
> still ends up in the same state.
>
> Neutron router
> 1 x Ubuntu 14.04/Icehouse 2014.1.1 on an IBM x3550 with 4 10G Intel NICs.
> eth0 - Mgt
> eth1 - GRE
> eth2 - Public
> eth3 - unused
>
> Compute/controller nodes
> 43 x Ubuntu 14.04/Icehouse 2014.1.1 IBM x240 Flex blades with 4 Emulex NICs
> eth0 Mgt
> eth2 GRE
>
> Any help very much appreciated!
> Replacing the L2/L3 functions with hardware is very much an option if that's a
> better solution.
> I'm running out of time before my client decides to stay on AWS.
>
>
>
> BR,
> Stuart
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>