[Openstack-operators] [Openstack] Extreme network throughput tuning with KVM as hypervisor
Alejandro Comisario
alejandro.comisario at mercadolibre.com
Fri Jan 17 17:07:21 UTC 2014
Well, I have news.
On the compute nodes with 128GB of RAM, 2x1Gb bonded interfaces, 200Mb/s
bandwidth and 10k packets/s going through each ethernet interface,
we left these sysctl settings on the host:
net.ipv4.tcp_max_tw_buckets = 3600000
net.ipv4.tcp_max_syn_backlog = 30000
net.core.netdev_max_backlog = 50000
net.core.somaxconn = 16384
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216
net.core.rmem_default = 16777216
net.core.wmem_default = 16777216
net.ipv4.tcp_congestion_control = cubic
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.tcp_fin_timeout = 5
net.ipv4.tcp_keepalive_time = 5
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_tw_reuse = 1
vm.swappiness = 0
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_timestamps = 1
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3
net.ipv4.tcp_ecn=1
net.ipv4.tcp_sack=1
net.ipv4.tcp_dsack=1
net.ipv4.route.flush = 1
net.ipv6.route.flush = 1
net.ipv4.netfilter.ip_conntrack_udp_timeout = 30
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close = 10
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait = 60
net.ipv4.netfilter.ip_conntrack_max = 1200000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established = 432000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_recv = 60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_syn_sent = 120
net.ipv4.tcp_keepalive_time = 90
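
A side note on the list above: net.ipv4.tcp_keepalive_time shows up twice (5, then 90). Assuming the list is applied top to bottom, the way sysctl -p reads a file, the later value is the one left in effect. Below is a minimal Python sketch for spotting such repeats. The function name and the sample text are only illustrative, not part of the original setup.

```python
from collections import defaultdict


def find_repeated_keys(text):
    """Return {key: [values in order]} for keys assigned more than once."""
    seen = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments ('#' or ';') and lines without an assignment.
        if not line or line.startswith(("#", ";")) or "=" not in line:
            continue
        key, value = line.split("=", 1)
        seen[key.strip()].append(value.strip())
    return {k: v for k, v in seen.items() if len(v) > 1}


if __name__ == "__main__":
    # Hypothetical excerpt in the same style as the settings list.
    sample = """
    net.ipv4.tcp_fin_timeout = 5
    net.ipv4.tcp_keepalive_time = 5
    net.ipv4.tcp_ecn=1
    net.ipv4.tcp_keepalive_time = 90
    """
    for key, values in find_repeated_keys(sample).items():
        print(f"{key}: set {len(values)} times, last value is {values[-1]}")
```

Running it against the real /etc/sysctl.conf (or whichever file holds the settings) would show which keys are being overridden.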
But after turning offload off on both interfaces (ethtool -K) and raising the
ring size from 256 (ethtool -G eth[0-1] rx 1024 tx 1024), the 20ms delays
disappeared, from the HOST at least. We are now moving down to the VMs to
see if we get the same result.
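
Before raising the rings further, it may be worth checking what each NIC actually allows, since the maximum differs per model. `ethtool -g <iface>` prints the pre-set maximums next to the current settings. The sketch below parses that text and lists the rings that still have headroom. The output layout is assumed to be the one common ethtool versions print, and the sample text is made up.

```python
def parse_ring_params(output):
    """Parse `ethtool -g <iface>` text into {'max': {...}, 'current': {...}}."""
    result = {"max": {}, "current": {}}
    section = None
    for line in output.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            name, _, value = line.partition(":")
            value = value.strip()
            # Non-numeric values (e.g. "n/a") are ignored.
            if value.isdigit():
                result[section][name.strip()] = int(value)
    return result


def ring_headroom(params):
    """Return {ring: (current, maximum)} for rings below their maximum."""
    return {
        name: (cur, params["max"][name])
        for name, cur in params["current"].items()
        if name in params["max"] and cur < params["max"][name]
    }


if __name__ == "__main__":
    # Made-up sample; on a real host this text would come from `ethtool -g eth0`.
    sample = (
        "Ring parameters for eth0:\n"
        "Pre-set maximums:\n"
        "RX:\t\t4096\nRX Mini:\t0\nRX Jumbo:\t0\nTX:\t\t4096\n"
        "Current hardware settings:\n"
        "RX:\t\t256\nRX Mini:\t0\nRX Jumbo:\t0\nTX:\t\t256\n"
    )
    print(ring_headroom(parse_ring_params(sample)))
```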
BUT!!!!
On the compute nodes with 256GB of RAM, 2x1Gb bonded interfaces,
500/700Mb/s bandwidth and 30k packets/s going through each ethernet interface,
the same changes did not help: with the same sysctl settings as above, offload
turned off on both interfaces, and tx/rx rings tested at 1024 and 2048, we saw
no improvement like the one on the nodes with less RAM/throughput.
Any advice about sysctl kernel buffers? Maybe something there is worth
growing?
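
One way to narrow down which buffer is worth growing is to look at /proc/net/softnet_stat on the busy hosts. Each row is one CPU and the values are hexadecimal. The first column counts processed packets. The second counts packets dropped because the per-CPU input backlog (sized by net.core.netdev_max_backlog) was full. The third counts the times the softirq ran out of budget with work still pending. A rough Python sketch follows. The function name is illustrative, and whether these counters move at all depends on the driver and the receive path in use.

```python
import os


def parse_softnet_stat(text):
    """Return one dict per row with processed/dropped/time_squeeze counts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        # All columns are hexadecimal counters.
        processed, dropped, squeezed = (int(f, 16) for f in fields[:3])
        rows.append({
            "row": len(rows),
            "processed": processed,
            "dropped": dropped,
            "time_squeeze": squeezed,
        })
    return rows


if __name__ == "__main__":
    path = "/proc/net/softnet_stat"
    if os.path.exists(path):
        with open(path) as fh:
            for row in parse_softnet_stat(fh.read()):
                # Only show rows where something was dropped or squeezed.
                if row["dropped"] or row["time_squeeze"]:
                    print(row)
```

If the dropped column stays at zero while the delays happen, the backlog is probably not the bottleneck and growing it further is unlikely to help.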
Thank you all.
*Alejandro Comisario #melicloud CloudBuilders*
Arias 3751, Piso 7 (C1430CRG)
Ciudad de Buenos Aires - Argentina
Cel: +549(11) 15-3770-1857
Tel : +54(11) 4640-8443
On Thu, Jan 16, 2014 at 7:06 AM, George Shuklin <george.shuklin at gmail.com> wrote:
> Upgrade OVS to version >= 1.11.
>
> I don't know whether it will work with neutron or not, but OVS 1.10 (and 1.4,
> and any version <1.11) is just not production ready.
>
> Way to reproduce problem:
>
> hping3 --flood --rand-source ANY_FLOATING_IP.
>
> It kills any host with an older OVS, up to the point of connection timeouts.
>
> On 11.01.2014 21:12, Alejandro Comisario wrote:
>
> Well, it's been a long time since we started using nova with KVM, we got past
> many thousands of VMs, and still, something doesn't feel right.
> We are using ubuntu 12.04 kernel 3.2.0-[40-48], tuned sysctl with lots of
> parameters, and everything ... works, you can say, quite well.
>
> But here's the deal, we have a special networking scenario, that is,
> EVERYTHING IS APIS, everything is throughput, not bandwidth.
> Every 2x1Gb bonded compute node doesn't get over [200Mb/s - 400Mb/s],
> but it's handling hundreds of thousands of requests per minute to the VMs.
>
> And once in a while, it gives you the sensation that everything goes to
> hell: timeouts from applications over here, response times from APIs going
> from 10ms to 200ms over there, 20ms delays happening between the VM's ETH0
> and the VNET interface, etc.
> So, since it's a massive scenario to tune, we never quite nailed down WHERE
> to apply those 1, 2 or 3 final buffer/ring/affinity tunings to make everything
> work from the compute side.
>
> I know it's a little awkward, but I'm craving and looking for community
> real-life examples regarding "HIGH THROUGHPUT" tuning with KVM scenarios,
> dark linux, or someone who can help me go through configurations that might
> sound weird / unnecessary / incorrect.
>
> For those who are wondering, well ... I don't know what you have, so let's
> start with this.
>
> COMPUTE NODES (99% of them, different vendors, but ...)
> * 128/256 GB of ram
> * 2 hexacores with HT enabled
> * 2x1Gb bonded interfaces (if you want to know the more than 20 models we
> are using, just ask)
> * Multi-queue interfaces, pinned via IRQ to different cores
> * ubuntu 12.04 kernel 3.2.0-[40-48]
> * Linux bridges, no VLAN, no open-vswitch
>
> I want to try to keep the networking appliances (TORs, AGGR, CORES)
> as far out of the picture as possible.
> I'm thinking "I hope this thread gets great, in time"
>
> So, ready to learn as much as i can.
> Thank you openstack community, as always.
>
> alejandrito
>
>
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>