[Openstack-operators] [Openstack] Bad performance on physical hosts kvm + bonding + bridging

Rick Jones rick.jones2 at hp.com
Fri Jul 13 16:58:51 UTC 2012


On 07/13/2012 06:55 AM, Leandro Reox wrote:
> Ok, here is the story: we deployed some in-house APIs in our OpenStack 
> private cloud, and while stress-testing them we realized that some 
> packets were taking very long. To rule out the APIs themselves, we 
> installed apache and lighttpd, and even tried netcat, all on the 
> guest systems running Ubuntu 10.10 w/virtio. After going nuts 
> modifying sysctl parameters to change the guest behavior, we realized 
> that if we installed apache or lighttpd on the PHYSICAL host the 
> behavior was the same. That surprised us. When we run the same 
> benchmark on a node without bonding, bridging, or any KVM or nova 
> packages installed, but with the same HW specs, the benchmark passes 
> OK; yet if we run the same tests on a spare nova node with everything 
> installed + bonding + bridging that has never run a virtual guest 
> machine, the test fails too. So far:
>
> Tested on hosts with Ubuntu 10.10, 11.10 and 12.04
>
> - Clean node without bonding + bridging or KVM - just the eth0 
> configured - PASS
> - Spare node with bridging - PASS
> - Spare node with just bonding (dynamic link aggr mode4) - PASS
> - Spare node with nova + kvm + bonding + bridging - FAILS
> - Spare node with nova + kvm - PASS
>
> Is there a chance that with bridging + bonding + nova together some 
> module gets screwed up? I'll attach the tests; you can see that a 
> small number of packets take TOO LONG, like 3 secs, and the overhead 
> time is in the "CONNECT" phase.

If I recall correctly, 3 seconds is the default initial TCP 
retransmission timeout (at least in older kernels - what is your load 
generator running?).  Between that and your mentioning the connect 
phase, my first guess (it is only a guess) would be that something is 
causing TCP SYN segments to be dropped.  If that is the case, it 
should show up in the netstat -s statistics.  Snap them on both client 
and server before the test is started and after the test is completed, 
and then run them through something like beforeafter 
( ftp://ftp.cup.hp.com/dist/networking/tools ):

netstat -s > before.server
# run benchmark
netstat -s > after.server
beforeafter before.server after.server > delta.server
less delta.server
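
And the same on the client side, so the two deltas can be compared - 
for example (the .client file names here are just placeholders):

netstat -s > before.client
# run benchmark
netstat -s > after.client
beforeafter before.client after.client > delta.client
less delta.client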

(As a sanity check, make certain that before.server and after.server 
have the same number of lines. The habit of Linux's netstat to avoid 
printing a statistic with a value of zero can, sometimes, confuse 
beforeafter if a stat appears in after that was not present in before.)
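
A quick way to check the line counts:

wc -l before.server after.server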

It might not be a bad idea to include ethtool -S statistics from each of 
the interfaces in that procedure as well.
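
Something along these lines would do it, assuming the bond slaves are 
named eth0 and eth1 (substitute your actual interface names):

for i in eth0 eth1; do ethtool -S $i > before.$i; done
# run benchmark
for i in eth0 eth1; do ethtool -S $i > after.$i; done
for i in eth0 eth1; do beforeafter before.$i after.$i > delta.$i; done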

rick jones

P.S. It is probably a good idea to mention the bonding mode you are 
using.
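On Linux it can be read straight out of /proc, assuming the bond 
device is named bond0:

cat /proc/net/bonding/bond0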

> This is ApacheBench, Version 2.3 <$Revision: 655654 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 172.16.161.25 (be patient)
> Completed 2500 requests
> Completed 5000 requests
> Completed 7500 requests
> Completed 10000 requests
> Completed 12500 requests
> Completed 15000 requests
> Completed 17500 requests
> Completed 20000 requests
> Completed 22500 requests
> Completed 25000 requests
> Finished 25000 requests
>
>
> Server Software:        Apache/2.2.16
> Server Hostname:        172.16.161.25
> Server Port:            80
>
> Document Path:          /
> Document Length:        177 bytes
>
> Concurrency Level:      5
> Time taken for tests:   7.493 seconds
> Complete requests:      25000
> Failed requests:        0
> Write errors:           0
> Total transferred:      11350000 bytes
> HTML transferred:       4425000 bytes
> Requests per second:    3336.53 [#/sec] (mean)
> Time per request:       1.499 [ms] (mean)
> Time per request:       0.300 [ms] (mean, across all concurrent requests)
> Transfer rate:          1479.28 [Kbytes/sec] received
>
> Connection Times (ms)
>              min  mean[+/-sd] median   max
> Connect:        0    1  46.6      0    3009
> Processing:     0    1   5.7      0     277
> Waiting:        0    0   4.6      0     277
> Total:          0    1  46.9      1    3010
>
> Percentage of the requests served within a certain time (ms)
>  50%      1
>  66%      1
>  75%      1
>  80%      1
>  90%      1
>  95%      1
>  98%      1
>  99%      1
> 100%   3010 (longest request)
>
> Regards!
>
>
>

