[Openstack-operators] [Openstack] Bad performance on physical hosts kvm + bonding + bridging
Rick Jones
rick.jones2 at hp.com
Fri Jul 13 16:58:51 UTC 2012
On 07/13/2012 06:55 AM, Leandro Reox wrote:
> Ok, here is the story: we deployed some in-house APIs in our OpenStack
> private cloud and started stress-testing them, and we noticed that
> some requests were taking far too long. To rule out the API itself we
> installed apache and lighttpd and even tried with netcat, all on the
> guest systems running Ubuntu 10.10 w/virtio. After going nuts
> modifying sysctl parameters to change the guest behavior, we realized
> that if we installed apache or lighttpd on the PHYSICAL host the
> behavior was the same..., which surprised us. When we run the same
> benchmark on a node without bonding, bridging, or any KVM or nova
> packages installed, with the same HW specs, the benchmark passes OK,
> but if we run the same tests on a spare nova node with everything
> installed + bonding + bridging that has never run a virtual guest
> machine, the test fails too. So, so far:
>
> Tested on hosts with Ubuntu 10.10, 11.10 and 12.04
>
> - Clean node without bonding + bridging or KVM - just eth0
> configured - PASS
> - Spare node with bridging - PASS
> - Spare node with just bonding (dynamic link aggr mode4) - PASS
> - Spare node with nova + kvm + bonding + bridging - FAILS
> - Spare node with nova + kvm - PASS
>
> Is there a chance that some module gets messed up when bridging +
> bonding + nova are combined? I'll attach the tests; you can see that
> a small number of requests take TOO LONG, around 3 secs, and the
> overhead is in the "CONNECT" phase.
If I recall correctly, 3 seconds is the default initial TCP
retransmission timeout (at least in older kernels - what is your load
generator running?). Between that and your mention of the connect
phase, my first guess (it is only a guess) would be that something is
causing TCP SYNchronize segments to be dropped. If that is the case,
it should show up in the netstat -s statistics. Snap them on both
client and server before the test is started and after the test is
completed, and then run them through something like beforeafter
(ftp://ftp.cup.hp.com/dist/networking/tools):
netstat -s > before.server
# run benchmark
netstat -s > after.server
beforeafter before.server after.server > delta.server
less delta.server
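If pulling down beforeafter is inconvenient, the subtraction can be
approximated with a short awk wrapper (a sketch of my own, not the
actual HP tool; the delta_netstat name is made up). It pairs the two
snapshots line by line and subtracts the leading counter:

```shell
# delta_netstat: rough stand-in for HP's beforeafter tool (my own
# approximation, not the real utility).  Pairs up lines from two
# `netstat -s` snapshots by position and subtracts the leading counter
# wherever both lines start with a number.
delta_netstat() {    # usage: delta_netstat before.server after.server
    awk '
        NR == FNR { before[FNR] = $1; next }    # first file: remember counters
        {
            if ($1 ~ /^[0-9]+$/ && before[FNR] ~ /^[0-9]+$/)
                $1 = $1 - before[FNR]           # numeric line: subtract
            print
        }
    ' "$1" "$2"
}
```

Like beforeafter itself, this pairs lines purely by position, so it is
only meaningful when both snapshots have the same number of lines.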
(As a sanity check, make certain that before.server and after.server
have the same number of lines. The habit of Linux's netstat to avoid
printing a statistic with a value of zero can, sometimes, confuse
beforeafter if a stat appears in after that was not present in before.)
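To pick the likely suspects out of the delta, a case-insensitive grep
is usually enough (the patterns below are my guesses at relevant
counter names, which vary across kernel versions):

```shell
# scan_syn_drops: pull SYN/drop-related counters out of a netstat -s
# delta.  The patterns are guesses at likely counter names; the exact
# wording differs between kernel versions.
scan_syn_drops() {    # usage: scan_syn_drops delta.server
    grep -iE 'syn|listen|overflow|drop|retrans' "$1"
}
```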
It might not be a bad idea to include ethtool -S statistics from each of
the interfaces in that procedure as well.
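A small helper can fold the NIC counters into the same before/after
routine (the interface list below is an assumption; substitute the
bond's slaves and the bridge on your nodes):

```shell
# snap_nic_stats: snapshot ethtool -S for each interface in the
# bonding/bridging path.  The interface names are assumptions; replace
# them with your bond slaves and bridge.  Failures (e.g. a device with
# no NIC statistics) are ignored so the loop always completes.
snap_nic_stats() {    # usage: snap_nic_stats before   (or: after)
    for dev in eth0 eth1 bond0 br100; do
        ethtool -S "$dev" > "$1.$dev" 2>/dev/null || true
    done
}
```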
rick jones
probably a good idea to mention the bonding mode you are using
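On Linux the bonding driver reports its mode through procfs, so it can
be read straight off the node (the bond_mode helper name is made up,
and bond0 is an assumption; use your bond device's name):

```shell
# bond_mode: print the "Bonding Mode" line from the bonding driver's
# procfs report.  Pass the full path, e.g. /proc/net/bonding/bond0
# (bond0 is an assumption; use your bond device's name).
bond_mode() {    # usage: bond_mode /proc/net/bonding/bond0
    sed -n 's/^Bonding Mode: //p' "$1"
}
```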
> This is ApacheBench, Version 2.3 <$Revision: 655654 $>
> Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
> Licensed to The Apache Software Foundation, http://www.apache.org/
>
> Benchmarking 172.16.161.25 (be patient)
> Completed 2500 requests
> Completed 5000 requests
> Completed 7500 requests
> Completed 10000 requests
> Completed 12500 requests
> Completed 15000 requests
> Completed 17500 requests
> Completed 20000 requests
> Completed 22500 requests
> Completed 25000 requests
> Finished 25000 requests
>
>
> Server Software: Apache/2.2.16
> Server Hostname: 172.16.161.25
> Server Port: 80
>
> Document Path: /
> Document Length: 177 bytes
>
> Concurrency Level: 5
> Time taken for tests: 7.493 seconds
> Complete requests: 25000
> Failed requests: 0
> Write errors: 0
> Total transferred: 11350000 bytes
> HTML transferred: 4425000 bytes
> Requests per second: 3336.53 [#/sec] (mean)
> Time per request: 1.499 [ms] (mean)
> Time per request: 0.300 [ms] (mean, across all concurrent requests)
> Transfer rate: 1479.28 [Kbytes/sec] received
>
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    1   46.6      0   3009
> Processing:     0    1    5.7      0    277
> Waiting:        0    0    4.6      0    277
> Total:          0    1   46.9      1   3010
>
> Percentage of the requests served within a certain time (ms)
> 50% 1
> 66% 1
> 75% 1
> 80% 1
> 90% 1
> 95% 1
> 98% 1
> 99% 1
> 100% 3010 (longest request)
>
> Regards!