Open Stack

Thu Sep 11 19:11:45 UTC 2014

On Wed, Sep 10, 2014 at 6:09 PM, Kurt Griffiths
<kurt.griffiths at rackspace.com> wrote:
> On 9/10/14, 3:58 PM, "Devananda van der Veen" <devananda.vdv at gmail.com>
> wrote:
>
>>I'm going to assume that, for these benchmarks, you configured all the
>>services optimally.
>
> Sorry for any confusion; I am not trying to hide anything about the setup.
> I thought I was pretty transparent about the way uWSGI, MongoDB, and Redis
> were configured. I tried to stick to mostly default settings to keep
> things simple, making it easier for others to reproduce/verify the results.
>
> Is there further information about the setup that you were curious about
> that I could provide? Was there a particular optimization that you didn’t
> see that you would recommend?
>

Nope.

>>I'm not going to question why you didn't run tests
>>with tens or hundreds of concurrent clients,
>
> If you review the different tests, you will note that a couple of them
> used at least 100 workers. That being said, I think we ought to try higher
> loads in future rounds of testing.
>

Perhaps I misunderstand what "2 processes with 25 gevent workers"
means - I think this means you have two _processes_ which are using
greenthreads and eventlet, and so each of those two python processes
is swapping between 25 coroutines. From a load generation standpoint,
this is not the same as having 100 concurrent client _processes_.

>>or why you only ran the
>>tests for 10 seconds.
>
> In Round 1 I did mention that i wanted to do a followup with a longer
> duration. However, as I alluded to in the preamble for Round 2, I kept
> things the same for the redis tests to compare with the mongo ones done
> previously.
>
> We’ll increase the duration in the next round of testing.
>

Sure - consistency between tests is good. But I don't believe that a
10-second benchmark is ever enough to suss out service performance.
Lots of things only appear after high load has been applied for a
period of time as eg. caches fill up, though this leads to my next
point below...

>>Instead, I'm actually going to question how it is that, even with
>>relatively beefy dedicated hardware (128 GB RAM in your storage
>>nodes), Zaqar peaked at around 1,200 messages per second.
>
> I went back and ran some of the tests and never saw memory go over ~20M
> (as observed with redis-top) so these same results should be obtainable on
> a box with a lot less RAM.

Whoa. So, that's a *really* important piece of information which was,
afaict, missing from your previous email(s). I hope you can understand
how, with the information you provided ("the Redis server has 128GB
RAM") I was shocked at the low performance.

> Furthermore, the tests only used 1 CPU on the
> Redis host, so again, similar results should be achievable on a much more
> modest box.

You described fairy beefy hardware but didn't utilize it fully -- I
was expecting your performance test to attempt to stress the various
components of a Zaqar installation and, at least in some way, attempt
to demonstrate what the capacity of a Zaqar deployment might be on the
hardware you have available. Thus my surprise at the low numbers. If
that wasn't your intent (and given the CPU/RAM usage your tests
achieved, it's not what you achieved) then my disappointment in those
performance numbers is unfounded.

But I hope you can understand, if I'm looking at a service benchmark
to gauge how well that service might perform in production, seeing
expensive hardware perform disappointingly slowly is not a good sign.

>
> FWIW, I went back and ran a couple scenarios to get some more data points.
> First, I did one with 50 producers and 50 observers. In that case, the
> single CPU on which the OS scheduled the Redis process peaked at 30%. The
> second test I did was with 50 producers + 5 observers + 50 consumers
> (which claim messages and delete them rather than simply page through
> them). This time, Redis used 78% of its CPU. I suppose this should not be
> surprising because the consumers do a lot more work than the observers.
> Meanwhile, load on the web head was fairly high; around 80% for all 20
> CPUs. This tells me that python and/or uWSGI are working pretty hard to
> serve these requests, and there may be some opportunities to optimize that
> layer. I suspect there are also some opportunities to reduce the number of
> Redis operations and roundtrips required to claim a batch of messages.
>

OK - those resource usages sound better. At least you generated enough
load to saturate the uWSGI process CPU, which is a good point to look
at performance of the system.

At that peak, what was the:
- average msgs/sec
- min/max/avg/stdev time to [post|get|delete] a message

> The other thing to consider is that in these first two rounds I did not
> test increasing amounts of load (number of clients performing concurrent
> requests) and graph that against latency and throughput. Out of curiosity,
> I just now did a quick test to compare the messages enqueued with 50
> producers + 5 observers + 50 consumers vs. adding another 50 producer
> clients and found that the producers were able to post 2,181 messages per
> second while giving up only 0.3 ms.

Is that 2,181 msg/sec total, or per-producer?

I'd really like to see the total throughput and latency graphed as #
of clients increases. Or if graphing isn't your thing, even just post
a .csv of the raw numbers and I will be happy to graph it.

It would also be great to see how that scales as you add more Redis
instances until all the available CPU cores on your Redis host are in
use.

Thanks for clarifying this.

-Devananda

Open Stack

[openstack-dev] [zaqar] Juno Performance Testing (Round 2)

OpenStack

Community

Documentation

Branding & Legal