[openstack-dev] [zaqar] Juno Performance Testing (Round 2)

Joe Gordon joe.gordon0 at gmail.com
Fri Sep 12 18:45:49 UTC 2014


On Tue, Sep 9, 2014 at 12:19 PM, Kurt Griffiths <
kurt.griffiths at rackspace.com> wrote:

> Hi folks,
>
> In this second round of performance testing, I benchmarked the new Redis
> driver. I used the same setup and tests as in Round 1 to make it easier to
> compare the two drivers. I did not test Redis in master-slave mode, but
> that likely would not make a significant difference in the results since
> Redis replication is asynchronous[1].
>
> As always, the usual benchmarking disclaimers apply (i.e., take these
> numbers with a grain of salt; they are only intended to provide a ballpark
> reference; you should perform your own tests, simulating your specific
> scenarios and using your own hardware; etc.).
>
> ## Setup ##
>
> Rather than VMs, I provisioned some Rackspace OnMetal[3] servers to
> mitigate noisy-neighbor effects when running the performance tests:
>
> * 1x Load Generator
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8GHz
>         * 32 GB RAM
>         * 10Gbps NIC
>         * 32GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar-bench
> * 1x Web Head
>     * Hardware
>         * 1x Intel Xeon E5-2680 v2 2.8GHz
>         * 32 GB RAM
>         * 10Gbps NIC
>         * 32GB SATADOM
>     * Software
>         * Debian Wheezy
>         * Python 2.7.3
>         * zaqar server
>             * storage=mongodb
>             * partitions=4
>             * MongoDB URI configured with w=majority
>         * uWSGI + gevent
>             * config: http://paste.openstack.org/show/100592/
>             * app.py: http://paste.openstack.org/show/100593/
> * 3x MongoDB Nodes
>     * Hardware
>         * 2x Intel Xeon E5-2680 v2 2.8GHz
>         * 128 GB RAM
>         * 10Gbps NIC
>         * 2x LSI Nytro WarpDrive BLP4-1600[2]
>     * Software
>         * Debian Wheezy
>         * mongod 2.6.4
>             * Default config, except setting replSet and enabling periodic
>               logging of CPU and I/O
>             * Journaling enabled
>             * Profiling on message DBs enabled for requests over 10ms
> * 1x Redis Node
>     * Hardware
>         * 2x Intel Xeon E5-2680 v2 2.8GHz
>         * 128 GB RAM
>         * 10Gbps NIC
>         * 2x LSI Nytro WarpDrive BLP4-1600[2]
>     * Software
>         * Debian Wheezy
>         * Redis 2.4.14
>             * Default config (snapshotting and AOF enabled)
>             * One process
>
> As in Round 1, Keystone auth is disabled and requests go over HTTP, not
> HTTPS. The latency introduced by enabling these is outside the control of
> Zaqar, but should be quite minimal (anecdotally, I would expect an
> additional 1-3 ms, assuming cached tokens and an optimized TLS
> termination setup).
>
> For generating the load, I again used the zaqar-bench tool. I would like
> to see the team complete a large-scale Tsung test as well (including a
> full HA deployment with Keystone and HTTPS enabled), but decided not to
> wait for that before publishing the results for the Redis driver using
> zaqar-bench.
>
> CPU usage on the Redis node peaked at around 75% for the one process. To
> better utilize the hardware, a production deployment would need to run
> multiple Redis processes and use Zaqar's backend pooling feature to
> distribute queues across the various instances.
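
For what it's worth, that kind of deployment can be sketched in config form.
This is only an illustration: the option name below follows the Juno-era
zaqar.conf layout as I understand it, and the pool URIs are made up.

```ini
# zaqar.conf: turn on pooling so queues can be spread across backends
[DEFAULT]
pooling = True
```

Each Redis process would then be registered as a pool through the admin API
(one redis:// URI per process), letting Zaqar distribute queues across them.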
>
> Several different messaging patterns were tested, taking inspiration
> from: https://wiki.openstack.org/wiki/Use_Cases_(Zaqar)
>
> Each test was executed three times and the best time recorded.
>
> A ~1K sample message (1398 bytes) was used for all tests.
>
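
As a side note on how to read the numbers: the ms/req and req/sec figures in
the results reduce to simple aggregates over per-request wall-clock samples.
A minimal illustration (not zaqar-bench's actual code):

```python
# Illustration of how per-request timings reduce to the ms/req and
# req/sec figures reported in the results (not zaqar-bench's code).
def summarize(durations_sec, wall_time_sec):
    """durations_sec: per-request latencies; wall_time_sec: total run time."""
    ms_per_req = 1000.0 * sum(durations_sec) / len(durations_sec)  # mean latency
    req_per_sec = len(durations_sec) / wall_time_sec               # throughput
    return ms_per_req, req_per_sec
```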
> ## Results ##
>
> ### Event Broadcasting (Read-Heavy) ###
>
> OK, so let's say you have a somewhat low-volume source, but tons of event
> observers. In this case, the observers easily outpace the producer, making
> this a read-heavy workload.
>
> Options
>     * 1 producer process with 5 gevent workers
>         * 1 message posted per request
>     * 2 observer processes with 25 gevent workers each
>         * 5 messages listed per request by the observers
>     * Load distributed across 4[6] queues
>     * 10-second duration
>

10 seconds is way too short for a meaningful benchmark run


>
> Results
>     * Redis
>         * Producer: 1.7 ms/req,  585 req/sec
>         * Observer: 1.5 ms/req, 1254 req/sec
>     * Mongo
>         * Producer: 2.2 ms/req,  454 req/sec
>         * Observer: 1.5 ms/req, 1224 req/sec


If Zaqar is like Amazon SQS, then the latency of a single message and the
throughput of a single tenant are not important. I wouldn't expect anyone
who has latency-sensitive workloads or needs massive throughput to use
Zaqar, as these people wouldn't use SQS either. The consistency of the
latency (it shouldn't change under load) and Zaqar's ability to scale
horizontally matter much more. What would be great is to see some other
things benchmarked instead:

* graph latency versus number of concurrent active tenants
* graph latency versus message size
* how throughput scales as you scale up the number of assorted Zaqar
components. If one of the benefits of Zaqar is its horizontal scalability,
let's see it.
* how all of this changes with message batching
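
The first of those is easy to sketch. A minimal, hedged example of measuring
mean latency as concurrency grows; `do_request` is a stand-in for a real
client call (e.g. an HTTP POST against a live Zaqar deployment):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed(fn):
    """Return the wall-clock seconds one call to fn takes."""
    start = time.time()
    fn()
    return time.time() - start

def latency_by_concurrency(do_request, levels, requests_per_level=100):
    """Map each concurrency level to its mean request latency in ms."""
    results = {}
    for n in levels:
        with ThreadPoolExecutor(max_workers=n) as pool:
            samples = list(pool.map(lambda _: timed(do_request),
                                    range(requests_per_level)))
        results[n] = 1000.0 * sum(samples) / len(samples)
    return results
```

Plotting the output for, say, levels = [1, 10, 50, 100] would show directly
whether latency stays flat under load, which is the consistency property
described above.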


> ### Event Broadcasting (Balanced) ###
>
> This test uses the same number of producers and consumers, but note that
> the observers are still listing (up to) 5 messages at a time[4], so they
> still outpace the producers, but not as quickly as before.
>
> Options
>     * 2 producer processes with 25 gevent workers each
>         * 1 message posted per request
>     * 2 observer processes with 25 gevent workers each
>         * 5 messages listed per request by the observers
>     * Load distributed across 4 queues
>     * 10-second duration
>
> Results
>     * Redis
>         * Producer: 1.4 ms/req, 1374 req/sec
>         * Observer: 1.6 ms/req, 1178 req/sec
>     * Mongo
>         * Producer: 2.2 ms/req, 883 req/sec
>         * Observer: 2.8 ms/req, 348 req/sec
>
> ### Point-to-Point Messaging ###
>
> In this scenario I simulated one client sending messages directly to a
> different client. Only one queue is required in this case[5].
>
> Options
>     * 1 producer process with 1 gevent worker
>         * 1 message posted per request
>     * 1 observer process with 1 gevent worker
>         * 1 message listed per request
>     * All load sent to a single queue
>     * 10-second duration
>
> Results
>     * Redis
>         * Producer: 2.9 ms/req, 345 req/sec
>         * Observer: 2.9 ms/req, 339 req/sec
>     * Mongo
>         * Producer: 5.5 ms/req, 179 req/sec
>         * Observer: 3.5 ms/req, 278 req/sec
>
> ### Task Distribution ###
>
> This test uses several producers and consumers in order to simulate
> distributing tasks to a worker pool. In contrast to the observer worker
> type, consumers claim and delete messages in such a way that each message
> is processed once and only once.
>
> Options
>     * 2 producer processes with 25 gevent workers each
>         * 1 message posted per request
>     * 2 consumer processes with 25 gevent workers each
>         * 5 messages claimed per request, then deleted one by one before
>           claiming the next batch of messages
>     * Load distributed across 4 queues
>     * 10-second duration
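
The claim-then-delete loop the consumers run can be sketched against Zaqar's
v1 HTTP API. Treat this as a hedged illustration: the paths follow the v1
spec as I recall it, `session` is any requests-style HTTP client, and error
handling and auth are omitted:

```python
import json

def consume_batch(session, base_url, queue, ttl=60, grace=30, limit=5):
    """Claim up to `limit` messages from `queue`, then delete each one."""
    resp = session.post(
        "%s/v1/queues/%s/claims?limit=%d" % (base_url, queue, limit),
        data=json.dumps({"ttl": ttl, "grace": grace}),
        headers={"Content-Type": "application/json"})
    if resp.status_code == 204:  # nothing to claim; queue is empty
        return 0
    claimed = resp.json()
    for msg in claimed:
        # Each message href embeds its claim_id, so a plain DELETE on it
        # removes the message from the queue.
        session.delete(base_url + msg["href"])
    return len(claimed)
```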
>
> Results
>     * Redis
>         * Producer: 1.5 ms/req, 1280 req/sec
>         * Consumer
>             * Claim: 6.9 ms/req
>             * Delete: 1.5 ms/req
>             * 1257 req/sec (overall)
>
>     * Mongo
>         * Producer: 2.5 ms/req, 798 req/sec
>         * Consumer
>             * Claim: 8.4 ms/req
>             * Delete: 2.5 ms/req
>             * 813 req/sec (overall)
>
> ### Auditing / Diagnostics ###
>
> This test is the same as performed in Task Distribution, but also adds a
> few observers to the mix.
>
> When testing the Redis driver, I varied whether or not keep-alive was
> enabled in the uWSGI config. The impact on performance was negligible,
> perhaps due to the speed of the test network and the fact that TLS is not
> being used in these tests.
>
> Options
>     * 2 producer processes with 25 gevent workers each
>         * 1 message posted per request
>     * 2 consumer processes with 25 gevent workers each
>         * 5 messages claimed per request, then deleted one by one before
>           claiming the next batch of messages
>     * 1 observer process with 5 gevent workers
>         * 5 messages listed per request
>     * Load distributed across 4 queues
>     * 10-second duration
>
> Results
>     * Redis (Keep-Alive)
>         * Producer: 1.6 ms/req, 1275 req/sec
>         * Consumer
>             * Claim: 7.0 ms/req
>             * Delete: 1.5 ms/req
>             * 1217 req/sec (overall)
>         * Observer: 3.5 ms/req, 282 req/sec
>     * Redis (No Keep-Alive)
>         * Producer: 1.6 ms/req, 1255 req/sec
>         * Consumer
>             * Claim: 7.0 ms/req
>             * Delete: 1.6 ms/req
>             * 1202 req/sec (overall)
>         * Observer: 3.4 ms/req, 281 req/sec
>     * Mongo (Keep-Alive)
>         * Producer: 2.2 ms/req, 878 req/sec
>         * Consumer
>             * Claim: 8.2 ms/req
>             * Delete: 2.3 ms/req
>             * 876 req/sec (overall)
>         * Observer: 7.4 ms/req, 133 req/sec
>
> ## Thoughts ##
>
> The Redis driver appears to have a significant performance advantage over
> the MongoDB driver. Further gains in performance could no doubt be made by
> taking advantage of Redis' Lua-based scripting ability, which is not
> currently used in the driver.
>
> In any case, both NoSQL[7] servers performed admirably in these tests. I
> suspect that we will be able to achieve the project's goals without having
> to resort to something more low-level (e.g., LevelDB).
>
> Finally, I just wanted to note that as I've been using the zaqar-bench
> tool, I've found it useful for smoke testing in preparation for the Juno
> RC: it exercises several common messaging patterns, making use of all of
> Zaqar's semantics, and does so with lots of concurrent requests, which
> helps ensure the service doesn't have any subtle race conditions.
>
> --Kurt
>
>
> ===========
> [1]: IMO, it would be great to see Redis implement optional support for
> majority-based ACK à la MongoDB.
> [2]: Yes, I know that's some crazy IOPS, but there is plenty of RAM to
> avoid paging, so you should be able to get similar results with some
> regular disks, assuming they are decent enough to support enabling
> journaling (if you need that level of durability).
> [3]: One might argue that the only thing these performance tests show
> is that *OnMetal* is fast. However, as I have pointed out, there was plenty
> of headroom left on these servers during the tests, so similar results
> should be achievable using more modest hardware.
> [4]: In a real app, messages will usually be requested in batches.
> [5]: In this test, the target client does not send a response message back
> to the sender. However, if it did, the test would still only require a
> single queue, since in Zaqar queues are duplex (although this can be
> disabled by setting echo=True in the query string when listing messages).
> [6]: Chosen somewhat arbitrarily.
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>