[openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

Devananda van der Veen devananda.vdv at gmail.com
Thu Sep 18 17:16:12 UTC 2014


On Thu, Sep 18, 2014 at 8:54 AM, Devananda van der Veen
<devananda.vdv at gmail.com> wrote:
> On Thu, Sep 18, 2014 at 7:45 AM, Flavio Percoco <flavio at redhat.com> wrote:
>> On 09/18/2014 04:09 PM, Gordon Sim wrote:
>>> On 09/18/2014 12:31 PM, Flavio Percoco wrote:
>>>> Zaqar guarantees FIFO. To be more precise, it does that relying on the
>>>> storage backend ability to do so as well. Depending on the storage used,
>>>> guaranteeing FIFO may have some performance penalties.
>>>
>>> Would it be accurate to say that at present Zaqar does not use
>>> distributed queues, but holds all queue data in a storage mechanism of
>>> some form which may internally distribute that data among servers but
>>> provides Zaqar with a consistent data model of some form?
>>
>> I think this is accurate. The queue's distribution depends on the
>> storage ability to do so and deployers will be able to choose what
>> storage works best for them based on this as well. I'm not sure how
>> useful this separation is from a user perspective but I do see the
>> relevance when it comes to implementation details and deployments.
>
> Guaranteeing FIFO and not using a distributed queue architecture
> *above* the storage backend are both scale-limiting design choices.
> That Zaqar's scalability depends on the storage back end is not a
> desirable thing in a cloud-scale messaging system in my opinion,
> because this will prevent use at scales which can not be accommodated
> by a single storage back end.
>

It may be worth qualifying this a bit more.

While no single instance of any storage back-end is infinitely
scalable, some of them are really darn fast. That may be enough for
the majority of use cases. It's not outside the realm of possibility
that the inflection point [0] where these design choices result in
performance limitations is at the very high end of scale-out, eg.
public cloud providers who have the resources to invest further in
improving zaqar.

As an example of what I mean, let me refer to the 99th percentile
response time graphs in Kurt's benchmarks [1]... increasing the number
of clients with write-heavy workloads was enough to drive latency from
<10ms to >200 ms with a single service. That latency significantly
improved as storage and application instances were added, which is
good, and what I would expect. These benchmarks do not (and were not
intended to) show the maximal performance of a public-cloud-scale
deployment -- but they do show that performance under different
workloads improves as additional services are started.

While I have no basis for comparing the configuration of the
deployment he used in those tests to what a public cloud operator
might choose to deploy, and presumably such an operator would put
significant work into tuning storage and running more instances of
each service and thus shift that inflection point "to the right", my
point is that, by depending on a single storage instance, Zaqar has
pushed the *ability* to scale out down into the storage
implementation. Given my experience scaling SQL and NoSQL data stores
(in my past life, before working on OpenStack) I have a knee-jerk
reaction to believing that this approach will result in a
public-cloud-scale messaging system.

-Devananda

[0] http://en.wikipedia.org/wiki/Inflection_point -- in this context,
I mean the point on the graph of throughput vs latency where the
derivative goes from near-zero (linear growth) to non-zero
(exponential growth)

[1] https://wiki.openstack.org/wiki/Zaqar/Performance/PubSub/Redis



More information about the OpenStack-dev mailing list