[openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

Flavio Percoco flavio at redhat.com
Fri Sep 19 08:41:33 UTC 2014


On 09/18/2014 07:16 PM, Devananda van der Veen wrote:
> On Thu, Sep 18, 2014 at 8:54 AM, Devananda van der Veen
> <devananda.vdv at gmail.com> wrote:
>> On Thu, Sep 18, 2014 at 7:45 AM, Flavio Percoco <flavio at redhat.com> wrote:
>>> On 09/18/2014 04:09 PM, Gordon Sim wrote:
>>>> On 09/18/2014 12:31 PM, Flavio Percoco wrote:
>>>>> Zaqar guarantees FIFO. To be more precise, it relies on the storage
>>>>> backend's ability to do so. Depending on the storage used,
>>>>> guaranteeing FIFO may carry some performance penalties.
>>>>
>>>> Would it be accurate to say that at present Zaqar does not use
>>>> distributed queues, but holds all queue data in a storage mechanism of
>>>> some form which may internally distribute that data among servers but
>>>> provides Zaqar with a consistent data model of some form?
>>>
>>> I think this is accurate. Queue distribution depends on the storage
>>> backend's ability to do so, and deployers will be able to choose the
>>> storage that works best for them on that basis. I'm not sure how
>>> useful this separation is from a user perspective, but I do see its
>>> relevance when it comes to implementation details and deployments.
>>
>> Guaranteeing FIFO and not using a distributed queue architecture
>> *above* the storage backend are both scale-limiting design choices.
>> That Zaqar's scalability depends on the storage backend is not, in my
>> opinion, desirable in a cloud-scale messaging system, because it will
>> prevent use at scales which cannot be accommodated by a single
>> storage backend.
>>
> 
> It may be worth qualifying this a bit more.
> 
> While no single instance of any storage backend is infinitely
> scalable, some of them are really darn fast. That may be enough for
> the majority of use cases. It's not outside the realm of possibility
> that the inflection point [0] where these design choices result in
> performance limitations is at the very high end of scale-out, e.g.
> public cloud providers who have the resources to invest further in
> improving Zaqar.
> 
> As an example of what I mean, let me refer to the 99th percentile
> response time graphs in Kurt's benchmarks [1]... increasing the number
> of clients with write-heavy workloads was enough to drive latency from
> <10 ms to >200 ms with a single service. That latency significantly
> improved as storage and application instances were added, which is
> good, and what I would expect. These benchmarks do not (and were not
> intended to) show the maximal performance of a public-cloud-scale
> deployment -- but they do show that performance under different
> workloads improves as additional services are started.
> 
> While I have no basis for comparing the configuration of the
> deployment he used in those tests to what a public cloud operator
> might choose to deploy, and presumably such an operator would put
> significant work into tuning storage and running more instances of
> each service and thus shift that inflection point "to the right", my
> point is that, by depending on a single storage instance, Zaqar has
> pushed the *ability* to scale out down into the storage
> implementation. Given my experience scaling SQL and NoSQL data stores
> (in my past life, before working on OpenStack), I am skeptical that
> this approach will result in a public-cloud-scale messaging system.

Thanks for the more detailed explanation of your concern, I appreciate it.

Let me start by saying I agree that pushing message distribution down
to the storage layer may result in scaling limitations in some
scenarios.

That said, Zaqar already has the concept of pools. Pools allow
operators to add more storage clusters to Zaqar and balance the load
between them. Data can be distributed across these pools on a
per-queue basis. While the messages of a queue are not distributed
across multiple *pools* - all the messages for queue X will live in a
single pool - I do believe this per-queue distribution helps address
the above concern and pushes that limitation farther away.

Let me explain a bit better how pools currently work. As of now, each
pool has a URI pointing to a storage cluster and a weight. The weight
is used to balance load between pools every time a queue is created.
Once a queue is created, Zaqar records the queue<->pool association in
a catalogue that is used to determine where the queue lives. We'll
likely add new algorithms to achieve a better, more even distribution
of queues across the registered pools.
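
To make that concrete, here's a minimal sketch of the weighted pool
selection and the catalogue lookup (illustrative Python only, not
Zaqar's actual code; the pool URIs and names are made up):

    # Illustrative sketch, not Zaqar's implementation. Each pool is a
    # (uri, weight) pair; heavier pools receive proportionally more of
    # the newly created queues.
    import bisect
    import random

    POOLS = [
        ("mongodb://storage-a:27017", 100),
        ("mongodb://storage-b:27017", 50),
    ]

    CATALOGUE = {}  # queue name -> pool URI (the queue<->pool association)

    def pick_pool(pools):
        """Weighted random selection of a pool for a new queue."""
        uris, weights = zip(*pools)
        cumulative, total = [], 0
        for w in weights:
            total += w
            cumulative.append(total)
        # A point in [0, total) falls into a pool's slice with
        # probability proportional to that pool's weight.
        point = random.random() * total
        return uris[bisect.bisect_right(cumulative, point)]

    def create_queue(name):
        """Assign the queue to a pool once; every later operation on
        the queue consults the catalogue to find where it lives."""
        if name not in CATALOGUE:
            CATALOGUE[name] = pick_pool(POOLS)
        return CATALOGUE[name]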

I'm sure message distribution could be implemented in Zaqar, but I'm
not convinced we should do so right now; it would bring a whole lot of
new issues into the project that I think we can and should avoid for
now.

Thanks for the feedback, Devananda.
Flavio


> 
> -Devananda
> 
> [0] http://en.wikipedia.org/wiki/Inflection_point -- in this context,
> I mean the point on the latency-vs-throughput graph where latency
> stops growing roughly linearly with load and starts climbing sharply
> (super-linearly)
> 
> [1] https://wiki.openstack.org/wiki/Zaqar/Performance/PubSub/Redis
> 


-- 
@flaper87
Flavio Percoco


