[openstack-dev] [Zaqar] Zaqar and SQS Properties of Distributed Queues

Joe Gordon joe.gordon0 at gmail.com
Mon Sep 22 21:06:02 UTC 2014


On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter <zbitter at redhat.com> wrote:

> On 22/09/14 10:11, Gordon Sim wrote:
>
>> On 09/19/2014 09:13 PM, Zane Bitter wrote:
>>
>>> SQS offers very, very limited guarantees, and it's clear that the reason
>>> for that is to make it massively, massively scalable in the way that
>>> e.g. S3 is scalable while also remaining comparably durable (S3 is
>>> supposedly designed for 11 nines, BTW).
>>>
>>> Zaqar, meanwhile, seems to be promising the world in terms of
>>> guarantees. (And then taking it away in the fine print, where it says
>>> that the operator can disregard many of them, potentially without the
>>> user's knowledge.)
>>>
>>> On the other hand, IIUC Zaqar does in fact have a sharding feature
>>> ("Pools") which is its answer to the massive scaling question.
>>>
>>
>> There are different dimensions to the scaling problem.
>>
>
> Many thanks for this analysis, Gordon. This is really helpful stuff.
>
>> As I understand it, pools don't help scaling a given queue since all the
>> messages for that queue must be in the same pool. At present traffic
>> through different Zaqar queues are essentially entirely orthogonal
>> streams. Pooling can help scale the number of such orthogonal streams,
>> but to be honest, that's the easier part of the problem.
>>
>
> But I think it's also the important part of the problem. When I talk about
> scaling, I mean 1 million clients sending 10 messages per second each, not
> 10 clients sending 1 million messages per second each.
>
> When a user gets to the point that individual queues have massive
> throughput, it's unlikely that a one-size-fits-all cloud offering like
> Zaqar or SQS is _ever_ going to meet their needs. Those users will want to
> spin up and configure their own messaging systems on Nova servers, and at
> that kind of size they'll be able to afford to. (In fact, they may not be
> able to afford _not_ to, assuming per-message-based pricing.)
>

Running a message queue with a strong guarantee of not losing messages is
hard, and SQS promises exactly that: it *will* deliver your message. If a
use case can tolerate occasionally dropped messages, then running your own
MQ makes more sense.

SQS is designed to handle massive queues as well. While I haven't found any
examples of queues with 1 million messages/second being sent or received,
30k to 100k messages/second is not unheard of [0][1][2].

[0] https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s
[1] http://java.dzone.com/articles/benchmarking-sqs
[2] http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182


>> There is also the possibility of using the sharding capabilities of the
>> underlying storage. But the pattern of use will determine how effective
>> that can be.
>>
>> So for example, on the ordering question, if order is defined by a
>> single sequence number held in the database and atomically incremented
>> for every message published, that is not likely to be something where
>> the database's sharding is going to help in scaling the number of
>> concurrent publications.
>>
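
To make the contention point above concrete, here is a minimal sketch. It
is an in-memory toy, not Zaqar's storage driver: the class name, the
modulo placement of messages on shards, and the lock standing in for an
atomic database increment are all assumptions made for illustration.

```python
import threading


class TotallyOrderedQueue:
    """Toy queue: storage is sharded, but ordering uses one shared counter."""

    def __init__(self, num_shards):
        self.shards = [[] for _ in range(num_shards)]
        self._seq = 0
        # Stand-in for the atomic increment in the database (assumption).
        self._seq_lock = threading.Lock()

    def publish(self, body):
        # Every publisher, whatever shard its message lands on, has to
        # pass through this one critical section to get its number.
        with self._seq_lock:
            seq = self._seq
            self._seq += 1
        # Only the storage of the message is spread across shards.
        self.shards[seq % len(self.shards)].append((seq, body))
        return seq
```

Adding shards here increases how many messages can be held, but the
number of concurrent publications is still bounded by the single counter.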
>> Though sharding would allow scaling the total number of messages on the
>> queue (by distributing them over multiple shards), the total ordering of
>> those messages reduces its effectiveness in scaling the number of
>> concurrent getters (e.g. the concurrent subscribers in pub-sub) since
>> they will all be getting the messages in exactly the same order.
>>
>> Strict ordering impacts the competing consumers case also (and is in my
>> opinion of limited value as a guarantee anyway). At any given time, the
>> head of the queue is in one shard, and all concurrent claim requests
>> will contend for messages in that same shard. Though the unsuccessful
>> claimants may then move to another shard as the head moves, they will
>> all again try to access the messages in the same order.
>>
>> So if Zaqar's goal is to scale the number of orthogonal queues, and the
>> number of messages held at any time within these, the pooling facility
>> and any sharding capability in the underlying store for a pool would
>> likely be effective even with the strict ordering guarantee.
>>
>
> IMHO this is (or should be) the goal - support enormous numbers of
> small-to-moderate sized queues.


If 50,000 messages per second doesn't count as small-to-moderate, then Zaqar
does not fulfill a major SQS use case.


>
>
>> If scaling the number of communicants on a given communication channel
>> is a goal however, then strict ordering may hamper that. If it does, it
>> seems to me that this is not just a policy tweak on the underlying
>> datastore to choose the desired balance between ordering and scale, but
>> a more fundamental question on the internal structure of the queue
>> implementation built on top of the datastore.
>>
>
> I agree with your analysis, but I don't think this should be a goal.
>
> Note that the user can still implement this themselves using
> application-level sharding - if you know that in-order delivery is not
> important to you, then randomly assign clients to a queue and then poll all
> of the queues round-robin. This yields _exactly_ the same semantics
> as SQS.
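
For reference, the application-level sharding described above might look
roughly like the sketch below. It uses in-memory deques as stand-ins for
individual ordered queues; `ShardedQueue`, `assign_shard`, `post` and
`poll` are hypothetical names for illustration, not part of the Zaqar or
SQS APIs.

```python
import random
from collections import deque


class ShardedQueue:
    """Toy model: N FIFO queues used together as one loosely ordered queue."""

    def __init__(self, num_shards, rng=None):
        self.shards = [deque() for _ in range(num_shards)]
        self._rng = rng if rng is not None else random.Random()
        self._next = 0  # index of the shard the consumer visits next

    def assign_shard(self):
        # Each producing client is randomly assigned one queue up front.
        return self._rng.randrange(len(self.shards))

    def post(self, shard, message):
        self.shards[shard].append(message)

    def poll(self):
        # Visit the shards round-robin; return None once all are empty.
        for _ in range(len(self.shards)):
            shard = self.shards[self._next]
            self._next = (self._next + 1) % len(self.shards)
            if shard:
                return shard.popleft()
        return None
```

In this sketch, order is preserved only among messages that went to the
same shard; across shards the consumer sees an interleaving with no
ordering guarantee.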


> The reverse is true of SQS - if you want FIFO then you have to implement
> re-ordering by sequence number in your application. (I'm not certain, but
> it also sounds very much like this situation is ripe for losing messages
> when your client dies.)
>
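
The client-side re-ordering mentioned above could be sketched as follows,
assuming the producer stamps each message with a gap-free sequence number.
`Resequencer` and its `receive` method are hypothetical names, not part of
any SQS client library.

```python
class Resequencer:
    """Buffer out-of-order messages and release them in sequence order."""

    def __init__(self, first_seq=0):
        self._expected = first_seq  # next sequence number to deliver
        self._pending = {}          # seq -> body, held until its turn

    def receive(self, seq, body):
        """Accept one message; return the bodies that are now deliverable."""
        if seq < self._expected or seq in self._pending:
            return []  # duplicate delivery, drop it
        self._pending[seq] = body
        ready = []
        # Release the longest run of consecutive messages we now hold.
        while self._expected in self._pending:
            ready.append(self._pending.pop(self._expected))
            self._expected += 1
        return ready
```

The pending buffer lives only in the client's memory. If the client has
already deleted those messages from the queue and then dies, they are
gone, which seems to be the failure mode the parenthetical above worries
about.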
> So the question is: in which use case do we want to push additional
> complexity into the application? The case where there are truly massive
> volumes of messages flowing to a single point? Or the case where the
> application wants the messages in order?
>
> I'd suggest both that the former applications are better able to handle
> that extra complexity and that the latter applications are probably more
> common. So it seems that the Zaqar team made a good decision.
>

If Zaqar is supposed to be comparable to Amazon SQS, then it has made the
wrong choice.


>
> (Aside: it follows that Zaqar probably should have a maximum throughput
> quota for each queue; or that it should report usage information in such a
> way that the operator could sometimes bill more for a single queue than
> they would for the same amount of usage spread across multiple queues; or
> both.)
>
>> I also get the impression, perhaps wrongly, that providing the strict
>> ordering guarantee wasn't necessarily an explicit requirement, but was
>> simply a property of the underlying implementation(?).
>>
>
> I wasn't involved, but I expect it was a bit of both (i.e. it is a
> chicken/egg question).
>
> cheers,
> Zane.
>
>