<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Sep 22, 2014 at 9:58 AM, Zane Bitter <span dir="ltr"><<a href="mailto:zbitter@redhat.com" target="_blank">zbitter@redhat.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">On 22/09/14 10:11, Gordon Sim wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
On 09/19/2014 09:13 PM, Zane Bitter wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
SQS offers very, very limited guarantees, and it's clear that the reason<br>
for that is to make it massively, massively scalable in the way that<br>
e.g. S3 is scalable while also remaining comparably durable (S3 is<br>
supposedly designed for 11 nines, BTW).<br>
<br>
Zaqar, meanwhile, seems to be promising the world in terms of<br>
guarantees. (And then taking it away in the fine print, where it says<br>
that the operator can disregard many of them, potentially without the<br>
user's knowledge.)<br>
<br>
On the other hand, IIUC Zaqar does in fact have a sharding feature<br>
("Pools") which is its answer to the massive scaling question.<br>
</blockquote>
<br>
There are different dimensions to the scaling problem.<br>
</blockquote>
<br></span>
Many thanks for this analysis, Gordon. This is really helpful stuff.<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
As I understand it, pools don't help scaling a given queue since all the<br>
messages for that queue must be in the same pool. At present traffic<br>
through different Zaqar queues forms essentially entirely orthogonal<br>
streams. Pooling can help scale the number of such orthogonal streams,<br>
but to be honest, that's the easier part of the problem.<br>
</blockquote>
<br></span>
But I think it's also the important part of the problem. When I talk about scaling, I mean 1 million clients sending 10 messages per second each, not 10 clients sending 1 million messages per second each.<br>
<br>
When a user gets to the point that individual queues have massive throughput, it's unlikely that a one-size-fits-all cloud offering like Zaqar or SQS is _ever_ going to meet their needs. Those users will want to spin up and configure their own messaging systems on Nova servers, and at that kind of size they'll be able to afford to. (In fact, they may not be able to afford _not_ to, assuming per-message-based pricing.)<span class=""><br></span></blockquote><div><br></div><div>Running a message queue with a strong guarantee of not losing messages is hard, and SQS promises exactly that: it *will* deliver your message. If a use case can tolerate occasionally dropped messages, then running your own MQ makes more sense.</div><div><br></div><div>SQS is designed to handle massive queues as well. While I haven't found any examples of queues with 1 million messages/second being sent or received, 30k to 100k messages/second is not unheard of [0][1][2].</div><div><br></div><div>[0] <a href="https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s">https://www.youtube.com/watch?v=zwLC5xmCZUs#t=22m53s</a></div><div>[1] <a href="http://java.dzone.com/articles/benchmarking-sqs">http://java.dzone.com/articles/benchmarking-sqs</a></div><div>[2] <a href="http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182">http://www.slideshare.net/AmazonWebServices/massive-message-processing-with-amazon-sqs-and-amazon-dynamodb-arc301-aws-reinvent-2013-28431182</a></div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class="">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
There is also the possibility of using the sharding capabilities of the<br>
underlying storage. But the pattern of use will determine how effective<br>
that can be.<br>
<br>
So for example, on the ordering question, if order is defined by a<br>
single sequence number held in the database and atomically incremented<br>
for every message published, that is not likely to be something where<br>
the database's sharding is going to help in scaling the number of<br>
concurrent publications.<br>
<br>
Though sharding would allow scaling the total number of messages on the<br>
queue (by distributing them over multiple shards), the total ordering of<br>
those messages reduces its effectiveness in scaling the number of<br>
concurrent getters (e.g. the concurrent subscribers in pub-sub) since<br>
they will all be getting the messages in exactly the same order.<br>
<br>
Strict ordering impacts the competing consumers case also (and is in my<br>
opinion of limited value as a guarantee anyway). At any given time, the<br>
head of the queue is in one shard, and all concurrent claim requests<br>
will contend for messages in that same shard. Though the unsuccessful<br>
claimants may then move to another shard as the head moves, they will<br>
all again try to access the messages in the same order.<br>
<br>
So if Zaqar's goal is to scale the number of orthogonal queues, and the<br>
number of messages held at any time within these, the pooling facility<br>
and any sharding capability in the underlying store for a pool would<br>
likely be effective even with the strict ordering guarantee.<br>
</blockquote>
<br></span>
IMHO this is (or should be) the goal - support enormous numbers of small-to-moderate sized queues.</blockquote><div><br></div><div>If 50,000 messages per second doesn't count as small-to-moderate, then Zaqar does not fulfill a major SQS use case.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
If scaling the number of communicants on a given communication channel<br>
is a goal however, then strict ordering may hamper that. If it does, it<br>
seems to me that this is not just a policy tweak on the underlying<br>
datastore to choose the desired balance between ordering and scale, but<br>
a more fundamental question on the internal structure of the queue<br>
implementation built on top of the datastore.<br>
</blockquote>
<br></span>
I agree with your analysis, but I don't think this should be a goal.<br>
<br>
Note that the user can still implement this themselves using application-level sharding - if you know that in-order delivery is not important to you, then randomly assign clients to a queue and then poll all of the queues in round-robin fashion. This yields _exactly_ the same semantics as SQS. </blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
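A minimal sketch of the application-level sharding described above, assuming in-memory deques as stand-ins for real queues. The ShardedQueue class and its method names are hypothetical and are not part of any Zaqar or SQS client library:<br>
<pre>
```python
import random
from collections import deque


class ShardedQueue:
    """Spread messages over several independent FIFO queues.

    Each deque is an in-memory stand-in for one server-side queue
    (an assumption made purely for illustration).
    """

    def __init__(self, shard_count, seed=None):
        self.shards = [deque() for _ in range(shard_count)]
        self._rng = random.Random(seed)
        self._assignment = {}   # client id -> shard index
        self._cursor = 0        # round-robin position for consumers

    def shard_for(self, client_id):
        # Randomly assign each producing client to one queue, then keep it there.
        if client_id not in self._assignment:
            self._assignment[client_id] = self._rng.randrange(len(self.shards))
        return self._assignment[client_id]

    def post(self, client_id, body):
        self.shards[self.shard_for(client_id)].append((client_id, body))

    def poll(self):
        # Visit the queues round-robin and return the first message found,
        # or None once every queue has been checked and all are empty.
        for _ in range(len(self.shards)):
            shard = self.shards[self._cursor]
            self._cursor = (self._cursor + 1) % len(self.shards)
            if shard:
                return shard.popleft()
        return None
```
</pre>
In this sketch, messages from one client stay in order relative to each other because they all land in the same queue, but nothing orders messages across clients.<br>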
<br>
The reverse is true of SQS - if you want FIFO then you have to implement re-ordering by sequence number in your application. (I'm not certain, but it also sounds very much like this situation is ripe for losing messages when your client dies.)<br>
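<br>
A rough sketch of what such application-side re-ordering could look like, assuming the producer stamps every message with a consecutive integer sequence number. The ReorderBuffer class is hypothetical and does not come from any SQS SDK:<br>
<pre>
```python
class ReorderBuffer:
    """Release messages in sequence-number order, holding back any gaps."""

    def __init__(self, first_seq=0):
        self._next = first_seq   # sequence number expected next
        self._pending = {}       # seq -> body, for messages that arrived early

    def push(self, seq, body):
        """Accept one message; return the bodies that are now deliverable."""
        if self._next > seq:
            # Already delivered: a duplicate from at-least-once delivery.
            return []
        self._pending.setdefault(seq, body)
        ready = []
        # Drain every message that directly continues the sequence.
        while self._next in self._pending:
            ready.append(self._pending.pop(self._next))
            self._next += 1
        return ready
```
</pre>
For example, push(1, "b") returns an empty list and a later push(0, "a") returns ["a", "b"]. The held-back messages live only in the client's memory in this sketch, so a consumer that has already deleted them from the queue would lose them if it died, which is the risk raised above.<br>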
<br>
So the question is: in which use case do we want to push additional complexity into the application? The case where there are truly massive volumes of messages flowing to a single point? Or the case where the application wants the messages in order?<br>
<br>
I'd suggest both that the former applications are better able to handle that extra complexity and that the latter applications are probably more common. So it seems that the Zaqar team made a good decision.<br></blockquote><div><br></div><div>If Zaqar is supposed to be comparable to Amazon SQS, then it has made the wrong choice.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<br>
(Aside: it follows that Zaqar probably should have a maximum throughput quota for each queue; or that it should report usage information in such a way that the operator could sometimes bill more for a single queue than they would for the same amount of usage spread across multiple queues; or both.)<span class=""><br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
I also get the impression, perhaps wrongly, that providing the strict<br>
ordering guarantee wasn't necessarily an explicit requirement, but was<br>
simply a property of the underlying implementation(?).<br>
</blockquote>
<br></span>
I wasn't involved, but I expect it was a bit of both (i.e. it is a chicken/egg question).<br>
<br>
cheers,<br>
Zane.<div class=""><div class="h5"><br>
</div></div></blockquote></div><br></div></div>