[openstack-dev] [Zaqar] Comments on the concerns arose during the TC meeting

Flavio Percoco flavio at redhat.com
Fri Sep 12 08:50:28 UTC 2014


On 09/12/2014 12:14 AM, Zane Bitter wrote:
> On 09/09/14 15:03, Monty Taylor wrote:
>> On 09/04/2014 01:30 AM, Clint Byrum wrote:
>>> Excerpts from Flavio Percoco's message of 2014-09-04 00:08:47 -0700:
>>>> Greetings,
>>>>
>>>> Last Tuesday the TC held the first graduation review for Zaqar. During
>>>> the meeting some concerns arose. I've listed those concerns below with
>>>> some comments, hoping that they will help start a discussion before the
>>>> next meeting. In addition, I've added some comments about the project's
>>>> stability at the bottom and an etherpad link pointing to a list of use
>>>> cases for Zaqar.
>>>>
>>>
>>> Hi Flavio. This was an interesting read. As somebody whose attention has
>>> recently been drawn to Zaqar, I am quite interested in seeing it
>>> graduate.
>>>
>>>> # Concerns
>>>>
>>>> - Concern about the operational burden of adding NoSQL deployment
>>>> expertise to the mix of OpenStack operational skills
>>>>
>>>> For those of you not familiar with Zaqar, it currently supports two NoSQL
>>>> drivers - MongoDB and Redis - and those are the only two drivers it
>>>> supports for now. This will require operators willing to use Zaqar to
>>>> maintain a new (?) NoSQL technology in their system. Before expressing
>>>> our thoughts on this matter, let me say that:
>>>>
>>>>      1. By removing the SQLAlchemy driver, we basically removed the
>>>>         chance for operators to use an already deployed
>>>>         "OpenStack-technology"
>>>>      2. Zaqar won't be backed by any AMQP-based messaging technology
>>>>         for now. Here's[0] a summary of the research the team did
>>>>         during Juno (mostly done by Victoria)
>>>>      3. We (OpenStack) used to require Redis for the zmq matchmaker
>>>>      4. We (OpenStack) also use memcached for caching, and as the oslo
>>>>         caching lib becomes available - or a wrapper on top of
>>>>         dogpile.cache - Redis may be used in place of memcached in more
>>>>         and more deployments.
>>>>      5. Ceilometer's recommended storage driver is still MongoDB,
>>>>         although Ceilometer now supports SQLAlchemy. (Please correct
>>>>         me if I'm wrong.)
>>>>
>>>> That being said, it's obvious we already, to some extent, promote some
>>>> NoSQL technologies. However, for the sake of the discussion, let's
>>>> assume we don't.
>>>>
>>>> I truly believe, with my OpenStack (not Zaqar's) hat on, that we can't
>>>> keep avoiding these technologies. NoSQL technologies have been around
>>>> for years and we - including OpenStack operators - should be prepared
>>>> to support them. Not every tool is good for every task - one of the
>>>> reasons we removed the SQLAlchemy driver in the first place -
>>>> therefore it's impossible to keep a homogeneous environment for all
>>>> services.
>>>>
>>>
>>> I wholeheartedly agree that non-traditional storage technologies that
>>> are becoming mainstream are good candidates for use cases where
>>> SQL-based storage gets in the way. I wish there wasn't so much FUD
>>> (warranted or not) about MongoDB, but that is the reality we live in.
>>>
>>>> With this, I'm not suggesting we ignore the risks and the extra burden
>>>> this adds. But instead of attempting to avoid it completely by not
>>>> evolving the stack of services we provide, we should probably work on
>>>> defining a reasonable subset of NoSQL services we are OK with
>>>> supporting. This will help make the burden smaller and it'll give
>>>> operators the option to choose.
>>>>
>>>> [0] http://blog.flaper87.com/post/marconi-amqp-see-you-later/
>>>>
>>>>
>>>> - Concern on whether we should really reinvent a queue system rather
>>>> than piggyback on one
>>>>
>>>> As mentioned in the meeting on Tuesday, Zaqar is not reinventing
>>>> message brokers. Zaqar provides a service akin to SQS from AWS with an
>>>> OpenStack flavor on top. [0]
>>>>
>>>
>>> I think Zaqar is more like SMTP and IMAP than AMQP. You're not really
>>> trying to connect two processes in real time. You're trying to do fully
>>> asynchronous messaging with fully randomized access to any message.
>>>
>>> Perhaps somebody should explore whether the approaches taken by
>>> large-scale IMAP providers could be applied to Zaqar.
>>>
>>> Anyway, I can't imagine writing a system to intentionally use the
>>> semantics of IMAP and SMTP. I'd be very interested in seeing actual use
>>> cases for it, apologies if those have been posted before.
>>
>> It seems like you're EITHER describing something called XMPP that has at
>> least one open source scalable backend called ejabberd. OR, you've
>> actually hit the nail on the head with bringing up SMTP and IMAP but for
>> some reason that feels strange.
>>
>> SMTP and IMAP already implement every feature you've described, as well
>> as retries/failover/HA and a fully end-to-end secure transport (if
>> installed properly). If you don't actually set them up to run as a public
>> messaging interface but just as a cloud-local exchange, then you could
>> get by with very low overhead for a massive throughput - it can very
>> easily be run on a single machine for Sean's simplicity, and could just
>> as easily be scaled out using well known techniques for public cloud
>> sized deployments?
>>
>> So why not use existing daemons that do this? You could still use the
>> REST API you've got, but instead of writing it to a mongo backend and
>> trying to implement all of the things that already exist in SMTP/IMAP -
>> you could just have them front to it. You could even bypass normal
>> delivery mechanisms and do neat things with local injection.
>>
>> I don't care about the NoSQL question on its own. Mongo is fine. Redis
>> is fine. I don't think either has any features for this use case that
>> make a lick's worth of difference compared to MySQL or Postgres, but I
>> also don't think they are a PROBLEM in and of themselves.
>>
>> The main thing I care about here is every description I've heard of what
>> zaqar wants to do (which does seem to be getting clearer through this
>> thread) is still well implemented somewhere as an existing scalable
>> service. Is zaqar actually Rabbit with a REST interface? Is it ejabberd
>> with a REST interface? Or is it IMAP/SMTP with a REST interface? You'll
>> note that probably nobody would think a single server that wanted to be
>> both Rabbit AND IMAP/SMTP is a good idea ... at least this is one of the
>> reasons why we all think Microsoft Exchange is a pile of garbage, no?
> 
> I was intrigued by the idea of an ejabberd backend to Zaqar, so I spent
> half a morning yesterday investigating it. (tl;dr - it won't work.)
> 
> XMPP does have a sort-of standard for queueing messages when a client is
> offline[1], and ejabberd does support it[2]. Amusingly, it does so by
> storing the queue in an RDBMS (the very thing that the TC has repeatedly
> called an 'anti-pattern'). Unfortunately, ejabberd does _not_ support[2]
> the extension that would allow the Zaqar API to request messages one at
> a time (in arbitrary order, though that's not important here) out of the
> queue[3], so if I understand your proposal correctly every time the API
> polled ejabberd it could potentially receive a flood of messages that it
> would then have to reliably buffer itself (i.e. duplicating all the work
> that ejabberd was supposed to eliminate). In fact, XMPP is not designed
> to be reliable at all. There is an XMPP extension that could potentially
> offer reliable delivery via acks[4], although it's not entirely clear to
> me if that requires the participation of the client (i.e. effectively
> becomes synchronous messaging).
> 
> So, in summary, not a good fit because it doesn't match the #1
> requirement, which is to never lose messages while remaining asynchronous.
> 
> I can't figure out if the suggestion to use dovecot was actually serious.
> 
> [1] http://www.xmpp.org/extensions/xep-0160.html
> [2] http://www.ejabberd.im/protocols
> [3] http://xmpp.org/extensions/xep-0013.html
> [4] http://xmpp.org/extensions/xep-0079.html
> 
> 
> In any event, I think it's probably unhelpful to come at this from the
> angle of "which orange is the best one to compare this to, and please
> don't even talk to me about other apples". The thing Zaqar is most
> directly comparable to is not email or XMPP, it's SQS.
> 
> SQS offers a guarantee of delivering each message *at least* once.[5] It
> is optimised for durability rather than latency. It also tries to
> minimise multiple deliveries in the case where multiple clients are
> polling the same queue (e.g. a work queue).
> 
> Zaqar offers somewhat more complicated semantics[6]. I think we should
> discuss those semantics and agree on which are essential and which
> dispensable, rather than trying to compare it to things like IMAP. Once
> we have agreement on what the semantics should be, then we can sensibly
> discuss which back ends are capable of satisfying them.
> 
> [5] https://en.wikipedia.org/wiki/Amazon_Simple_Queue_Service#Message_delivery
> [6] https://wiki.openstack.org/wiki/Zaqar/Frequently_asked_questions#What_messaging_patterns_does_Zaqar_support.3F

Zane, thanks a lot for taking the time to dig into this. It was an
interesting read.

> Zaqar obviously supports point-to-point queues, with one producer and
> one consumer. I assume it also supports many:1 and anycast 1:many &
> many:many queues - it could hardly fail to do so, since it doesn't
> actually know who the producers and consumers are. Hopefully it takes
> steps to ensure that multiple workers rarely receive the same message
> before it is acknowledged.

Zaqar supports once-and-only-once delivery. By default, it also does not
echo messages back to the client that produced them, but it is possible
to tell Zaqar to return messages to that client as well. It's up to the
user.
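
To make the worker-queue case Zane describes above a bit more concrete,
here is a minimal, hypothetical in-memory sketch of claim-based
consumption combined with the "don't echo my own messages" default. It
assumes a claim-with-TTL model: a claimed message is hidden from other
workers until it is deleted (acknowledged) or the claim lapses. The names
(ClaimQueue, post, claim, delete, the injectable clock) are illustrative
assumptions, not Zaqar's code or API, and the sketch only illustrates
hiding claimed messages and the echo default, not the full delivery
guarantee.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Message:
    id: int
    body: str
    producer: str                   # client ID of the producer
    claim_id: Optional[int] = None  # claim currently holding the message
    claim_expires: float = 0.0      # clock time at which that claim lapses


class ClaimQueue:
    """Hypothetical in-memory work queue using claims with a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._msg_ids = itertools.count(1)
        self._claim_ids = itertools.count(1)
        self._messages: Dict[int, Message] = {}  # insertion order = FIFO order

    def post(self, client_id: str, body: str) -> int:
        msg = Message(next(self._msg_ids), body, client_id)
        self._messages[msg.id] = msg
        return msg.id

    def claim(self, client_id: str, ttl: float, limit: int = 1,
              echo: bool = False) -> Tuple[int, List[Message]]:
        """Claim up to `limit` messages that have no live claim."""
        now = self._clock()
        claim_id = next(self._claim_ids)
        claimed: List[Message] = []
        for msg in self._messages.values():
            if len(claimed) == limit:
                break
            if not echo and msg.producer == client_id:
                continue  # by default, skip the caller's own messages
            if msg.claim_id is not None and msg.claim_expires > now:
                continue  # another worker holds a live claim
            msg.claim_id = claim_id
            msg.claim_expires = now + ttl
            claimed.append(msg)
        return claim_id, claimed

    def delete(self, msg_id: int, claim_id: int) -> bool:
        """Acknowledge a message; only a live holder of its claim may do so."""
        msg = self._messages.get(msg_id)
        if (msg is None or msg.claim_id != claim_id
                or msg.claim_expires <= self._clock()):
            return False
        del self._messages[msg_id]
        return True
```

With two workers polling the same queue, the second one gets nothing
while the first holds a live claim; once that claim lapses the message
becomes claimable again, and only the current claim holder can delete it.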

> 
> However, Zaqar also supports the Pub-Sub model of messaging. I believe,
> but would like Flavio to confirm, that this is what is meant when the
> Zaqar team say that Zaqar is about messaging in general and not just
> queuing. That is to say, it is possible for multiple consumers to
> intentionally consume the same message, with each maintaining its own
> pointer in the queue. (Another way to think of this is that messages can
> be multicast to multiple virtual queues, with data de-duplication
> between them.) To a relative novice in the field like me, the difference
> between this and queuing sounds pretty academic :P. Call it what you
> will, it seems like a reasonable thing to implement to me.

Correct, this and other messaging patterns supported by Zaqar make it a
messaging service, which, as Gordon mentioned in another email, is just a
more generic term. Messages are the most important resource in Zaqar and
providing good, common and scalable patterns to access those messages is
what we strive for in Zaqar's API.
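
Zane's description of Pub-Sub above (each consumer keeps its own pointer
into the queue, while each message is stored only once) can be sketched
as follows. This is a hypothetical in-memory illustration that assumes
the service tracks one cursor per subscriber; SubscriptionLog, publish
and read are invented names, and a real service might instead hand the
cursor back to the client, for example as a paging marker.

```python
from collections import defaultdict
from typing import Dict, List


class SubscriptionLog:
    """Hypothetical pub-sub log: one stored copy of each message,
    one independent read cursor per subscriber."""

    def __init__(self) -> None:
        self._messages: List[str] = []  # stored once, shared by all readers
        self._cursors: Dict[str, int] = defaultdict(int)  # subscriber -> next index

    def publish(self, body: str) -> None:
        self._messages.append(body)

    def read(self, subscriber: str, limit: int = 10) -> List[str]:
        """Return the next batch for this subscriber and advance its cursor."""
        start = self._cursors[subscriber]
        batch = self._messages[start:start + limit]
        self._cursors[subscriber] = start + len(batch)
        return batch
```

Two subscribers reading the same log each see every message, at their
own pace, without the messages being copied per subscriber.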

> What's not clear to me is whether Zaqar supports a model where multiple
> different publish queues are somehow multiplexed together into each
> subscription queue with the subscribers able to look and determine which
> messages to receive and which not to. I do *not* think Zaqar supports
> that (but again, would like Flavio to confirm). I definitely think it
> would be a mistake if it did. And I think that this is the kind of thing
> that Clint is referring to with the IMAP analogy.

Correct, it does not support this.

> The final question is the one of arbitrary access to messages in the
> queue (or "queue" if you prefer). Flavio indicated that this effectively
> came for free with their implementation of Pub-Sub. IMHO it is
> unnecessary and limits the choice of potential back ends in the future.
> I would personally be +1 on removing it from the v2 API, and also +1 on
> the v2 API shipping in Kilo so that as few new adopters as possible get
> stuck with the limited choices of back-end. I hope that would resolve
> Clint's concerns that we need a separate, light-weight queue system; I
> personally don't believe we need two projects, even though I agree that
> all of the use cases I personally care about could probably be satisfied
> without Pub-Sub.

Right, being able to support other backends is one of the reasons we're
looking forward to removing the support for arbitrary access to messages.
As of now, the plan is to remove that endpoint unless a very good use
case comes up that makes supporting other backends not worth it, which I
doubt. The feedback from Zaqar's early adopters is that the endpoint is
indeed not useful.
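
To illustrate why dropping arbitrary access widens the choice of
backends, here is a hypothetical sketch of a narrower storage contract
that deliberately has no get-by-ID operation. QueueBackend, FifoBackend,
pop_oldest and drain are invented names, and the deque merely stands in
for any store that can only append and pop from the head.

```python
from collections import deque
from typing import Deque, List, Protocol, Tuple


class QueueBackend(Protocol):
    """Hypothetical minimal storage contract: note there is no
    get(message_id) method, i.e. no arbitrary access."""

    def append(self, body: str) -> int: ...

    def pop_oldest(self, limit: int) -> List[Tuple[int, str]]: ...


class FifoBackend:
    """Stand-in for any store that only offers append and pop-from-head."""

    def __init__(self) -> None:
        self._items: Deque[Tuple[int, str]] = deque()
        self._next_id = 1

    def append(self, body: str) -> int:
        msg_id = self._next_id
        self._next_id += 1
        self._items.append((msg_id, body))
        return msg_id

    def pop_oldest(self, limit: int) -> List[Tuple[int, str]]:
        batch: List[Tuple[int, str]] = []
        while self._items and len(batch) < limit:
            batch.append(self._items.popleft())
        return batch


def drain(backend: QueueBackend, limit: int = 10) -> List[str]:
    """Consumer code written only against the narrow contract."""
    return [body for _, body in backend.pop_oldest(limit)]
```

Any store that can append and pop in order satisfies this contract;
adding a get(message_id) call on top would also require an index by ID,
which is the kind of requirement that rules out some backends.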


> As Rob pointed out, one of the more obvious choices of back end for an
> API like the one I just described would be Apache Kafka. Unfortunately
> it is a massive Java application with Zookeeper dependencies, and we all
> know how Monty feels about those ;) (FWIW, I agree with him.) Given
> that's a non-starter as the _default_ back-end, the current design of
> allowing multiple pluggable storage back ends, starting with MongoDB and
> Redis, seems like not a bad one to me.
> 
>> I also worry about the fact that one description of zaqar was used to
>> communicate a need for divergent requirements (it needs to be a
>> high-volume fast message broker/queue - which, btw, sounds more like
>> Rabbit/oslo.messaging and less like what Clint describes above) ... and
>> that's why it wants to use Falcon and not Pecan and why it wants to use
>> mongo and not SQL. And then what we're doing is reimplementing something
>> like rabbit except in python (again, given as the justification for
>> deviating from how other bits of OpenStack work)
> 
> The idea of Zaqar is that it'll be the central place for polling stuff
> in OpenStack. So it's going to get hit a lot, and it makes sense to do
> as little work on each request as possible because work is expensive and
> there will be a lot of requests. It doesn't follow that the main aim is
> to optimise for latency and throughput (as it is with AMQP).
> 
> Last I checked, pretty much every OpenStack API was using a different
> web framework already and it hasn't been much more than a minor
> annoyance as far as I know.

In addition to the above, it'd be a mistake to compare Zaqar's
performance with other messaging technologies like RabbitMQ, Qpid or
even Kafka. That said, it's fine to do so if we *just* want to know "how
far" we are from those services in terms of performance.

Zaqar is being optimized for the use cases it aims to cover. The team has
put considerable effort and review into these optimizations, which are
different from the ones you'd make when optimizing RabbitMQ.

Thanks a lot Zane for taking the time to dig into this topic,
Flavio

-- 
@flaper87
Flavio Percoco
