[openstack-dev] [oslo][mistral] Saga of process than ack and where can we go from here...

Ken Giusti kgiusti at gmail.com
Mon Jun 13 17:11:29 UTC 2016


So is this horse dead now? 'Cuz I want a turn hitting it...

First of all, this thread brings up two separate messaging concepts:

1) at-least-once delivery
2) message acknowledgement

For #1 - oslo.messaging cannot guarantee that messages will not be
duplicated, specifically in the case of multiple consumers on the
same topic.  In that case, oslo.messaging can only dedup on a
per-consumer basis because a consumer is unaware of what its peers
have received.  Therefore if a re-transmit is sent to a different
consumer than the original transmit (think lost Ack) both consumers
will regard the message as non-duplicate and process it.
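
To make the lost-Ack case concrete, here is a minimal sketch in plain
Python.  It does not use oslo.messaging at all: the Consumer class, its
"seen" cache and the message id are made up purely for illustration.

```python
class Consumer:
    """Toy consumer that dedups only against messages it has seen itself."""

    def __init__(self, name):
        self.name = name
        self.seen = set()      # per-consumer dedup cache
        self.processed = []    # message ids this consumer acted on

    def deliver(self, msg_id):
        if msg_id in self.seen:
            return False       # duplicate detected and dropped
        self.seen.add(msg_id)
        self.processed.append(msg_id)
        return True


a = Consumer("a")
b = Consumer("b")

# Original transmit goes to consumer a; assume its ack is lost in transit.
first = a.deliver("msg-1")
# A re-transmit that lands on the same consumer is caught...
same_consumer_retry = a.deliver("msg-1")
# ...but one routed to a peer is not, so the work is done twice.
other_consumer_retry = b.deliver("msg-1")

total_processed = len(a.processed) + len(b.processed)
print(first, same_consumer_retry, other_consumer_retry, total_processed)
```

Catching the second case would need dedup state shared between the
consumers, which is exactly what a single consumer does not have.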

For #2, I'll go on the record and say that ack-before-process is
inherently broken.

The acknowledgment is used to inform the messaging subsystem (note I
didn't say 'sender') that the receiver of the message has assumed
ownership of the message. It's a transfer-of-control thing.  The
acknowledgment should only be sent when the consuming application has
completed processing the message.  Can oslo.messaging assume that on
behalf of the consumer?  I don't think it should.  Acking a
message that hasn't been fully processed will negatively affect the
message window maintained by the message bus, possibly leading to
over-delivery.
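
A rough sketch of why the ordering matters, again in plain Python with
an invented in-memory ToyQueue standing in for the message bus (none of
these names come from oslo.messaging or any real driver).  The consumer
crashes mid-processing in both runs; only the position of the ack
differs.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a message bus with explicit acks."""

    def __init__(self, messages):
        self.ready = deque(messages)
        self.unacked = set()

    def get(self):
        msg = self.ready.popleft()
        self.unacked.add(msg)
        return msg

    def ack(self, msg):
        # The bus forgets the message: ownership has moved to the consumer.
        self.unacked.discard(msg)

    def consumer_died(self):
        # Anything not yet acked goes back on the queue for redelivery.
        self.ready.extend(self.unacked)
        self.unacked.clear()


def crash(msg):
    raise RuntimeError("consumer crashed mid-processing")


def run(ack_first, process):
    queue = ToyQueue(["task-1"])
    msg = queue.get()
    try:
        if ack_first:
            queue.ack(msg)
        process(msg)
        if not ack_first:
            queue.ack(msg)
    except RuntimeError:
        queue.consumer_died()
    return list(queue.ready)


lost = run(ack_first=True, process=crash)          # nothing to redeliver
redelivered = run(ack_first=False, process=crash)  # message is back
print(lost, redelivered)
```

With ack-before-process the bus has already let go of the message when
the crash happens, so the work silently disappears.  With
ack-after-process the message comes back, at the price of a possible
duplicate (see #1).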

Having said that, a proper acking mechanism would allow for
asynchronous acking - sending the ack at a later time or from another
thread entirely.  As Mehdi pointed out, this would require some
significant changes to the oslo.messaging codebase.
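
What asynchronous acking could look like from the application's side,
as a sketch only.  The Message handle with its own ack() method is
hypothetical; it is not an existing oslo.messaging API, and the "release"
event is only there to make the example deterministic.

```python
import threading


class Message:
    """Hypothetical message handle that carries its own ack callback."""

    def __init__(self, body):
        self.body = body
        self.acked = threading.Event()

    def ack(self):
        self.acked.set()


results = []
release = threading.Event()   # lets the example control when work finishes


def worker(msg):
    release.wait()                     # simulate long-running processing
    results.append(msg.body.upper())   # the actual processing
    msg.ack()                          # ack only once the work is done


def on_message(msg):
    # The dispatch path returns right away and does not ack on the
    # consumer's behalf.
    thread = threading.Thread(target=worker, args=(msg,))
    thread.start()
    return thread


msg = Message("payload")
thread = on_message(msg)
acked_at_return = msg.acked.is_set()   # dispatch returned, still unacked
release.set()
thread.join()
acked_after_processing = msg.acked.is_set()
print(acked_at_return, acked_after_processing, results)
```

The point is only that the ack is decoupled from the return of the
dispatch callback; how a real driver would carry that handle across
threads is the hard part.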

my arm is tired - this is one big horse.

Thanks




On Tue, Jun 7, 2016 at 2:48 AM, Renat Akhmerov <renat.akhmerov at gmail.com> wrote:
>
> On 04 Jun 2016, at 04:16, Doug Hellmann <doug at doughellmann.com> wrote:
>
> Excerpts from Joshua Harlow's message of 2016-06-03 09:14:05 -0700:
>
> Deja, Dawid wrote:
>
> On Thu, 2016-05-05 at 11:08 +0700, Renat Akhmerov wrote:
>
>
> On 05 May 2016, at 01:49, Mehdi Abaakouk <sileht at sileht.net> wrote:
>
>
> On 2016-05-04 10:04, Renat Akhmerov wrote:
>
> No problem. Let’s not call it RPC (btw, I completely agree with that).
> But it’s one of the messaging patterns and hence should be under
> oslo.messaging I guess, no?
>
>
> Yes and no, we currently have two APIs (rpc and notification). And
> personally I regret having the notification part in oslo.messaging.
>
> RPC and Notification are different beasts, and both are today limited
> in terms of features because they share the same driver implementation.
>
> Our RPC error handling is really poor; for example, Nova just puts the
> instance in ERROR when something bad occurs in the oslo.messaging layer.
> This forces the deployer/user to fix the issue manually.
>
> Our Notification system doesn't allow fine-grained routing of messages;
> everything goes into one configured topic/queue.
>
> And now we want to add a new one... I'm not against this idea,
> but I'm not a huge fan.
>
> Thoughts from folks (mistral and oslo)?
>
> Also, I was not at the Summit; should I conclude that the Tooz+taskflow
> approach (which ensures the idempotency of the application within the
> library API) has not been accepted by the Mistral folks?
>
> Speaking about idempotency, IMO it’s not a central question that we
> should be discussing here. Mistral users should have a choice: if they
> manage to make their actions idempotent it’s excellent, in many cases
> idempotency is certainly possible, btw. If no, then they know about
> potential consequences.
>
>
> You shouldn't mix the idempotency of the user task and the idempotency
> of a Mistral action (that will in the end run the user task).
> You can make your Mistral task runner implementation idempotent and just
> make the workflow to run configurable for the case where the user task is
> interrupted or finishes badly, whether or not the user task is idempotent.
> This makes things very predictable. You will know for example:
> * if the user task has started or not,
> * if the error is due to a node power cut while the user task was running,
> * if you can safely retry a non-idempotent user task on another node,
> * you will not be impacted by a rabbitmq restart or TCP connection issues,
> * ...
>
> With the oslo.messaging approach, everything will just end up in a
> generic MessageTimeout error.
>
> The RPC API already has this kind of issue. Applications have
> unfortunately dealt with that (and I think they want something better
> now). I'm just not convinced we should add a new "working queue" API in
> oslo.messaging for task scheduling that has the same issue we already
> have with RPC.
>
> Anyway, that's your choice; if you want to rely on this poor structure,
> I will not be against it, I'm not involved in Mistral. I just want
> everybody to be aware of this.
>
> And even in this case there’s usually a number
> of measures that can be taken to mitigate those consequences (rerunning
> workflows from certain points after manually fixing problems, rollback
> scenarios etc.).
>
>
> taskflow lets you describe and automate this kind of workflow really
> easily.
>
> What I’m saying is: let’s not make that crucial decision now about
> what a messaging framework should support or not, let’s make it more
> flexible to account for variety of different usage scenarios.
>
>
> I think the confusion is in the "messaging" keyword: currently
> oslo.messaging is an "RPC" framework and a "Notification" framework on
> top of 'messaging' frameworks.
>
> The messaging frameworks we use are 'kombu', 'pika', 'zmq' and 'pyngus'.
>
> It’s normal for frameworks to give more rather than less.
>
>
> I disagree: here we mix different concepts into one library, and all
> concepts have to be implemented by different 'messaging frameworks',
> so we fortunately give less to make things just work the same way
> with all drivers for all APIs.
>
> One more thing, at the summit we were discussing the possibility to
> define at-most-once/at-least-once individually for Mistral tasks. This
> is in demand because there are cases where we need to do it; advanced
> users may choose one or another depending on a task/action semantics.
> However, it won’t be possible to implement w/o changes in the
> underlying messaging framework.
>
>
> If we go that way, oslo.messaging users and Mistral users have to
> be aware that their job/task/action/whatever will perhaps not be called
> (at-most-once) or perhaps be called twice (at-least-once).
>
> The oslo.messaging/Mistral API and docs must be clear about this
> behavior so that bugs are not opened against oslo.messaging because a
> script written via the Mistral API is "sometimes" not executed as
> expected.
> "sometimes" == when deployers have trouble with their rabbitmq (or
> whatever) broker, or even just when a deployer restarts a broker node
> or when a TCP issue occurs. In the end the backtrace in these cases
> always shows only an oslo.messaging trace (the well-known
> MessageTimeout...).
>
>
> Also oslo.messaging is already a fragile brick used by everybody that
> a very small subset of people maintain (thanks to them).
>
> I'm afraid that adding such a new API will increase the maintenance
> needed for this lib while currently not many people care about it
> (the whole lib, not the new API).
>
> I also wonder if other projects have the same needs (that always helps
> when designing a new API).
>
>
> Mehdi,
>
> What are you proposing? Can you confirm that we should be just dealing
> with this problem on our own in Mistral? If so, that works well for
> us. Initially we didn’t want to switch to oslo.messaging from direct
> access to RabbitMQ for this and also other reasons. But we got
> strong feedback from the community that said “you guys need to reuse
> technologies from the community and hence switch to oslo.messaging”.
> So we did, assuming that we would fix all needed issues in
> oslo.messaging relatively soon. Now it’s been ~2 years since then and
> we keep struggling with all that stuff.
>
> When I see these discussions again and again where people try to
> convince us that at-least-once delivery is a bad thing I can’t
> participate in them anymore. We spent a lot of time thinking about it
> and experimenting with it and know all the pros and cons.
>
> Renat Akhmerov
> @Nokia
>
>
> Maybe this could be resolved in oslo.messaging by following one of
> the Python slogans /we are all responsible users here/ [1].
>
> What I'm proposing is to let the consumer of the message decide when to
> send the ACK, because it knows best when to do so. I can think of
> scenarios where it is required to send the ACK in the middle of message
> processing, e.g. after receiving a message I want to store it in the DB
> before sending an ACK, and send the ACK once the message is safely
> stored. Having that, we could implement whatever delivery model we want
> in Mistral on top of oslo.messaging.
>
>
> From my understanding (and some of the oslo.messaging folks can correct
> me if I am wrong), they (the oslo.messaging maintainers) don't feel
> comfortable allowing such an option because of how
> doing such a thing alters the principles of oslo.messaging and increases
> the complexity of the code-base (and subsequent testing, bug reports,
> feature support that come along with enabling such a thing).
>
> That is why I think the preference was to have this model (which isn't
> really the `rpc` kind of model that oslo.messaging has been targeting
> up to this point, but is more like a work-queue) be in another library
> with a clear API that is explicitly targeted at this kind of model. Thus
> instead of having a multi-personality codebase with hidden features like
> this (say in oslo.messaging), it gets its own codebase and API
> that is 'just right' (or closer to being 'right') for its concept
> (vs trying to stuff it into oslo.messaging).
>
>
> What happened to the idea of adding new functions at the level of the
> call & cast functions we have now, that work with at-least-once instead
> of at-most-once semantics? Yes this is a different sort of use case, but
> it's still "messaging".
>
>
> The idea, I think, is dead. Joshua essentially gave the reasons in the
> previous message.
>
> Renat Akhmerov
> @Nokia
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Ken Giusti  (kgiusti at gmail.com)


