[openstack-dev] [oslo][mistral] Saga of process than ack and where can we go from here...

Renat Akhmerov renat.akhmerov at gmail.com
Tue Jun 7 06:48:55 UTC 2016


> On 04 Jun 2016, at 04:16, Doug Hellmann <doug at doughellmann.com> wrote:
> 
> Excerpts from Joshua Harlow's message of 2016-06-03 09:14:05 -0700:
>> Deja, Dawid wrote:
>>> On Thu, 2016-05-05 at 11:08 +0700, Renat Akhmerov wrote:
>>>> 
>>>>> On 05 May 2016, at 01:49, Mehdi Abaakouk <sileht at sileht.net> wrote:
>>>>> 
>>>>> 
>>>>> On 2016-05-04 10:04, Renat Akhmerov wrote:
>>>>>> No problem. Let’s not call it RPC (btw, I completely agree with that).
>>>>>> But it’s one of the messaging patterns and hence should be under
>>>>>> oslo.messaging I guess, no?
>>>>> 
>>>>> Yes and no. We currently have two APIs (RPC and notification), and
>>>>> personally I regret having the notification part in oslo.messaging.
>>>>> 
>>>>> RPC and Notification are different beasts, and both are limited today
>>>>> in terms of features because they share the same driver implementation.
>>>>> 
>>>>> Our RPC error handling is really poor. For example, Nova just puts an
>>>>> instance in ERROR when something bad occurs in the oslo.messaging layer.
>>>>> This forces the deployer/user to fix the issue manually.
>>>>> 
>>>>> Our Notification system doesn't allow fine-grained routing of messages;
>>>>> everything goes into one configured topic/queue.
>>>>> 
>>>>> And now we want to add a new one... I'm not against this idea,
>>>>> but I'm not a huge fan.
>>>>> 
>>>>>>>>> Thoughts from folks (mistral and oslo)?
>>>>>>> Also, I was not at the Summit. Should I conclude that the Tooz+taskflow
>>>>>>> approach (which ensures the idempotency of the application within the
>>>>>>> library API) has not been accepted by the Mistral folks?
>>>>>> Speaking about idempotency, IMO it’s not a central question that we
>>>>>> should be discussing here. Mistral users should have a choice: if they
>>>>>> manage to make their actions idempotent it’s excellent, in many cases
>>>>>> idempotency is certainly possible, btw. If no, then they know about
>>>>>> potential consequences.
>>>>> 
>>>>> You shouldn't mix up the idempotency of the user task and the idempotency
>>>>> of a Mistral action (which will in the end run the user task).
>>>>> You can make your Mistral task runner implementation idempotent and just
>>>>> make the workflow to run configurable for the case where the user task is
>>>>> interrupted or finishes badly, whether or not the user task is idempotent.
>>>>> This makes things very predictable. You will know, for example:
>>>>> * whether the user task has started or not,
>>>>> * whether the error is due to a node power cut while the user task was running,
>>>>> * whether you can safely retry a non-idempotent user task on another node,
>>>>> * you will not be impacted by a rabbitmq restart or TCP connection issues,
>>>>> * ...
>>>>> 
>>>>> With the oslo.messaging approach, everything will just end up in a
>>>>> generic MessagingTimeout error.
>>>>> 
>>>>> The RPC API already has this kind of issue. Applications have
>>>>> unfortunately had to deal with that (and I think they want something
>>>>> better now).
>>>>> I'm just not convinced we should add a new "work queue" API in
>>>>> oslo.messaging for task scheduling that has the same issue we already
>>>>> have with RPC.
>>>>> 
>>>>> Anyway, that's your choice. If you want to rely on this poor structure,
>>>>> I will not be against it; I'm not involved in Mistral. I just want
>>>>> everybody to be aware of this.
>>>>> 
>>>>>> And even in this case there’s usually a number
>>>>>> of measures that can be taken to mitigate those consequences (rerunning
>>>>>> workflows from certain points after manually fixing problems, rollback
>>>>>> scenarios etc.).
>>>>> 
>>>>> taskflow lets you describe and automate this kind of workflow really
>>>>> easily.
>>>>> 
>>>>>> What I’m saying is: let’s not make that crucial decision now about
>>>>>> what a messaging framework should support or not, let’s make it more
>>>>>> flexible to account for variety of different usage scenarios.
>>>>> 
>>>>> I think the confusion is in the "messaging" keyword. Currently
>>>>> oslo.messaging is an "RPC" framework and a "Notification" framework
>>>>> on top of 'messaging' frameworks.
>>>>> 
>>>>> The messaging frameworks we use are 'kombu', 'pika', 'zmq' and 'pyngus'.
>>>>> 
>>>>>> It’s normal for frameworks to give more rather than less.
>>>>> 
>>>>> I disagree. Here we mix different concepts into one library, and all
>>>>> concepts have to be implemented on top of different 'messaging frameworks'.
>>>>> So we fortunately give less, to make things just work in the same way
>>>>> with all drivers for all APIs.
>>>>> 
>>>>>> One more thing, at the summit we were discussing the possibility to
>>>>>> define at-most-once/at-least-once individually for Mistral tasks. This
>>>>>> is needed because there are cases where we need to do it; advanced users
>>>>>> may choose one or the other depending on a task/action's semantics.
>>>>>> However, it won’t be possible to implement w/o changes in the
>>>>>> underlying messaging framework.
>>>>> 
>>>>> If we go that way, oslo.messaging users and Mistral users have to
>>>>> be aware that their job/task/action/whatever will perhaps not be called
>>>>> (at-most-once) or will perhaps be called twice (at-least-once).
>>>>> 
>>>>> The oslo.messaging/Mistral API and docs must be clear about this
>>>>> behavior, to avoid having bugs opened against oslo.messaging because a
>>>>> script written via the Mistral API is not executed as expected "sometimes".
>>>>> "sometimes" == when deployers have trouble with their rabbitmq (or
>>>>> whatever) broker, or even just when a deployer restarts a broker node or
>>>>> when a TCP issue occurs. In the end, the backtrace in these cases always
>>>>> shows only an oslo.messaging trace (the well-known MessagingTimeout...).
>>>>> 
>>>>> 
>>>>> Also, oslo.messaging is already a fragile brick used by everybody that
>>>>> only a very small subset of people maintain (thanks to them).
>>>>> 
>>>>> I'm afraid that adding such a new API will increase the maintenance
>>>>> needed for this lib, while currently not many people care about it
>>>>> (the whole lib, not the new API).
>>>>> 
>>>>> I also wonder if other projects have the same needs (that always helps
>>>>> when designing a new API).
>>>> 
>>>> Mehdi,
>>>> 
>>>> What are you proposing? Can you confirm that we should be just dealing
>>>> with this problem on our own in Mistral? If so, that works well for
>>>> us. Initially we didn’t want to switch to oslo.messaging from direct
>>>> access to RabbitMQ for this and also other reasons. But we got
>>>> strong feedback from the community that said “you guys need to reuse
>>>> technologies from the community and hence switch to oslo.messaging”.
>>>> So we did, assuming that we would fix all needed issues in
>>>> oslo.messaging relatively soon. Now it’s been ~2 years since then and
>>>> we keep struggling with all that stuff.
>>>> 
>>>> When I see these discussions again and again where people try to
>>>> convince us that at-least-once delivery is a bad thing, I can’t participate
>>>> in them anymore. We spent a lot of time thinking about it and
>>>> experimenting with it, and we know all the pros and cons.
>>>> 
>>>> Renat Akhmerov
>>>> @Nokia
>>> 
>>> Maybe this could be resolved in oslo.messaging by following one of
>>> Python's slogans /we are all responsible users here/ [1].
>>> 
>>> What I'm proposing is to let the consumer of the message decide when to
>>> send the ACK, because it knows best when to do so. I can think of scenarios
>>> where the ACK has to be sent in the middle of message processing, e.g.
>>> after receiving a message I want to store it in the DB before sending an
>>> ACK, and send the ACK only once the message is safely stored. With that, we
>>> could implement whatever delivery model we want in Mistral on top of
>>> oslo.messaging.
>> 
>> From my understanding (and some of the oslo.messaging folks can correct
>> me if I am wrong), they (the oslo.messaging maintainers) don't feel
>> comfortable allowing such an option because doing so alters the
>> principles of oslo.messaging and increases the complexity of the
>> code-base (and the subsequent testing, bug reports, and feature support
>> that come along with enabling such a thing).
>> 
>> That is why I think the preference was to have this model (which isn't
>> really the `rpc` kind of model that oslo.messaging has been targeting up
>> to this point, but is more like a work-queue) live in another library
>> with a clear API that is explicitly targeted at this kind of model. So
>> instead of having a multi-personality codebase with hidden features like
>> this (say in oslo.messaging), it gets its own codebase and API
>> that is 'just right' (or closer to being 'right') for its concept
>> (vs trying to stuff it into oslo.messaging).
> 
> What happened to the idea of adding new functions at the level of the
> call & cast functions we have now, that work with at-least-once instead
> of at-most-once semantics? Yes this is a different sort of use case, but
> it's still "messaging".


I think the idea is dead. Joshua essentially gave the reasons in the previous message.

Renat Akhmerov
@Nokia
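
For readers following the at-most-once versus at-least-once distinction
debated in the quoted messages, here is a minimal sketch of how the moment
of the ACK decides which guarantee you get. It is a toy in-memory model,
not oslo.messaging or any real broker API: ToyQueue, consumer_died and run
are hypothetical names invented for this illustration, and the "crash" is
simulated by a flag rather than a real process failure.

```python
from collections import deque


class ToyQueue:
    """In-memory stand-in for a broker queue (hypothetical, not a real API)."""

    def __init__(self, messages):
        self._ready = deque(messages)
        self._unacked = {}
        self._next_tag = 0

    def get(self):
        """Hand out the next message with a delivery tag, or None if empty."""
        if not self._ready:
            return None
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = self._ready.popleft()
        return tag, self._unacked[tag]

    def ack(self, tag):
        """Consumer confirms the message; the queue forgets it."""
        del self._unacked[tag]

    def consumer_died(self):
        """Requeue everything that was delivered but never acknowledged."""
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked[tag])
        self._unacked.clear()


def run(queue, ack_first, crash_on_attempt):
    """Drain the queue, simulating a worker crash on the given attempt number.

    Returns the list of (message, outcome) processing attempts.
    """
    attempts = []
    while True:
        item = queue.get()
        if item is None:
            return attempts
        tag, message = item
        if ack_first:
            queue.ack(tag)  # at-most-once: acknowledged before any work
        crashed = len(attempts) + 1 == crash_on_attempt
        attempts.append((message, "crashed" if crashed else "done"))
        if crashed:
            queue.consumer_died()  # unacked messages go back to the queue
            continue
        if not ack_first:
            queue.ack(tag)  # at-least-once: acknowledged after the work


at_most_once = run(ToyQueue(["task-1"]), ack_first=True, crash_on_attempt=1)
at_least_once = run(ToyQueue(["task-1"]), ack_first=False, crash_on_attempt=1)
print(at_most_once)
print(at_least_once)
```

With ack-before-processing, the crashed task is never retried because the
queue has already forgotten it. With ack-after-processing, the same task is
delivered a second time, which is why the idempotency of the action matters
as soon as at-least-once delivery is chosen.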
