[openstack-dev] [oslo][mistral] Saga of process then ack and where can we go from here...
Joshua Harlow
harlowja at fastmail.com
Fri Jun 3 16:14:05 UTC 2016
Deja, Dawid wrote:
> On Thu, 2016-05-05 at 11:08 +0700, Renat Akhmerov wrote:
>>
>>> On 05 May 2016, at 01:49, Mehdi Abaakouk <sileht at sileht.net
>>> <mailto:sileht at sileht.net>> wrote:
>>>
>>>
>>> On 2016-05-04 10:04, Renat Akhmerov wrote:
>>>> No problem. Let’s not call it RPC (btw, I completely agree with that).
>>>> But it’s one of the messaging patterns and hence should be under
>>>> oslo.messaging I guess, no?
>>>
>>> Yes and no, we currently have two APIs (rpc and notification). And
>>> personally I regret having the notification part in oslo.messaging.
>>>
>>> RPC and Notification are different beasts, and both are today limited
>>> in terms of features because they share the same driver implementation.
>>>
>>> Our RPC error handling is really poor; for example, Nova just puts the
>>> instance in ERROR when something bad occurs in the oslo.messaging layer.
>>> This forces the deployer/user to fix the issue manually.
>>>
>>> Our Notification system doesn't allow fine-grained routing of messages;
>>> everything goes into one configured topic/queue.
>>>
>>> And now we want to add a new one... I'm not against this idea,
>>> but I'm not a huge fan.
>>>
>>>>>>> Thoughts from folks (mistral and oslo)?
>>>>> Also, I was not at the Summit; should I conclude that the Tooz+taskflow
>>>>> approach (which ensures the idempotency of the application within the
>>>>> library API) has not been accepted by the Mistral folks?
>>>> Speaking about idempotency, IMO it’s not a central question that we
>>>> should be discussing here. Mistral users should have a choice: if they
>>>> manage to make their actions idempotent it’s excellent, in many cases
>>>> idempotency is certainly possible, btw. If no, then they know about
>>>> potential consequences.
>>>
>>> You shouldn't mix up the idempotency of the user task and the idempotency
>>> of a Mistral action (that will in the end run the user task).
>>> You can make your Mistral task runner implementation idempotent and just
>>> make the workflow to use configurable for the case where the user task is
>>> interrupted or finishes badly, whether or not the user task is idempotent.
>>> This makes things very predictable. You will know, for example:
>>> * if the user task has started or not,
>>> * if the error is due to a node power cut while the user task was running,
>>> * if you can safely retry a non-idempotent user task on another node,
>>> * that you will not be impacted by a rabbitmq restart or TCP connection issues,
>>> * ...
>>>
>>> With the oslo.messaging approach, everything will just end up in a
>>> generic MessageTimeout error.
>>>
>>> The RPC API already has this kind of issue. Applications have
>>> unfortunately had to deal with that (and I think they want something
>>> better now).
>>> I'm just not convinced we should add a new "working queue" API in
>>> oslo.messaging for task scheduling that has the same issue we already
>>> have with RPC.
>>>
>>> Anyway, that's your choice; if you want to rely on this poor structure,
>>> I will not be against it, as I'm not involved in Mistral. I just want
>>> everybody to be aware of this.
>>>
>>>> And even in this case there’s usually a number
>>>> of measures that can be taken to mitigate those consequences (rerunning
>>>> workflows from certain points after manually fixing problems, rollback
>>>> scenarios etc.).
>>>
>>> taskflow lets you describe and automate this kind of workflow really
>>> easily.
>>>
>>>> What I’m saying is: let’s not make that crucial decision now about
>>>> what a messaging framework should support or not, let’s make it more
>>>> flexible to account for variety of different usage scenarios.
>>>
>>> I think the confusion is in the "messaging" keyword; currently
>>> oslo.messaging is an "RPC" framework and a "Notification" framework on
>>> top of 'messaging' frameworks.
>>>
>>> The messaging frameworks we use are 'kombu', 'pika', 'zmq' and 'pyngus'.
>>>
>>>> It’s normal for frameworks to give more rather than less.
>>>
>>> I disagree; here we mix different concepts into one library, and all
>>> concepts have to be implemented by the different 'messaging frameworks'.
>>> So we fortunately give less, to make things just work in the same way
>>> with all drivers for all APIs.
>>>
>>>> One more thing, at the summit we were discussing the possibility to
>>>> define at-most-once/at-least-once individually for Mistral tasks. This
>>>> is demanded because there are cases where we need to do it; advanced users
>>>> may choose one or another depending on a task/action semantics.
>>>> However, it won’t be possible to implement w/o changes in the
>>>> underlying messaging framework.
>>>
>>> If we go that way, oslo.messaging users and Mistral users have to
>>> be aware that their job/task/action/whatever will perhaps not be called
>>> (at-most-once) or perhaps be called twice (at-least-once).
>>>
>>> The oslo.messaging/Mistral API and docs must be clear about this
>>> behavior, so that bugs are not opened against oslo.messaging because a
>>> script written via the Mistral API is not executed as expected "sometimes".
>>> "sometimes" == when deployers have trouble with their rabbitmq (or
>>> whatever) broker, or even just when a deployer restarts a broker node or
>>> when a TCP issue occurs. In the end, the backtrace in these cases always
>>> shows only an oslo.messaging trace (the well-known MessageTimeout...).
>>>
>>>
>>> Also oslo.messaging is already a fragile brick used by everybody that
>>> a very small subset of people maintain (thanks to them).
>>>
>>> I'm afraid that adding such a new API will increase the maintenance
>>> needed for this lib, while currently not many people care about it
>>> (the whole lib, not the new API).
>>>
>>> I also wonder if other projects have the same needs (that always helps
>>> to design a new API).
>>
>> Mehdi,
>>
>> What are you proposing? Can you confirm that we should be just dealing
>> with this problem on our own in Mistral? If so, that works well for
>> us. Initially we didn’t want to switch to oslo.messaging from direct
>> access to RabbitMQ for this and also other reasons. But we got
>> strong feedback from the community that said “you guys need to reuse
>> technologies from the community and hence switch to oslo.messaging”.
>> So we did, assuming that we would fix all needed issues in
>> oslo.messaging relatively soon. Now it’s been ~2 years since then and
>> we keep struggling with all that stuff.
>>
>> When I see these discussions again and again where people try to
>> convince us that at-least-once delivery is a bad thing, I can’t participate
>> in them anymore. We spent a lot of time thinking about it and
>> experimenting with it and know all the pros and cons.
>>
>> Renat Akhmerov
>> @Nokia
>
> Maybe this could be resolved in oslo.messaging by following one of
> Python's slogans /we are all responsible users here/ [1].
>
> What I'm proposing is to let the consumer of the message decide when to
> send the ACK, because it knows best when to do so. I can think of scenarios
> where the ACK has to be sent in the middle of message processing, e.g.
> after receiving a message I want to store it in the DB first and send the
> ACK only once the message is safely stored. Having that we could
> implement whatever delivery model we want in mistral on top of
> oslo.messaging.
From my understanding (and some of the oslo.messaging folks can correct
me if I am wrong), they (the oslo.messaging maintainers) don't feel
comfortable allowing such an option because of how doing such a thing
alters the principles of oslo.messaging and increases the complexity of
the code-base (and the subsequent testing, bug reports and feature
support that come along with enabling such a thing).
That is why I think the preference was to have this model (which isn't
really the `rpc` kind of model that oslo.messaging has been targeting up
to this point, but is more like a work-queue) be in another library with
a clear API that is explicitly targeted at this kind of model. Thus,
instead of having a multi-personality codebase with hidden features like
this (say in oslo.messaging), it gets its own codebase and API that is
'just right' (or closer to being 'right') for its concept (vs trying to
stuff it into oslo.messaging).
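To make the work-queue model a bit more concrete, here is a rough sketch of
"store it, then ack it" with consumer-controlled acks. This is a toy,
in-memory illustration only: it is not oslo.messaging code and not the API
of any existing library, and the names `WorkQueue`, `consume` and
`requeue_unacked` are made up for the example. A Python set stands in for
the DB so that a redelivered message is harmless.

```python
import collections


class WorkQueue:
    """Toy in-memory work queue with explicit, consumer-controlled acks."""

    def __init__(self):
        self._ready = collections.deque()  # messages waiting for delivery
        self._unacked = {}                 # delivery tag -> message in flight
        self._next_tag = 0

    def publish(self, message):
        self._ready.append(message)

    def get(self):
        """Deliver the next message; it stays in flight until it is acked."""
        if not self._ready:
            return None
        message = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = message
        return self._next_tag, message

    def ack(self, tag):
        """The consumer says it is done; the message is dropped for good."""
        del self._unacked[tag]

    def requeue_unacked(self):
        """Roughly what a broker does when a consumer's connection dies."""
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked.pop(tag))


def consume(queue, store, crash_before_ack=False):
    """Store the message first and only then send the ack."""
    delivery = queue.get()
    if delivery is None:
        return False
    tag, message = delivery
    store.add(message)        # stand-in for a durable DB write
    if crash_before_ack:
        return False          # simulated crash: the ack is never sent
    queue.ack(tag)
    return True
```

If the worker dies between the store and the ack, the message is delivered
again after `requeue_unacked()`, which is the at-least-once behavior being
discussed: nothing is lost, but the consumer has to tolerate seeing the
same message twice.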
>
> [1] https://en.wikipedia.org/wiki/Python_syntax_and_semantics#Objects
>
> Thanks,
> Dawid Deja
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev