[openstack-dev] [Mistral][TaskFlow] Long running actions

Joshua Harlow harlowja at yahoo-inc.com
Wed Apr 2 05:07:27 UTC 2014


Get 'er done!

Haha, do you guys want to jump into #openstack-state-management tomorrow? Or at our Thursday meeting we can discuss more how this might work and such.

That'd be cool.

Sent from my really tiny device...

On Apr 1, 2014, at 9:37 PM, "Renat Akhmerov" <rakhmerov at mirantis.com> wrote:

On 02 Apr 2014, at 06:00, Joshua Harlow <harlowja at yahoo-inc.com> wrote:

More inline.

From: Dmitri Zimine <dz at stackstorm.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Tuesday, April 1, 2014 at 2:59 PM
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [Mistral][TaskFlow] Long running actions


On Apr 1, 2014, at 3:43 AM, Renat Akhmerov <rakhmerov at mirantis.com> wrote:
On 25 Mar 2014, at 01:51, Joshua Harlow <harlowja at yahoo-inc.com> wrote:

The first execution model I would call the local execution model. This model involves forming tasks and flows and then executing them inside an application; that application runs for the duration of the workflow (although if it crashes it can re-establish the tasks and flows it was doing and attempt to resume them). This is also what openstack projects would call the 'conductor' approach, where nova, ironic and trove have a conductor which manages these long-running actions (the conductor is alive/running throughout the duration of these workflows, although it may be restarted while running). The restarting + resuming part is something that openstack hasn't handled so gracefully so far, typically requiring some type of cleanup at restart (or by operations); with taskflow using this model, the resumption part makes it possible to resume from the last saved state (this connects into the persistence model that taskflow uses, the state transitions, how execution itself occurs...).
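
To make the 'conductor' model concrete, here is a minimal sketch using TaskFlow's public helpers; the task classes and the flow are made up for illustration, and the persistence/resume wiring is elided.

# Minimal sketch of the local / "conductor" execution model with TaskFlow.
# The task classes and flow name here are illustrative, not from this thread.
from taskflow import engines
from taskflow import task
from taskflow.patterns import linear_flow


class CreateVolume(task.Task):
    default_provides = 'volume_id'

    def execute(self):
        # Long-running work happens in-process, inside the conductor itself.
        return 'vol-1234'


class AttachVolume(task.Task):
    def execute(self, volume_id):
        print('attaching %s' % volume_id)


flow = linear_flow.Flow('provision').add(CreateVolume(), AttachVolume())

# The engine stays alive for the whole workflow.  With a persistence backend
# configured, a restarted conductor can reload the saved logbook and resume
# from the last recorded state instead of starting over.
engines.run(flow)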

The second execution model is an extension of the first, whereby there is still a type of 'conductor' that manages the life-time of the workflow, but instead of executing tasks locally in the conductor itself, tasks are now executed on remote workers (see http://tinyurl.com/lf3yqe4). The engine is still 'alive' for the life-time of the execution, although the work it is doing is relatively minimal (since it's not actually executing any task code, but proxying those requests to other workers). While running, the engine does the conducting of the remote workers (saving persistence details, doing state transitions, getting results, sending requests to workers…).
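
For contrast, here is a rough sketch of how the same flow could be handed to remote workers with the worker-based engine; the exchange/topic values and the exact option names are assumptions and should be checked against the documentation linked above.

# Rough sketch of the worker-based (remote execution) model; the option
# names and values below are assumptions, check the worker-based engine
# docs linked above for the exact configuration.

# -- conductor / engine side --
from taskflow import engines
from taskflow.patterns import linear_flow

# reuse the CreateVolume / AttachVolume tasks from the earlier sketch
flow = linear_flow.Flow('provision').add(CreateVolume(), AttachVolume())
engine = engines.load(flow, engine='worker-based',
                      exchange='my-exchange', topics=['worker-1'])
engine.run()   # proxies task execution to whatever workers are listening

# -- worker side (separate process/host) --
from taskflow.engines.worker_based import worker

w = worker.Worker(exchange='my-exchange', topic='worker-1',
                  tasks=['mymodule.CreateVolume', 'mymodule.AttachVolume'])
w.run()        # receives task requests, executes them, publishes results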

These two execution models are special cases of what you call the “lazy execution model” (or passive, as we call it). To illustrate this idea we can take a look at the first sequence diagram at [0]; we will basically see the following interaction:

1) engine --(task)--> queue --(task)--> worker
2) execute task
3) worker --(result)--> queue --(result)--> engine

This is how the TaskFlow worker-based model works.
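
(Stripped of any particular library, that round trip can be shown with plain Python queues; this is a toy illustration only, not TaskFlow or Mistral code.)

# Toy illustration of the round trip:
#   1) engine --(task)--> queue --(task)--> worker
#   2) worker executes the task
#   3) worker --(result)--> queue --(result)--> engine
import queue
import threading

task_queue = queue.Queue()      # engine -> worker
result_queue = queue.Queue()    # worker -> engine


def worker():
    task = task_queue.get()                   # receive the task
    result = task['args']['x'] * 2            # 2) execute it
    result_queue.put({'task_id': task['id'], 'result': result})  # 3) reply


threading.Thread(target=worker, daemon=True).start()

# 1) the engine publishes a task, then waits for the result to come back
task_queue.put({'id': 't1', 'name': 'double', 'args': {'x': 21}})
print(result_queue.get())       # {'task_id': 't1', 'result': 42}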

If we loosen the requirement in 3) and assume that not only the worker can send a task result back to the engine, we get our passive model. Instead of the worker, it can be anything else (some external system) that knows how to make this call. The particular way is not too important; it can be a direct message or it can be hidden behind an API method. In Mistral it’s now a REST API method, however we’re about to decouple the engine from the REST API so that the engine is a standalone process that listens to a queue. So the worker-based model is basically the same, with the only strict requirement being that only the worker sends a result back.
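
To make the distinction concrete, here is a hypothetical sketch of what a passive engine surface could look like; the class and method names are invented for illustration and are not taken from Mistral or TaskFlow.

# Hypothetical sketch of a passive ("lazy") engine: it never blocks waiting
# for a worker, it only reacts when someone tells it a task has finished.
class PassiveEngine:
    def __init__(self):
        self.pending = {}

    def start_task(self, task_id, spec, publish):
        # Hand the task description to whatever transport is configured
        # (an AMQP queue, an HTTP call, an in-process function, ...).
        self.pending[task_id] = spec
        publish(task_id, spec)

    def on_task_result(self, task_id, result):
        # Called by *anyone* who knows the task id: a remote worker, a REST
        # API handler, some other external system.  The engine just advances
        # the workflow based on the delivered result.
        self.pending.pop(task_id)
        print('task %s finished with %r, scheduling next tasks' % (task_id, result))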

In order to implement the local execution model on top of the “lazy execution model” we just need to abstract the transport (queue) so that we can use an in-process transport. That’s it. It’s what Mistral has already implemented. Again, we see that the “lazy execution model” is more universal.
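
Continuing the hypothetical sketch above, an in-process transport is just a publish callable that executes the action immediately and calls straight back into the same engine, which gives the local execution model for free; again, all names here are illustrative.

# With the transport abstracted, an in-process transport executes the task
# immediately and feeds the result back into the same PassiveEngine.
def make_inprocess_publish(engine, handlers):
    def publish(task_id, spec):
        result = handlers[spec['action']](**spec.get('params', {}))
        engine.on_task_result(task_id, result)
    return publish


engine = PassiveEngine()
publish = make_inprocess_publish(engine, handlers={'double': lambda x: x * 2})
engine.start_task('t1', {'action': 'double', 'params': {'x': 21}}, publish)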

IMO this “lazy execution model” should be the main execution model that TaskFlow supports; the others can be easily implemented on top of it, but the opposite assertion is wrong. IMO this is the most important obstacle in all our discussions, the reason why we don’t always understand each other well enough. I know it may be a lot of work to shift a paradigm in the TaskFlow team, but if we did that we would get enough freedom for using TaskFlow in lots of cases.

Let me know what you think. I might have missed something.

DZ: Interesting idea! So the other models of execution would be based on the lazy execution model? TaskFlow implements this, we can use it, and more convenient higher-level execution models are provided for other clients? Interesting. Makes sense.
@Joshua? @Kirill? Others?

I think this is likely possible, which is similar to what's in http://tinyurl.com/k3s2gmy; engine types can be built from each other (and if we wanted to alter the structure that exists in taskflow, then sure). But see that message for more of my concerns around exposing that engine API to library users (I think it could have its usage in mistral to expose this, but I'm not sure it's useful elsewhere, and once it's a public engine API, it's public for a very long time).

What are we waiting for? Let’s code it up! :)

Renat Akhmerov
@ Mirantis Inc.
