<html><head><meta http-equiv="Content-Type" content="text/html charset=windows-1252"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Inline...<br><div><div>On Mar 27, 2014, at 5:10 PM, Joshua Harlow <<a href="mailto:harlowja@yahoo-inc.com">harlowja@yahoo-inc.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">

<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">

<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">

<div style="font-family: Calibri, sans-serif; font-size: 14px; ">

Thanks for the description!</div>

<div style="font-family: Calibri, sans-serif; font-size: 14px; ">

<br>

</div>

<div style="font-family: Calibri, sans-serif; font-size: 14px; ">

The steps here seem very much like what a taskflow engine does (which is good).</div>

<div style="font-family: Calibri, sans-serif; font-size: 14px; ">

<br>

</div>

<div style="font-family: Calibri, sans-serif; font-size: 14px; ">

To connect this to how I think it could work in taskflow:</div>

<ol>

<li><font face="Calibri,sans-serif">Someone creates tasks/flows describing the work

<i>to-be-done</i> (converting a DSL -> taskflow tasks/flows/retry[1] objects</font><font face="Calibri,sans-serif">…)</font></li><li><font face="Calibri,sans-serif">On execute(workflow) </font><span style="font-family: Calibri; ">engine creates a new workflow execution, computes the first batch of tasks, creates executor for those tasks (remote, local</span>…) and

 executes those tasks.</li><li>Waits for response back from <a href="http://docs.python.org/dev/library/concurrent.futures.html">

futures</a> returned from the executor.</li><li>Receives futures responses (or receives a new response such as <i>DELAY</i>), or exceptions…</li><li>Continues sending out batches of tasks that can still be executed (i.e. tasks that don't depend on the output of delayed tasks).</li><li>If any delayed tasks remain after repeating #2-5 as many times as it can, the engine will shut itself down (see <a href="http://tinyurl.com/l3x3rrb">http://tinyurl.com/l3x3rrb</a>).</li></ol></div></blockquote>Why would the engine treat long-running tasks differently? The model Mistral tried out is that the engine sends the batch of tasks and goes to sleep; the 'worker/executor' calls the engine back when the task(s) complete. Can it be applied <br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><ol start="7"><li>When a delayed task finishes, some API/webhook/other mechanism (which imho shouldn't be tied to webhooks, at least not in taskflow, but should be left up to the user of taskflow to decide how to accomplish) will be/must be responsible for resuming the engine

 and setting the result for the previous delayed task.</li></ol></div></blockquote>Oh no, the webhook is the way to expose it to a 3rd party system. From the library standpoint it's just an API call. </div><div><br></div><div>One can do it even now by getting the appropriate Flow_details, instantiating an engine (flow, flow_details) and running it to continue from where it left off. Is that what you mean? But I keep on dreaming of a passive version of the TaskFlow engine which treats all tasks the same and exposes one extra method - handle_tasks. </div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><ol start="8"><li>Repeat 2 -> 7 until all tasks have executed/failed.</li><li>Profit!</li></ol></div></blockquote></div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">

<div>This seems like it could be accomplished, although there are race conditions in #6 (what if multiple delayed requests are received at the same time?). What locking is done to ensure that this doesn't cause conflicts? </div></div></blockquote>The engine must handle concurrent calls of mutation methods - start, stop, handle_action. How it does so differs depending on whether the engine runs in multiple threads or in an event loop on queues of calls. </div><div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>Does the POC solve that part (no

 simultaneous step #5 from below)? </div></div></blockquote>Yes, although we may want to revisit the current solution. </div><div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>There was a mention of a watch-dog (ideally to ensure that delayed tasks can't just sit around forever); was that implemented?</div></div></blockquote>If _delayed_ tasks and 'normal' tasks are treated alike, this is just a matter of a timeout as a generic property on a task. So Mistral didn't have to have it. For the proposal above, a separate treatment is necessary for _delayed_ tasks. </div><div><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">

<div><br>

</div>

<div>[1] <a href="https://wiki.openstack.org/wiki/TaskFlow#Retries">https://wiki.openstack.org/wiki/TaskFlow#Retries</a> (new feature!)</div></div></blockquote>This is nice. I would call it a 'repeater': running a sub flow several times with various data for various reasons is richer than 'retry'. </div><div>What about the 'retry policy' on an individual task? </div><div><br><blockquote type="cite"><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;">

<div><br>

</div>

<span id="OLK_SRC_BODY_SECTION" style="font-family: Calibri, sans-serif; font-size: 14px; ">

<div style="font-family: Calibri; font-size: 11pt; text-align: left; border-width: 1pt medium medium; border-style: solid none none; padding: 3pt 0in 0in; border-top-color: rgb(181, 196, 223); ">

<span style="font-weight:bold">From: </span>Dmitri Zimine <<a href="mailto:dz@stackstorm.com">dz@stackstorm.com</a>><br>

<span style="font-weight:bold">Reply-To: </span>"OpenStack Development Mailing List (not for usage questions)" <<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>><br>

<span style="font-weight:bold">Date: </span>Thursday, March 27, 2014 at 4:43 PM<br>

<span style="font-weight:bold">To: </span>"OpenStack Development Mailing List (not for usage questions)" <<a href="mailto:openstack-dev@lists.openstack.org">openstack-dev@lists.openstack.org</a>><br>

<span style="font-weight:bold">Subject: </span>[openstack-dev] [Mistral] How Mistral handling long running delegate tasks<br>

</div>

<div><br>

</div>

<blockquote id="MAC_OUTLOOK_ATTRIBUTION_BLOCKQUOTE" style="BORDER-LEFT: #b5c4df 5 solid; PADDING:0 0 0 5; MARGIN:0 0 0 5;" type="cite">

<div>

<div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">

Following up on <span style="white-space: pre-wrap; "><a href="http://tinyurl.com/l8gtmsw">http://tinyurl.com/l8gtmsw</a></span> and <span style="white-space: pre-wrap; "><a href="http://tinyurl.com/n3v9lt8:">http://tinyurl.com/n3v9lt8:</a></span> this explains

 how Mistral handles long running delegate tasks. Note that a 'passive' workflow engine can handle both normal tasks and delegates the same way. I'll also put that on ActionDesign wiki, after discussion.

<div><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div><font face="Calibri"><span style="font-size: 13px;"><b>Diagram: </b></span></font></div>

<div>

<div style="margin: 0px; "><a href="https://docs.google.com/a/stackstorm.com/drawings/d/147_EpdatpN_sOLQ0LS07SWhaC3N85c95TkKMAeQ_a4c/edit?usp=sharing" style="font-size: 13px;"><font face="Calibri">https://docs.google.com/a/stackstorm.com/drawings/d/147_EpdatpN_sOLQ0LS07SWhaC3N85c95TkKMAeQ_a4c/edit?usp=sharing</font></a></div>

<div style="margin: 0px; min-height: 22px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">1. On start(workflow), engine creates a new workflow execution, computes the first batch of tasks, sends them to ActionRunner [1].</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">2. ActionRunner creates an action and calls action.run(input)</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">3. Action does the work (compute (10!)), produces the results, and returns them to the executor. If it returns, status=SUCCESS. If it fails, it throws an exception, status=ERROR.</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">4. ActionRunner notifies Engine that the task is complete: task_done(execution, task, status, results)[2]</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">5. Engine computes the next task(s) ready to trigger, according to control flow and data flow, and sends them to ActionRunner.</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">6. Like step 2: ActionRunner calls the action's run(input)</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">7. A delegate action doesn't produce results: it calls out to the 3rd party system, which is expected to make a callback to a workflow service with the results. It returns to ActionRunner without results, "immediately".</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">8. ActionRunner marks status=RUNNING [?]</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">9. 3rd party system takes a 'long time' == longer than any system component can be assumed to stay alive. </span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">10. 3rd party component calls Mistral WebHook which resolves to engine.task_complete(workbook, id, status, results)  </span></font></div>
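<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">The steps above can be sketched roughly as follows. This is a minimal illustration, not Mistral's code: PassiveEngine, the dict-of-dependencies workflow format, the send callable and the shortened task_done(task, status, result) signature are all assumptions made for the example.</span></font></div>
<pre>
```python
class PassiveEngine:
    """Hypothetical passive engine: it only reacts to start() and task_done()."""

    def __init__(self, workflow, send):
        # workflow maps a task name to the set of task names it depends on.
        self.workflow = workflow
        self.send = send          # callable that hands a task to an ActionRunner
        self.results = {}         # task name mapped to its reported result
        self.dispatched = set()   # tasks that were already sent out

    def _dispatch_ready(self):
        # A task is ready once every one of its dependencies has a result.
        for task, deps in self.workflow.items():
            if task not in self.dispatched and deps.issubset(self.results):
                self.dispatched.add(task)
                self.send(task)

    def start(self):
        # Step 1: compute the first batch and send it out.
        self._dispatch_ready()

    def task_done(self, task, status, result):
        # Steps 4, 5 and 10: the same entry point serves ActionRunner and webhook.
        if status == "SUCCESS":
            self.results[task] = result
            self._dispatch_ready()

    def finished(self):
        return len(self.results) == len(self.workflow)


sent = []
engine = PassiveEngine({"a": set(), "b": {"a"}, "c": {"a", "b"}}, sent.append)
engine.start()                        # sends "a"
engine.task_done("a", "SUCCESS", 1)   # normal completion, sends "b"
engine.task_done("b", "SUCCESS", 2)   # late callback for a delegate, sends "c"
```
</pre>
<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">Between calls the engine holds no thread of its own, so it makes no difference whether task_done arrives milliseconds or days after the task was sent.</span></font></div>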

<div style="margin: 0px; min-height: 22px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; "><b style="font-size: 13px;"><font face="Calibri">Comments: </font></b></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">* One Engine handles multiple executions of multiple workflows. It exposes two main operations: start(workflow) and task_complete(execution, task, status, results), and is responsible for defining the next batch of tasks based on control flow and data flow. Engine is passive - it runs in a host's thread. Engine and ActionRunner communicate asynchronously via task queues; for details, see <a href="https://wiki.openstack.org/wiki/Mistral/POC">https://wiki.openstack.org/wiki/Mistral/POC</a> </span></font></div>
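<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">As a rough sketch of that queue-based decoupling, assuming an in-process setup: the standard-library queue.Queue stands in for whatever message queue a deployment uses, and action_runner, task_queue and result_queue are names invented for the example.</span></font></div>
<pre>
```python
import queue
import threading

task_queue = queue.Queue()     # Engine to ActionRunner
result_queue = queue.Queue()   # ActionRunner back to Engine


def action_runner():
    # Runs actions taken from the queue until it sees the None sentinel.
    while True:
        item = task_queue.get()
        if item is None:
            break
        name, func, arg = item
        try:
            result_queue.put((name, "SUCCESS", func(arg)))
        except Exception as exc:
            result_queue.put((name, "ERROR", repr(exc)))


worker = threading.Thread(target=action_runner)
worker.start()

# The engine side only enqueues work; it never blocks on an action.
task_queue.put(("double", lambda x: x * 2, 21))
task_queue.put(("boom", lambda x: 1 / x, 0))
task_queue.put(None)
worker.join()

# Completions are picked up later, whenever the host thread gets to them.
completions = [result_queue.get() for _ in range(2)]
```
</pre>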

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">* Engine doesn't distinguish between sync and async actions; it doesn't deal with Actions at all. It only reacts to task completions, handling the results, updating the state, and queuing the next set of tasks.</span></font></div>

<div style="margin: 0px; min-height: 22px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">* Only the Action can know and define whether it is a delegate or not. Some protocol is required to let ActionRunner know that the action is not returning the results immediately. A convention of returning None may be sufficient. </span></font></div>
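<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">A minimal sketch of how an ActionRunner might apply the "return None" convention. The run_action helper and its task_done callback are hypothetical names for illustration, not Mistral's API.</span></font></div>
<pre>
```python
RUNNING, SUCCESS, ERROR = "RUNNING", "SUCCESS", "ERROR"


def run_action(action, action_input, task_done):
    """Hypothetical ActionRunner step using the 'return None' convention."""
    try:
        result = action(action_input)
    except Exception as exc:
        task_done(ERROR, repr(exc))
        return ERROR
    if result is None:
        # Delegate: the 3rd party system reports the result later,
        # so nothing is sent to the engine yet.
        return RUNNING
    task_done(SUCCESS, result)
    return SUCCESS


reported = []


def record(status, result):
    reported.append((status, result))


statuses = [
    run_action(lambda x: x + 1, 1, record),   # normal action
    run_action(lambda x: None, 1, record),    # delegate action
]
```
</pre>
<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">One caveat of this convention: a normal action that legitimately has nothing to return would be mistaken for a delegate, so an explicit marker may be safer.</span></font></div>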

<div style="margin: 0px; min-height: 22px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">* Mistral exposes  engine.task_done in the REST API so 3rd party systems can call a web hook.</span></font></div>

</div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">DZ.</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; ">

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">[1]  I use ActionRunner instead of Executor (current name) to avoid confusion: it is Engine which is responsible for executions, and ActionRunner only runs actions. We should rename

 it in the code.</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;"><br>

</span></font></div>

<div style="margin: 0px; "><font face="Calibri"><span style="font-size: 13px;">[2] I use task_done for brevity and out of pure spite; in the code it is convey_task_results.</span></font></div>

</div>

</div>

</div>

</blockquote>

</span>

</div>

</blockquote></div><br></body></html>