<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Jul 9, 2014 at 7:39 PM, Clint Byrum <span dir="ltr"><<a href="mailto:clint@fewbar.com" target="_blank">clint@fewbar.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


Excerpts from Yuriy Taraday's message of 2014-07-09 03:36:00 -0700:<br>

<div class="">> On Tue, Jul 8, 2014 at 11:31 PM, Joshua Harlow <<a href="mailto:harlowja@yahoo-inc.com">harlowja@yahoo-inc.com</a>><br>

> wrote:<br>

><br>

> > I think clints response was likely better than what I can write here, but<br>

> > I'll add-on a few things,<br>

> ><br>

> ><br>

> > >How do you write such code using taskflow?<br>

> > ><br>

> > >  @asyncio.coroutine<br>

> > >  def foo(self):<br>

> > >      result = yield from some_async_op(...)<br>

> > >      return do_stuff(result)<br>

> ><br>

> > The idea (at a very high level) is that users don't write this;<br>

> ><br>

> > What users do write is a workflow, maybe the following (pseudocode):<br>

> ><br>

> > # Define the pieces of your workflow.<br>

> ><br>

> > TaskA():<br>

> >   def execute():<br>

> >       # Do whatever some_async_op did here.<br>

> ><br>

> >   def revert():<br>

> >       # If execute had any side-effects undo them here.<br>

> ><br>

> > TaskFoo():<br>

> >    ...<br>

> ><br>

> > # Compose them together<br>

> ><br>

> > flow = linear_flow.Flow("my-stuff").add(TaskA("my-task-a"),<br>

> > TaskFoo("my-foo"))<br>

> ><br>

><br>

> I wouldn't consider this composition very user-friendly.<br>

><br>

<br>

</div>I find it extremely user friendly when I consider that it gives you<br>

clear lines of delineation between "the way it should work" and "what<br>

to do when it breaks."<br></blockquote><div><br></div><div>So does plain Python. But for plain Python you don't have to explicitly use graph terminology to describe the process.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">


<div><div class="h5">

> > # Submit the workflow to an engine, let the engine do the work to execute<br>

> > it (and transfer any state between tasks as needed).<br>

> ><br>

> > The idea here is that when things like this are declaratively specified<br>

> > the only thing that matters is that the engine respects that declaration;<br>

> > not whether it uses asyncio, eventlet, pigeons, threads, remote<br>

> > workers[1]. It also adds some things that are not (imho) possible with<br>

> > co-routines (in part since they are at such a low level) like stopping the<br>

> > engine after 'my-task-a' runs and shutting off the software, upgrading it,<br>

> > restarting it and then picking back up at 'my-foo'.<br>

> ><br>

><br>

> It's absolutely possible with coroutines and might provide even clearer<br>

> view of what's going on. Like this:<br>

><br>

> @asyncio.coroutine<br>

> def my_workflow(ctx, ...):<br>

>     project = yield from ctx.run_task(create_project())<br>

>     # Hey, we don't want to be linear. How about parallel tasks?<br>

>     volume, network = yield from asyncio.gather(<br>

>         ctx.run_task(create_volume(project)),<br>

>         ctx.run_task(create_network(project)),<br>

>     )<br>

>     # We can put anything here - why not branch a bit?<br>

>     if create_one_vm:<br>

>         yield from ctx.run_task(create_vm(project, network))<br>

>     else:<br>

>         # Or even loops - why not?<br>

>         for i in range(network.num_ips()):<br>

>             yield from ctx.run_task(create_vm(project, network))<br>

><br>

<br>

</div></div>Sorry but the code above is nothing like the code that Josh shared. When<br>

create_network(project) fails, how do we revert its side effects? If we<br>

want to resume this flow after reboot, how does that work?<br>

<br>

I understand that there is a desire to write everything in beautiful<br>

python yields, try's, finally's, and excepts. But the reality is that<br>

python's stack is lost the moment the process segfaults, power goes out<br>

on that PDU, or the admin rolls out a new kernel.<br>

<br>

We're not saying "asyncio vs. taskflow". I've seen that mistake twice<br>

already in this thread. Josh and I are suggesting that if there is a<br>

movement to think about coroutines, there should also be some time spent<br>

thinking at a high level: "how do we resume tasks, revert side effects,<br>

and control flow?"<br>

<br>

If we embed taskflow deep in the code, we get those things, and we can<br>

treat tasks as coroutines and let taskflow's event loop be asyncio just<br>

the same. If we embed asyncio deep into the code, we don't get any of<br>

the high level functions and we get just as much code churn.<br>

<div class=""><br>

> There's no limit to coroutine usage. The only problem is the library that<br>

> would bind everything together.<br>

> In my example run_task will have to be really smart, keeping track of all<br>

> started tasks, results of all finished ones, skipping all tasks that have<br>

> already been done (and substituting already generated results).<br>

> But all of this is doable. And I find this way of declaring workflows way<br>

> more understandable than whatever would it look like with Flow.add's<br>

><br>

<br>

</div>The way the flow is declared is important, as it leads to more isolated<br>

code. The single place where the flow is declared in Josh's example means<br>

that the flow can be imported, the state deserialized and inspected,<br>

and resumed by any piece of code: an API call, a daemon start up, an<br>

admin command, etc.<br>

<br>

I may be wrong, but it appears to me that the context that you built in<br>

your code example is hard, maybe impossible, to resume after a process<br>

restart unless _every_ task is entirely idempotent and thus can just be<br>

repeated over and over.</blockquote></div><div class="gmail_extra"><br></div>I must not have stressed this enough in the last paragraph. The point is to make the run_task method very smart. It should do something like this (yes, my Python is better than my English):</div>


<div class="gmail_extra"><br></div><div class="gmail_extra"><font face="courier new, monospace">@asyncio.coroutine</font></div><div class="gmail_extra"><font face="courier new, monospace">def run_task(self, task):</font></div>


<div class="gmail_extra"><font face="courier new, monospace">    task_id = yield from self.register_task(task)</font></div><div class="gmail_extra"><font face="courier new, monospace">    res = yield from self.get_stored_result(task_id)<br>


</font></div><div class="gmail_extra"><font face="courier new, monospace">    if res is not None:</font></div><div class="gmail_extra"><font face="courier new, monospace">        return res</font></div><div class="gmail_extra">


<span style="font-family:'courier new',monospace">    try:</span><br></div><div class="gmail_extra"><font face="courier new, monospace">        res = yield from task</font></div><div class="gmail_extra"><font face="courier new, monospace">    except Exception as exc:</font></div>


<div class="gmail_extra"><font face="courier new, monospace">        yield from self.store_error(task_id, exc)</font></div><div class="gmail_extra"><font face="courier new, monospace">        raise exc</font></div><div class="gmail_extra">


<font face="courier new, monospace">    else:</font></div><div class="gmail_extra"><font face="courier new, monospace">        yield from self.store_result(task_id, res)</font></div><div class="gmail_extra"><font face="courier new, monospace">        return res</font></div>


<div class="gmail_extra"><br></div><div class="gmail_extra">So it won't run any task more than once (unless something very bad happens between the task run and the DB update, but no one is safe from that), and it will remember all errors that happened before.</div>
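<div class="gmail_extra"><br></div><div class="gmail_extra">To make the skipping behaviour concrete, here is a minimal sketch in modern async/await syntax (the @asyncio.coroutine decorator used above was removed in Python 3.11). Everything in it is an assumption made for illustration: the Context class, the in-memory dicts standing in for the DB, the explicit task ids, and the flaky create_volume. Unlike the snippet above, run_task takes a factory rather than a coroutine object, so a skipped task never creates a coroutine that is left un-awaited, and it checks whether a result is stored rather than comparing with None, so a task that legitimately returns None is not rerun.</div>
<pre>
```python
import asyncio


class Context:
    """Hypothetical workflow context; dicts stand in for the database."""

    def __init__(self):
        self.results = {}   # task id -> stored result
        self.errors = {}    # task id -> message of the last failure
        self.executed = []  # ids of tasks that actually ran to completion

    async def run_task(self, task_id, make_task):
        # Skip the task if an earlier attempt already stored its result.
        if task_id in self.results:
            return self.results[task_id]
        try:
            res = await make_task()
        except Exception as exc:
            # Remember the failure, then let it propagate unchanged.
            self.errors[task_id] = str(exc)
            raise
        self.executed.append(task_id)
        self.results[task_id] = res
        return res


calls = {"create_project": 0, "create_volume": 0}


async def create_project():
    calls["create_project"] += 1
    return "project-1"


async def create_volume(project):
    calls["create_volume"] += 1
    if calls["create_volume"] == 1:
        # Fail on the first attempt only, to simulate a transient error.
        raise RuntimeError("storage backend unavailable")
    return "volume-of-" + project


async def my_workflow(ctx):
    project = await ctx.run_task("create_project", create_project)
    return await ctx.run_task(
        "create_volume", lambda: create_volume(project))


ctx = Context()
try:
    asyncio.run(my_workflow(ctx))  # first attempt fails in create_volume
except RuntimeError:
    pass

# Second attempt: create_project is skipped, create_volume is retried.
volume = asyncio.run(my_workflow(ctx))
```
</pre>
<div class="gmail_extra">In this sketch a failed task is retried on the next attempt; whether a stored error should instead block the retry is a policy decision the text above leaves open.</div>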


<div class="gmail_extra">At the outer level, all tasks will be stored in the context, and if an error occurs they will simply be reverted, e.g. by calling their <font face="courier new, monospace">revert()</font> methods in a reverse loop.</div>
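<div class="gmail_extra"><br></div><div class="gmail_extra">A rough sketch of that outer level, under stated assumptions: the Task class with its execute()/revert() pair mirrors the quoted TaskA pseudocode, while Context.run and the decision to revert only tasks that completed (not the one that failed) are choices made here for illustration, not something the thread specifies.</div>
<pre>
```python
import asyncio


class Task:
    """Hypothetical task with an execute/revert pair."""

    def __init__(self, name, log, fail=False):
        self.name = name
        self.log = log
        self.fail = fail

    async def execute(self):
        if self.fail:
            raise RuntimeError(self.name + " failed")
        self.log.append("execute " + self.name)

    async def revert(self):
        self.log.append("revert " + self.name)


class Context:
    def __init__(self):
        self.done = []  # completed tasks, in completion order

    async def run_task(self, task):
        await task.execute()
        self.done.append(task)

    async def run(self, workflow):
        try:
            await workflow(self)
        except Exception:
            # Undo the side effects of finished tasks, newest first.
            for task in reversed(self.done):
                await task.revert()
            raise


log = []


async def my_workflow(ctx):
    await ctx.run_task(Task("create_project", log))
    await ctx.run_task(Task("create_network", log))
    await ctx.run_task(Task("create_vm", log, fail=True))


try:
    asyncio.run(Context().run(my_workflow))
except RuntimeError as exc:
    error = str(exc)
```
</pre>
<div class="gmail_extra">The workflow itself stays plain Python; only the context knows about reverting.</div>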


<div class="gmail_extra"><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">how do we resume tasks, revert side effects, and control flow?</blockquote>


<div><br></div><div>This would make it possible to resume tasks and revert side effects on error, and it would provide a much better way of describing control flow (plain Python instead of a new .add() language).</div><div>The declaration of control flow is also just a Python method, nothing else. So you can import it anywhere, start a task on one node, stop it, and continue running it on another one.</div>
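<div><br></div><div>As a hedged illustration of stopping on one node and continuing on another, the sketch below persists task results to a JSON file that stands in for a shared database. FileContext, Stop, step and the stop_before_vm flag are all hypothetical names, and the two "nodes" are simply two context objects in the same process reading the same file.</div>
<pre>
```python
import asyncio
import json
import os
import tempfile


class FileContext:
    """Hypothetical context that persists task results to a JSON file."""

    def __init__(self, path):
        self.path = path
        self.executed = []  # ids of tasks that ran in this context
        if os.path.exists(path):
            with open(path) as f:
                self.results = json.load(f)
        else:
            self.results = {}

    async def run_task(self, task_id, make_task):
        # Reuse a result stored by any earlier context sharing the file.
        if task_id in self.results:
            return self.results[task_id]
        res = await make_task()
        self.executed.append(task_id)
        self.results[task_id] = res
        with open(self.path, "w") as f:
            json.dump(self.results, f)
        return res


class Stop(Exception):
    """Simulates the process being shut down in the middle of the flow."""


async def step(name, stop=False):
    if stop:
        raise Stop(name)
    return name + "-done"


async def my_workflow(ctx, stop_before_vm=False):
    project = await ctx.run_task("project", lambda: step("project"))
    vm = await ctx.run_task(
        "vm", lambda: step("vm", stop=stop_before_vm))
    return [project, vm]


path = os.path.join(tempfile.mkdtemp(), "state.json")

# "Node A" finishes the first task and is then stopped.
node_a = FileContext(path)
try:
    asyncio.run(my_workflow(node_a, stop_before_vm=True))
except Stop:
    pass

# "Node B" imports the same workflow and picks up where A left off.
node_b = FileContext(path)
result = asyncio.run(my_workflow(node_b))
```
</pre>
<div>This only works if task results are serializable and task ids are stable between runs, which is exactly the bookkeeping a smart run_task would have to own.</div>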


<div><br></div><div>I'm not suggesting that taskflow is useless and asyncio is better (that's apples vs. oranges). I'm saying that using coroutines (asyncio) can improve the ways we use taskflow and provide a clearer method of developing these flows.</div>


<div>This was mostly a response to the claim that "this is impossible with coroutines". I say it is possible, and it can even be better.</div><div class="gmail_extra"><div>-- <br></div><br><div>Kind regards, Yuriy.</div>

</div></div>