[openstack-dev] [oslo] Asyncio and oslo.messaging

Clint Byrum clint at fewbar.com
Wed Jul 9 15:39:33 UTC 2014


Excerpts from Yuriy Taraday's message of 2014-07-09 03:36:00 -0700:
> On Tue, Jul 8, 2014 at 11:31 PM, Joshua Harlow <harlowja at yahoo-inc.com>
> wrote:
> 
> > I think Clint's response was likely better than what I can write
> > here, but I'll add on a few things:
> >
> >
> > > How do you write such code using taskflow?
> > >
> > >  @asyncio.coroutine
> > >  def foo(self):
> > >      result = yield from some_async_op(...)
> > >      return do_stuff(result)
> >
> > The idea (at a very high level) is that users don't write this.
> >
> > What users do write is a workflow, maybe the following (pseudocode):
> >
> > # Define the pieces of your workflow.
> >
> > class TaskA(task.Task):
> >     def execute(self):
> >         # Do whatever some_async_op did here.
> >
> >     def revert(self):
> >         # If execute had any side effects, undo them here.
> >
> > class TaskFoo(task.Task):
> >     ...
> >
> > # Compose them together.
> >
> > flow = linear_flow.Flow("my-stuff").add(TaskA("my-task-a"),
> >                                         TaskFoo("my-foo"))
> >
> 
> I wouldn't consider this composition very user-friendly.
> 

I find it extremely user-friendly when I consider that it gives you
clear lines of delineation between "the way it should work" and "what
to do when it breaks."
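
To make that split concrete, here's a minimal sketch using taskflow's
Task base class; allocate_network and deallocate_network are
hypothetical stand-ins for whatever side effects the task really has,
and the revert() signature follows taskflow's convention of handing
back execute()'s result:

    from taskflow import task

    class CreateNetwork(task.Task):
        def execute(self, project):
            # "The way it should work": perform the operation and
            # return its result for downstream tasks to consume.
            return allocate_network(project)  # hypothetical helper

        def revert(self, project, result, **kwargs):
            # "What to do when it breaks": the engine calls this when
            # a later task in the flow fails, passing execute()'s
            # result so the side effect can be undone.
            deallocate_network(project, result)  # hypothetical helper

The task author never writes the "walk backwards and clean up" logic;
declaring the two halves is enough, and the engine owns the control
flow.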

> > # Submit the workflow to an engine, let the engine do the work to execute
> > it (and transfer any state between tasks as needed).
> >
> > The idea here is that when things like this are declaratively
> > specified, the only thing that matters is that the engine respects
> > that declaration, not whether it uses asyncio, eventlet, pigeons,
> > threads, or remote workers[1]. It also adds some things that are not
> > (imho) possible with coroutines (in part since they are at such a low
> > level), like stopping the engine after 'my-task-a' runs, shutting off
> > the software, upgrading it, restarting it, and then picking back up
> > at 'my-foo'.
> >
> 
> It's absolutely possible with coroutines and might provide an even
> clearer view of what's going on. Like this:
> 
> @asyncio.coroutine
> def my_workflow(ctx, ...):
>     project = yield from ctx.run_task(create_project())
>     # Hey, we don't want to be linear. How about parallel tasks?
>     volume, network = yield from asyncio.gather(
>         ctx.run_task(create_volume(project)),
>         ctx.run_task(create_network(project)),
>     )
>     # We can put anything here - why not branch a bit?
>     if create_one_vm:
>         yield from ctx.run_task(create_vm(project, network))
>     else:
>         # Or even loops - why not?
>         for i in range(network.num_ips()):
>             yield from ctx.run_task(create_vm(project, network))
> 

Sorry, but the code above is nothing like the code that Josh shared. When
create_network(project) fails, how do we revert its side effects? If we
want to resume this flow after a reboot, how does that work?
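
To make the first question concrete, here's a minimal sketch (all the
names are invented) of what asyncio.gather() actually does when one
branch blows up:

    import asyncio

    @asyncio.coroutine
    def create_volume(project):
        yield from asyncio.sleep(0.1)
        # A real volume now exists out in the world.
        return 'volume-1'

    @asyncio.coroutine
    def create_network(project):
        yield from asyncio.sleep(0.2)
        raise RuntimeError('network quota exceeded')

    @asyncio.coroutine
    def my_workflow():
        # gather() propagates create_network's exception, but nothing
        # records that a volume was already created and now needs
        # cleaning up.
        yield from asyncio.gather(create_volume('p1'),
                                  create_network('p1'))

    loop = asyncio.get_event_loop()
    try:
        loop.run_until_complete(my_workflow())
    except RuntimeError:
        # The exception surfaced; the orphaned volume did not.
        pass

A try/finally inside my_workflow could clean up within this process,
but it does nothing for the reboot case, because the finally lives in
the very stack that the reboot destroys.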

I understand that there is a desire to write everything in beautiful
Python yields, trys, finallys, and excepts. But the reality is that
Python's stack is lost the moment the process segfaults, the power goes
out on that PDU, or the admin rolls out a new kernel.

We're not saying "asyncio vs. taskflow". I've seen that mistake twice
already in this thread. Josh and I are suggesting that if there is a
movement to think about coroutines, there should also be some time spent
thinking at a high level: "how do we resume tasks, revert side effects,
and control flow?"

If we embed taskflow deep in the code, we get those things, and we can
treat tasks as coroutines and let taskflow's event loop be asyncio just
the same. If we embed asyncio deep into the code, we don't get any of
those high-level functions, and we get just as much code churn.

> There's no limit to coroutine usage. The only problem is the library
> that would bind everything together.
> In my example, run_task will have to be really smart, keeping track of
> all started tasks and the results of all finished ones, and skipping
> tasks that have already been done (substituting their already-generated
> results).
> But all of this is doable. And I find this way of declaring workflows
> way more understandable than whatever it would look like with
> Flow.add's.
> 

The way the flow is declared is important, as it leads to more isolated
code. The single place where the flow is declared in Josh's example
means that the flow can be imported, its state deserialized and
inspected, and the work resumed by any piece of code: an API call, a
daemon starting up, an admin command, etc.
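
Sketching what that looks like against taskflow's persistence layer
(the flow module and the find_flow_detail() lookup are hypothetical,
and the backend configuration is only an example):

    from taskflow import engines
    from taskflow.persistence import backends

    from myservice.flows import make_my_stuff_flow  # hypothetical

    # Point at the same persistence backend the previous process used.
    backend = backends.fetch({'connection':
                              'sqlite:////var/lib/flows.db'})

    # Hypothetical helper that looks up the stored FlowDetail
    # (taskflow's per-task state record) for this flow by name.
    flow_detail = find_flow_detail(backend, 'my-stuff')

    # Loading the same declared flow against the stored state yields
    # an engine that skips 'my-task-a' if it already finished and
    # picks back up at 'my-foo'.
    engine = engines.load(make_my_stuff_flow(), backend=backend,
                          flow_detail=flow_detail)
    engine.run()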

I may be wrong, but it appears to me that the context that you built in
your code example is hard, maybe impossible, to resume after a process
restart unless _every_ task is entirely idempotent and thus can just be
repeated over and over.
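
To put a point on what that "really smart" run_task implies, here's a
hypothetical sketch (every name is invented; none of this exists in a
library). Note that a stable task name has to be added to the call,
because a position in Python's stack is not a durable key:

    import asyncio
    import json

    class Context(object):
        def __init__(self, store_path):
            self._store_path = store_path
            try:
                with open(store_path) as f:
                    self._done = json.load(f)
            except IOError:
                self._done = {}

        @asyncio.coroutine
        def run_task(self, name, coro):
            # Skip tasks that already ran, substituting their stored
            # results.
            if name in self._done:
                coro.close()
                return self._done[name]
            result = yield from coro
            # Persist before returning; if we crash after the side
            # effect but before this write, the task runs twice anyway.
            self._done[name] = result
            with open(self._store_path, 'w') as f:
                json.dump(self._done, f)
            return result

Even then, every result must be JSON-serializable, the write is not
atomic, and nothing here knows how to revert create_volume when
create_network fails. That is exactly the machinery taskflow already
has.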


