[openstack-dev] [oslo] Asyncio and oslo.messaging
Outlook
harlowja at outlook.com
Thu Jul 10 19:51:32 UTC 2014
On Jul 10, 2014, at 3:48 AM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
> On Wed, Jul 9, 2014 at 7:39 PM, Clint Byrum <clint at fewbar.com> wrote:
> Excerpts from Yuriy Taraday's message of 2014-07-09 03:36:00 -0700:
> > On Tue, Jul 8, 2014 at 11:31 PM, Joshua Harlow <harlowja at yahoo-inc.com>
> > wrote:
> >
> > > I think clints response was likely better than what I can write here, but
> > > I'll add-on a few things,
> > >
> > >
> > > >How do you write such code using taskflow?
> > > >
> > > > @asyncio.coroutine
> > > > def foo(self):
> > > > result = yield from some_async_op(...)
> > > > return do_stuff(result)
> > >
> > > The idea (at a very high level) is that users don't write this;
> > >
> > > What users do write is a workflow, maybe the following (pseudocode):
> > >
> > > # Define the pieces of your workflow.
> > >
> > > TaskA():
> > > def execute():
> > > # Do whatever some_async_op did here.
> > >
> > > def revert():
> > > # If execute had any side-effects undo them here.
> > >
> > > TaskFoo():
> > > ...
> > >
> > > # Compose them together
> > >
> > > flow = linear_flow.Flow("my-stuff").add(TaskA("my-task-a"),
> > > TaskFoo("my-foo"))
> > >
> >
> > I wouldn't consider this composition very user-friendly.
> >
>
So just to make this understandable, the above is a declarative structure of the work to be done. I'm pretty sure it's general agreed[1] in the programming world that when declarative structures can be used they should be (imho openstack should also follow the same pattern more than it currently does). The above is a declaration of the work to be done and the ordering constraints that must be followed. Its just one of X ways to do this (feel free to contribute other variations of these 'patterns' @ https://github.com/openstack/taskflow/tree/master/taskflow/patterns).
[1] http://latentflip.com/imperative-vs-declarative/ (and many many others).
> I find it extremely user friendly when I consider that it gives you
> clear lines of delineation between "the way it should work" and "what
> to do when it breaks."
>
> So does plain Python. But for plain Python you don't have to explicitly use graph terminology to describe the process.
>
I'm not sure where in the above you saw graph terminology. All I see there is a declaration of a pattern that explicitly says run things 1 after the other (linearly).
> > > # Submit the workflow to an engine, let the engine do the work to execute
> > > it (and transfer any state between tasks as needed).
> > >
> > > The idea here is that when things like this are declaratively specified
> > > the only thing that matters is that the engine respects that declaration;
> > > not whether it uses asyncio, eventlet, pigeons, threads, remote
> > > workers[1]. It also adds some things that are not (imho) possible with
> > > co-routines (in part since they are at such a low level) like stopping the
> > > engine after 'my-task-a' runs and shutting off the software, upgrading it,
> > > restarting it and then picking back up at 'my-foo'.
> > >
> >
> > It's absolutely possible with coroutines and might provide even clearer
> > view of what's going on. Like this:
> >
> > @asyncio.coroutine
> > def my_workflow(ctx, ...):
> > project = yield from ctx.run_task(create_project())
> > # Hey, we don't want to be linear. How about parallel tasks?
> > volume, network = yield from asyncio.gather(
> > ctx.run_task(create_volume(project)),
> > ctx.run_task(create_network(project)),
> > )
> > # We can put anything here - why not branch a bit?
> > if create_one_vm:
> > yield from ctx.run_task(create_vm(project, network))
> > else:
> > # Or even loops - why not?
> > for i in range(network.num_ips()):
> > yield from ctx.run_task(create_vm(project, network))
> >
>
> Sorry but the code above is nothing like the code that Josh shared. When
> create_network(project) fails, how do we revert its side effects? If we
> want to resume this flow after reboot, how does that work?
>
> I understand that there is a desire to write everything in beautiful
> python yields, try's, finally's, and excepts. But the reality is that
> python's stack is lost the moment the process segfaults, power goes out
> on that PDU, or the admin rolls out a new kernel.
>
> We're not saying "asyncio vs. taskflow". I've seen that mistake twice
> already in this thread. Josh and I are suggesting that if there is a
> movement to think about coroutines, there should also be some time spent
> thinking at a high level: "how do we resume tasks, revert side effects,
> and control flow?"
>
> If we embed taskflow deep in the code, we get those things, and we can
> treat tasks as coroutines and let taskflow's event loop be asyncio just
> the same. If we embed asyncio deep into the code, we don't get any of
> the high level functions and we get just as much code churn.
+1 the declaration of what is to do is not connected to the how to do it. IMHO most projects (maybe outside of the underlying core ones, this could include oslo,messaging, since it's a RPC layer at the lowest level) don't care about anything but the workflow that they want to have execute successfully or not. To me that implies that those projects only care about said declaration (this seems to be a reasonable assumption as pretty much all of openstack apis are just workflows around a driver api that abstracts away vendors/open-source solutions).
>
> > There's no limit to coroutine usage. The only problem is the library that
> > would bind everything together.
> > In my example run_task will have to be really smart, keeping track of all
> > started tasks, results of all finished ones, skipping all tasks that have
> > already been done (and substituting already generated results).
> > But all of this is doable. And I find this way of declaring workflows way
> > more understandable than whatever would it look like with Flow.add's
> >
>
> The way the flow is declared is important, as it leads to more isolated
> code. The single place where the flow is declared in Josh's example means
> that the flow can be imported, the state deserialized and inspected,
> and resumed by any piece of code: an API call, a daemon start up, an
> admin command, etc.
>
> I may be wrong, but it appears to me that the context that you built in
> your code example is hard, maybe impossible, to resume after a process
> restart unless _every_ task is entirely idempotent and thus can just be
> repeated over and over.
>
> I must have not stressed this enough in the last paragraph. The point is to make run_task method very smart. It should do smth like this (yes, I'm better in Python than English):
>
> @asyncio.coroutine
> def run_task(self, task):
> task_id = yield from self.register_task(task)
> res = yield from self.get_stored_result(task_id)
> if res is not None:
> return res
> try:
> res = yield from task
> except Exception as exc:
> yield from self.store_error(task_id, exc)
> raise exc
> else:
> yield from self.store_result(task_id, res)
> return res
>
I think you just created https://github.com/openstack/taskflow/blob/master/taskflow/engines/action_engine/runner.py#L55 ;-)
Except nothing in the above code says about about 'yield from', or other, it just is a run loop that schedules things (currently using concurrent.futures), waits for them to complete and then does things with there results (aka the completion part). I even had a note about the 1 place (potentially more) that asyncio would plug-in: https://github.com/openstack/taskflow/blob/master/taskflow/engines/action_engine/runner.py#L95 (since asyncio has a concurrent.futures equivalent, in fact exactly the same, which imho is a peculiarity of the python stdlib right now, aka the nearly identical code bases that are asyncio.futures and concurrent.futures).
> So it won't run any task more then once (unless smth very bad happens between task run and DB update, but noone is safe from that) and it will remember all errors that happened before.
> On the outer level all tasks will get stored in the context and if some error occurs, they will just get reverted e.g. by calling their revert() method in reverse loop.
>
> how do we resume tasks, revert side effects, and control flow?
>
> This would allow to resume tasks, revert side effects on error and provide way better way of describing control flow (plain Python instead new .add() language).
> Declaration of control flow is also just a Python method, nothing else. So you can import it anywhere, start the same task on one node, stop it and continue running it on another one.
>
> I'm not suggesting that taskflow is useless and asyncio is better (apple vs oranges). I'm saying that using coroutines (asyncio) can improve ways we can use taskflow and provide clearer method of developing these flows.
> This was mostly response to the "this is impossible with coroutines". I say it is possible and it can even be better.
> --
Agreed, to me it's a comparison of an engine (asyncio) and a car (taskflow), the car has a fuel-tank (the storage layer), a body & wheels (the workflow structure), a transmission (the runner from above) and also an engine (asyncio or other...). So the comparison to me isn't a comparison about better or worse since you can't compare an engine to a car. I totally expect the engine/s to be able to be asyncio based (as pointed out in clints response); even if it's right now concurrent.futures based; I think we can have more of a detailed specification here on how that would work but hopefully the idea is more clear now.
The following might help visualize this (for those that are more visual types): https://wiki.openstack.org/wiki/File:Core.png
Anyways other peculiarities that I believe coroutines would have, other thoughts would be interesting to hear...
1. How would a coroutine structure handle resumption? When the call-graph is the structure (aka in taskflow terminology a 'pattern') and when a coroutine does a 'yield from' in the middle of its execution (to say start another section of the workflow) and then the machine dies that was running this work, upon restart how does the running coroutine (run_task for example could be the thing executing all this) get back into the middle of that coroutine? Since a coroutine can yield multiple exit points (that coroutine can also store local state in variables and other objects), how would the run_task get back into one of those with the prior state? This seems problematic if the workflow structure is the coroutine call-graph structure. If the solution is to have coroutines chained together and no multiple yield points are allowed (or only 1 yield point is allowed) then it seems you just recreated the existing taskflow pattern ;)
a) This is imho even a bigger problem when you start running those coroutines on remote machines, since taskflow has awareness of the declared workflow (not necessarily implemented by a graph, which is why the core.png has a 'cloud' around the compilation) it can request that remote workers[2] execute the components of that workflow (and receive there results and repeat...) in parallel. This is one thing that I don't understand how the call-graph as the structure would actually achieve (if the declared workflow is embedded in the call graph there is no visibility into it to actually do this). This feature is key to scale imho (and is pretty commonly done in other parallel frameworks, via message passing, or under other names, note that [3, 4] and other libraries do similar things).
2. Introspection, I hope this one is more obvious. When the coroutine call-graph is the workflow there is no easy way to examine it before it executes (and change parts of it for example before it executes). This is a nice feature imho when it's declaratively and explicitly defined, you get the ability to do this. This part is key to handling upgrades that typically happen (for example the a workflow with the 5th task was upgraded to a newer version, we need to stop the service, shut it off, do the code upgrade, restart the service and change 5th task from v1 to v1.1).
3. Dataflow: tasks in taskflow can not just declare workflow dependencies but also dataflow dependencies (this is how tasks transfer things from one to another). I suppose the dataflow dependency would mirror to coroutine variables & arguments (except the variables/arguments would need to be persisted somewhere so that it can be passed back in on failure of the service running that coroutine). How is that possible without an abstraction over those variables/arguments (a coroutine can't store these things in local variables since those will be lost)?It would seem like this would need to recreate the persistence & storage layer[5] that taskflow already uses for this purpose to accomplish this.
[2] http://docs.openstack.org/developer/taskflow/workers.html
[3] http://python-rq.org/
[4] http://pythonhosted.org/pulsar/apps/tasks.html
[5] http://docs.openstack.org/developer/taskflow/persistence.html
Anyway it would be neat to try to make a 'coro' pattern in taskflow and have the existing engine from above work with those types of coroutines seamlessly (I expect this to actually be pretty easy) and let users who want to go about this path try it (as long as they understand the drawbacks this imho brings) then this could be an avenue to enabling both approaches (either way has its own set of drawbacks and advantages). That way the work done for coroutines is at one place and not scattered all over, this then gains all the projects a declarative workflow (where those projects feel this is applicable) that they can use to reliably (and easily scale it) to execute things.
-Josh
>
> Kind regards, Yuriy.
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140710/9bc846b6/attachment.html>
More information about the OpenStack-dev
mailing list