[openstack-dev] [oslo] Asyncio and oslo.messaging
Yuriy Taraday
yorik.sar at gmail.com
Fri Jul 11 10:08:14 UTC 2014
On Thu, Jul 10, 2014 at 11:51 PM, Outlook <harlowja at outlook.com> wrote:
> On Jul 10, 2014, at 3:48 AM, Yuriy Taraday <yorik.sar at gmail.com> wrote:
>
> On Wed, Jul 9, 2014 at 7:39 PM, Clint Byrum <clint at fewbar.com> wrote:
>
>> Excerpts from Yuriy Taraday's message of 2014-07-09 03:36:00 -0700:
>> > On Tue, Jul 8, 2014 at 11:31 PM, Joshua Harlow <harlowja at yahoo-inc.com>
>> > wrote:
>> >
>> > > I think clints response was likely better than what I can write here,
>> but
>> > > I'll add-on a few things,
>> > >
>> > >
>> > > >How do you write such code using taskflow?
>> > > >
>> > > > @asyncio.coroutine
>> > > > def foo(self):
>> > > > result = yield from some_async_op(...)
>> > > > return do_stuff(result)
>> > >
>> > > The idea (at a very high level) is that users don't write this;
>> > >
>> > > What users do write is a workflow, maybe the following (pseudocode):
>> > >
>> > > # Define the pieces of your workflow.
>> > >
>> > > TaskA():
>> > > def execute():
>> > > # Do whatever some_async_op did here.
>> > >
>> > > def revert():
>> > > # If execute had any side-effects undo them here.
>> > >
>> > > TaskFoo():
>> > > ...
>> > >
>> > > # Compose them together
>> > >
>> > > flow = linear_flow.Flow("my-stuff").add(TaskA("my-task-a"),
>> > > TaskFoo("my-foo"))
>> > >
>> >
>> > I wouldn't consider this composition very user-friendly.
>> >
>>
>>
> So just to make this understandable, the above is a declarative structure
> of the work to be done. I'm pretty sure it's general agreed[1] in the
> programming world that when declarative structures can be used they should
> be (imho openstack should also follow the same pattern more than it
> currently does). The above is a declaration of the work to be done and the
> ordering constraints that must be followed. Its just one of X ways to do
> this (feel free to contribute other variations of these 'patterns' @
> https://github.com/openstack/taskflow/tree/master/taskflow/patterns).
>
> [1] http://latentflip.com/imperative-vs-declarative/ (and many many
> others).
>
I totally agree that declarative approach is better for workflow
declarations. I'm just saying that we can do it in Python with coroutines
instead. Note that declarative approach can lead to reinvention of entirely
new language and these "flow.add" can be the first step on this road.
> I find it extremely user friendly when I consider that it gives you
>> clear lines of delineation between "the way it should work" and "what
>> to do when it breaks."
>>
>
> So does plain Python. But for plain Python you don't have to explicitly
> use graph terminology to describe the process.
>
>
>
> I'm not sure where in the above you saw graph terminology. All I see there
> is a declaration of a pattern that explicitly says run things 1 after the
> other (linearly).
>
As long as workflow is linear there's no difference on whether it's
declared with .add() or with yield from. I'm talking about more complex
workflows like one I described in example.
> > > # Submit the workflow to an engine, let the engine do the work to
>> execute
>> > > it (and transfer any state between tasks as needed).
>> > >
>> > > The idea here is that when things like this are declaratively
>> specified
>> > > the only thing that matters is that the engine respects that
>> declaration;
>> > > not whether it uses asyncio, eventlet, pigeons, threads, remote
>> > > workers[1]. It also adds some things that are not (imho) possible with
>> > > co-routines (in part since they are at such a low level) like
>> stopping the
>> > > engine after 'my-task-a' runs and shutting off the software,
>> upgrading it,
>> > > restarting it and then picking back up at 'my-foo'.
>> > >
>> >
>> > It's absolutely possible with coroutines and might provide even clearer
>> > view of what's going on. Like this:
>> >
>> > @asyncio.coroutine
>> > def my_workflow(ctx, ...):
>> > project = yield from ctx.run_task(create_project())
>> > # Hey, we don't want to be linear. How about parallel tasks?
>> > volume, network = yield from asyncio.gather(
>> > ctx.run_task(create_volume(project)),
>> > ctx.run_task(create_network(project)),
>> > )
>> > # We can put anything here - why not branch a bit?
>> > if create_one_vm:
>> > yield from ctx.run_task(create_vm(project, network))
>> > else:
>> > # Or even loops - why not?
>> > for i in range(network.num_ips()):
>> > yield from ctx.run_task(create_vm(project, network))
>> >
>>
>
>> Sorry but the code above is nothing like the code that Josh shared. When
>> create_network(project) fails, how do we revert its side effects? If we
>> want to resume this flow after reboot, how does that work?
>>
>> I understand that there is a desire to write everything in beautiful
>> python yields, try's, finally's, and excepts. But the reality is that
>> python's stack is lost the moment the process segfaults, power goes out
>> on that PDU, or the admin rolls out a new kernel.
>>
>> We're not saying "asyncio vs. taskflow". I've seen that mistake twice
>> already in this thread. Josh and I are suggesting that if there is a
>> movement to think about coroutines, there should also be some time spent
>> thinking at a high level: "how do we resume tasks, revert side effects,
>> and control flow?"
>>
>> If we embed taskflow deep in the code, we get those things, and we can
>> treat tasks as coroutines and let taskflow's event loop be asyncio just
>> the same. If we embed asyncio deep into the code, we don't get any of
>> the high level functions and we get just as much code churn.
>>
>
> +1 the declaration of what is to do is not connected to the how to do it.
> IMHO most projects (maybe outside of the underlying core ones, this could
> include oslo,messaging, since it's a RPC layer at the lowest level) don't
> care about anything but the workflow that they want to have execute
> successfully or not. To me that implies that those projects only care about
> said declaration (this seems to be a reasonable assumption as pretty much
> all of openstack apis are just workflows around a driver api that abstracts
> away vendors/open-source solutions).
>
>
>> > There's no limit to coroutine usage. The only problem is the library
>> that
>> > would bind everything together.
>> > In my example run_task will have to be really smart, keeping track of
>> all
>> > started tasks, results of all finished ones, skipping all tasks that
>> have
>> > already been done (and substituting already generated results).
>> > But all of this is doable. And I find this way of declaring workflows
>> way
>> > more understandable than whatever would it look like with Flow.add's
>> >
>>
>> The way the flow is declared is important, as it leads to more isolated
>> code. The single place where the flow is declared in Josh's example means
>> that the flow can be imported, the state deserialized and inspected,
>> and resumed by any piece of code: an API call, a daemon start up, an
>> admin command, etc.
>>
>> I may be wrong, but it appears to me that the context that you built in
>> your code example is hard, maybe impossible, to resume after a process
>> restart unless _every_ task is entirely idempotent and thus can just be
>> repeated over and over.
>
>
> I must have not stressed this enough in the last paragraph. The point is
> to make run_task method very smart. It should do smth like this (yes, I'm
> better in Python than English):
>
> @asyncio.coroutine
> def run_task(self, task):
> task_id = yield from self.register_task(task)
> res = yield from self.get_stored_result(task_id)
> if res is not None:
> return res
> try:
> res = yield from task
> except Exception as exc:
> yield from self.store_error(task_id, exc)
> raise exc
> else:
> yield from self.store_result(task_id, res)
> return res
>
>
> I think you just created
> https://github.com/openstack/taskflow/blob/master/taskflow/engines/action_engine/runner.py#L55
> ;-)
>
No. If I understand it correctly, Runner runs workflows while my code is
for running one task and skipping it if it has already been run. It can
schedule tasks to remote nodes as well, for example.
> Except nothing in the above code says about about 'yield from', or other,
> it just is a run loop that schedules things (currently using
> concurrent.futures), waits for them to complete and then does things with
> there results (aka the completion part). I even had a note about the 1
> place (potentially more) that asyncio would plug-in:
> https://github.com/openstack/taskflow/blob/master/taskflow/engines/action_engine/runner.py#L95
> (since asyncio has a concurrent.futures equivalent, in fact exactly the
> same, which imho is a peculiarity of the python stdlib right now, aka the
> nearly identical code bases that are asyncio.futures and
> concurrent.futures).
>
> So it won't run any task more then once (unless smth very bad happens
> between task run and DB update, but noone is safe from that) and it will
> remember all errors that happened before.
> On the outer level all tasks will get stored in the context and if some
> error occurs, they will just get reverted e.g. by calling their revert()
> method in reverse loop.
>
> how do we resume tasks, revert side effects, and control flow?
>
>
> This would allow to resume tasks, revert side effects on error and provide
> way better way of describing control flow (plain Python instead new .add()
> language).
> Declaration of control flow is also just a Python method, nothing else. So
> you can import it anywhere, start the same task on one node, stop it and
> continue running it on another one.
>
> I'm not suggesting that taskflow is useless and asyncio is better (apple
> vs oranges). I'm saying that using coroutines (asyncio) can improve ways we
> can use taskflow and provide clearer method of developing these flows.
> This was mostly response to the "this is impossible with coroutines". I
> say it is possible and it can even be better.
> --
>
>
> Agreed, to me it's a comparison of an engine (asyncio) and a car
> (taskflow), the car has a fuel-tank (the storage layer), a body & wheels
> (the workflow structure), a transmission (the runner from above) and also
> an engine (asyncio or other...). So the comparison to me isn't a comparison
> about better or worse since you can't compare an engine to a car. I totally
> expect the engine/s to be able to be asyncio based (as pointed out in
> clints response); even if it's right now concurrent.futures based; I think
> we can have more of a detailed specification here on how that would work
> but hopefully the idea is more clear now.
>
> The following might help visualize this (for those that are more visual
> types): https://wiki.openstack.org/wiki/File:Core.png
>
> Anyways other peculiarities that I believe coroutines would have, other
> thoughts would be interesting to hear...
>
> 1. How would a coroutine structure handle resumption? When the call-graph
> is the structure (aka in taskflow terminology a 'pattern') and when a
> coroutine does a 'yield from' in the middle of its execution (to say start
> another section of the workflow) and then the machine dies that was running
> this work, upon restart how does the running coroutine (run_task for
> example could be the thing executing all this) get back into the middle of
> that coroutine? Since a coroutine can yield multiple exit points (that
> coroutine can also store local state in variables and other objects), how
> would the run_task get back into one of those with the prior state? This
> seems problematic if the workflow structure is the coroutine call-graph
> structure. If the solution is to have coroutines chained together and no
> multiple yield points are allowed (or only 1 yield point is allowed) then
> it seems you just recreated the existing taskflow pattern ;)
>
It's easy: run_task places runs task only if it hasn't been run. In other
case it just returns collected result.
> a) This is imho even a bigger problem when you start running those
> coroutines on remote machines, since taskflow has awareness of the declared
> workflow (not necessarily implemented by a graph, which is why the core.png
> has a 'cloud' around the compilation) it can request that remote workers[2]
> execute the components of that workflow (and receive there results and
> repeat...) in parallel. This is one thing that I don't understand how the
> call-graph as the structure would actually achieve (if the declared
> workflow is embedded in the call graph there is no visibility into it to
> actually do this). This feature is key to scale imho (and is pretty
> commonly done in other parallel frameworks, via message passing, or under
> other names, note that [3, 4] and other libraries do similar things).
>
Once again, you can schedule tasks to other workers in run_task. You don't
need to know all workflow to run one task. And all points where tasks can
be run in parallel will be explicitly marked so (see asyncio.gather in
example) and asyncio will run those run_tasks "in parallel".
> 2. Introspection, I hope this one is more obvious. When the coroutine
> call-graph is the workflow there is no easy way to examine it before it
> executes (and change parts of it for example before it executes). This is a
> nice feature imho when it's declaratively and explicitly defined, you get
> the ability to do this. This part is key to handling upgrades that
> typically happen (for example the a workflow with the 5th task was upgraded
> to a newer version, we need to stop the service, shut it off, do the code
> upgrade, restart the service and change 5th task from v1 to v1.1).
>
I don't really understand why would one want to examine or change workflow
before running. Shouldn't workflow provide just enough info about which
tasks should be run in what order?
In case with coroutines when you do your upgrade and rerun workflow, it'll
just skip all steps that has already been run and run your new version of
5th task.
3. Dataflow: tasks in taskflow can not just declare workflow dependencies
> but also dataflow dependencies (this is how tasks transfer things from one
> to another). I suppose the dataflow dependency would mirror to coroutine
> variables & arguments (except the variables/arguments would need to be
> persisted somewhere so that it can be passed back in on failure of the
> service running that coroutine). How is that possible without an
> abstraction over those variables/arguments (a coroutine can't store these
> things in local variables since those will be lost)?It would seem like this
> would need to recreate the persistence & storage layer[5] that taskflow
> already uses for this purpose to accomplish this.
>
You don't need to persist local variables. You just need to persist results
of all tasks (and you have to do it if you want to support workflow
interruption and restart). All dataflow dependencies are declared in the
coroutine in plain Python which is what developers are used to.
>
> [2] http://docs.openstack.org/developer/taskflow/workers.html
> [3] http://python-rq.org/
> [4] http://pythonhosted.org/pulsar/apps/tasks.html
> [5] http://docs.openstack.org/developer/taskflow/persistence.html
>
> Anyway it would be neat to try to make a 'coro' pattern in taskflow and
> have the existing engine from above work with those types of coroutines
> seamlessly (I expect this to actually be pretty easy) and let users who
> want to go about this path try it (as long as they understand the drawbacks
> this imho brings) then this could be an avenue to enabling both approaches
> (either way has its own set of drawbacks and advantages). That way the work
> done for coroutines is at one place and not scattered all over, this then
> gains all the projects a declarative workflow (where those projects feel
> this is applicable) that they can use to reliably (and easily scale it) to
> execute things.
>
--
Kind regards, Yuriy.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140711/fa837f89/attachment-0001.html>
More information about the OpenStack-dev
mailing list