[openstack-dev] [Heat] Future Vision for Heat

Joshua Harlow harlowja at yahoo-inc.com
Wed Apr 24 20:01:30 UTC 2013


I like your idea that everything should be idempotent, although I think
that's a little far off (and requires downstream libraries to be
idempotent as well, aka libvirt...)

I think the better approach right now is that we have a task history and
'engines' can skip tasks they have already completed. This type of task
history is needed anyway to accomplish rollbacks (how do you know what to
roll back if you don't keep a history of what the engine 'accomplished' in
the first place?). Along with the task history (something like a list of
tasks completed, aka [started, deployed X, started X, published ip Y,
done]) you likely also need to persist enough information to roll back
each of those previously completed tasks (likely some metadata about each
task describing in detail what occurred), since knowing that something was
'started' is not enough to know how to stop it (you need things like what
was started, who it was started as, what service started it...).

So that's my idea for how resumption/task history can be done; such data
can be stored in a DB or elsewhere, since it's really 'metadata'
associated with a workflow.
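To make that concrete, here is a rough sketch of what a persisted
task-history entry could look like. The field names are purely
illustrative (not an existing Heat/Nova schema):

import json
import time

def record_task(history, name, rollback_info):
    """Append a completed task plus enough metadata to undo it later."""
    entry = {
        'task': name,                    # e.g. 'deployed X', 'published ip Y'
        'completed_at': time.time(),
        'rollback_info': rollback_info,  # what/who/where, enough to reverse it
    }
    history.append(entry)
    return entry

history = []
record_task(history, 'started X',
            {'service': 'X', 'ran_as': 'svc-x', 'host': 'node-1'})
# Persist wherever is convenient (DB row, zookeeper node, ...):
serialized = json.dumps(history)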

-- 

As for the HA/ownership transfer/locking aspect, this is where it gets
very interesting IMHO.

Sorta baseline requirements for what I think is needed for these engines:

1. Resumption + task/state history storage (see above ideas)
2. A workflow must be owned by only one engine for the duration of said
workflow (unless said engine fails) - aka locking
3. Ability to transfer workflow ownership from one engine to another (and
notify others) on failure of said engine - aka liveness & workflow
transfer associated with the liveness
(4...) - see https://etherpad.openstack.org/task-system


Here are some of the issues I see with the database approach/idea that
something like zookeeper solves for us (in a very easy to use manner).

Afaik, a database can't do #2/#3 (especially if you have a mysql cluster,
or mysql replication, or are doing any sort of mysql sharding).

Zookeeper solves all of these, since that's its main purpose -
http://research.yahoo.com/pub/3280 is very good to read and re-read.

Let's start off with some code, since that will help the discussion
(it could be the basis for the work to be done). I have put some up at
https://github.com/harlowja/zkplayground; try the example there and
ctrl-c the consumers or ctrl-c the producers and see what happens... You
should see the work being submitted and consumed, and when you ctrl-c you
will see another consumer (consumer == engine) attempt resumption and so
on.

#1 is handled by storing state history as nodes in zookeeper, since
zookeeper is easily made HA
(http://zookeeper.apache.org/doc/r3.1.2/zookeeperStarted.html#sc_RunningReplicatedZooKeeper)
and zookeeper is battle-hardened already. Note that when set up correctly
this allows zookeeper and the associated 'engine' code to easily do
resumption, by having an engine look at the current history and skip
tasks it has already done.
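A minimal sketch of that, using kazoo (the python zookeeper client also
linked further down) - the paths and payloads here are made up for
illustration and are not the actual zkplayground layout:

import json

from kazoo.client import KazooClient

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

HISTORY = '/workflows/wf-0001/history'
zk.ensure_path(HISTORY)

def mark_done(task_name, rollback_info):
    # One persistent, sequential child node per completed task; it survives
    # engine crashes, so a resuming engine can replay the history.
    payload = json.dumps({'task': task_name, 'rollback_info': rollback_info})
    zk.create(HISTORY + '/task-', payload.encode('utf-8'), sequence=True)

def completed_tasks():
    tasks = []
    for child in sorted(zk.get_children(HISTORY)):
        data, _stat = zk.get(HISTORY + '/' + child)
        tasks.append(json.loads(data.decode('utf-8')))
    return tasks

mark_done('deployed X', {'server_id': 'abc123'})
# A resuming engine simply skips anything already listed here:
print([t['task'] for t in completed_tasks()])

zk.stop()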

#2 is handled by zookeeper doing distributed locking in a reliable & HA
manner, so no other engine can own a workflow while one engine is working
on it.
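Roughly, with kazoo's Lock recipe (paths and identifiers are again just
illustrative):

from kazoo.client import KazooClient

def run_workflow():
    pass  # placeholder for whatever the engine actually does

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

# Only the engine holding this lock may work on workflow wf-0001.
lock = zk.Lock('/workflows/wf-0001/lock', identifier='engine-1')
with lock:  # blocks until this engine owns the workflow
    run_workflow()
# If this process dies while holding the lock, zookeeper's session timeout
# releases it so another engine can take over.

zk.stop()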

#3 Zookeeper maintains liveness checks with the clients attached to it,
so it knows which clients are active and which locks those clients
currently hold. Using this information, zookeeper can be set up as a
queue of work that producers put tasks into and workers (aka consumers,
engines) take work from (a typical recipe @
http://zookeeper.apache.org/doc/trunk/recipes.html#sc_recipes_Queues and
also documented at
https://github.com/python-zk/kazoo/blob/master/kazoo/recipe/queue.py#L107).
Now this may seem pretty normal, but here is where the interesting stuff
comes into play. When a worker picks an item off the queue, it is granted
the locks associated with that queue item, which gives it the
single-workflow-ownership property: no other worker can get that item,
and this is guaranteed by zookeeper. It also means that if said worker
dies halfway through processing that workflow, the locks associated with
the item get released and the item gets put back on the queue (this gives
us the awesome property that workflow ownership transfers automatically).
Of course, if a worker completes said task it consumes the entry, which
removes it from the queue. This is your sophisticated concept of 'event
triggering' causing ownership to transfer from one worker to another, and
zookeeper gives it to us for free :-) It also means that we can
horizontally scale the number of producers (think of this as the nova-api
putting work items into the queue, or heat-api doing something similar)
and horizontally scale the number of consumers/engines/workers picking
items off that work queue.
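The kazoo recipe linked above is its LockingQueue; a rough sketch of the
producer/consumer flow described here (queue path and payload are
illustrative) might look like:

from kazoo.client import KazooClient

def process(workflow_id):
    pass  # placeholder for the engine running the workflow's tasks

zk = KazooClient(hosts='127.0.0.1:2181')
zk.start()

queue = zk.LockingQueue('/workflow-queue')

# Producer side (e.g. an api process submitting a workflow):
queue.put(b'wf-0001')

# Consumer/engine side:
item = queue.get()  # takes the item *and* its lock; no other engine sees it
try:
    process(item.decode('utf-8'))
    queue.consume()  # success: remove the entry from the queue
finally:
    # If the engine crashes before consume(), zookeeper releases the lock
    # and the item becomes available to the other engines again.
    zk.stop()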

Note that in the zkplayground code above the 'work history' is also
stored in zookeeper as persistent nodes, although this does not need to
be the case (since work/task history metadata can be stored in a DB if we
want, as described above...).

I'd like to hear your input on the above, since I (and others) believe it
is the way forward to the best orchestration we can have with the current
technology.
 
On 4/24/13 9:03 AM, "Zane Bitter" <zbitter at redhat.com> wrote:

>On 18/04/13 07:37, Joshua Harlow wrote:
>> I'd be very interested in having 'Scheduler & Workflow Service' part
>>there
>> be a library.
>>
>> Pretty much every application is a workflow in some way, and using said
>> library in nova for the orc work there would be very neat.
>
>+1
>>
>> As long as it doesn't change the use-case that heat desires of course
>>(or
>> overload that use-case and make everything complex when it doesn't need
>>to
>> be)...
>
>That shouldn't be a big issue. Heat's use case and Nova's are similar,
>except that we also need to run tasks in parallel. However, I'm in the
>process of delivering that feature in Heat now and the 'library' part of
>that code is very localised, so I don't think it would add much
>complexity for anyone not using it.
>>
>> It would be very neat to use those 2 services as a library, where I can
>> submit arbitrary code to (the state transitions that nova does) and have
>> it handle calling those states, coordinating, rolling back (or at least
>> calling a method to rollback, since rollback is usually very specific to
>> what has occurred). Basically submitting jobs, but not via a DSL/CEP (or
>> the like), not via a model interpreter, but directly via code.
>
>Yes, I very much like the idea of just having a library that takes
>Python code and runs it as a workflow, rather than having some sort of
>DSL from which you have to try to generate/locate the code each time.
>(That stuff would be the job of the proposed Convection service, which
>ideally could also be built on top of this library.)
>
>> Is there
>> any documentation on how heat handles task resumption (if 1 engine
>>fails,
>> another should be able to continue the work), how are said engines made
>>HA
>> and reliable...
>
>This is an issue we haven't yet tackled in Heat - there was a Design
>Summit session about it, and it is on the agenda for Havana.
>
>The easiest way forward, for both Heat and Nova, might be to mandate
>that the tasks are written to be idempotent and rely on their existing
>use of state stored in the database. Then we would need only one level
>of locking (to determine whether the workflow is still being run, or has
>died and we need to resume it or roll it back).
>
>The more sophisticated approach is obviously to have it store the
>current progress in the database and have some sort of event triggered
>when a task completes that lets any other workflow engine process pick
>it up and start the next step. I'm not convinced that's feasible for a
>simple library though; a service with a DSL could do it, but I'd rather
>not go down that path.
>
>cheers,
>Zane.
>
>>
>> Such a engine library would make sense as the core 'engine' for a lot of
>> the openstack core projects imho.
>>
>> Thoughts?
>



