<font size=2 face="sans-serif">The threading in the archive includes this

discussion under the "HOT Software orchestration proposal for workflows"

heading, and the overall ordering in the archive looks very mixed up to

me.  I am going to reply here, hoping that the new subject line will

be subject to less strange ordering in the archive; this is really a continuation

of the overall discussion, not just Steve Baker's proposal.</font>

<br>

<br><font size=2 face="sans-serif">What is the difference between what

today's heat engine does and a workflow?  I am interested to hear

what you experts think; I hope it will be clarifying.  I presume the

answers will touch on things like error handling, state tracking, and updates.</font>

<br>

<br><font size=2 face="sans-serif">I see the essence of Steve Baker's proposal

to be that of doing the minimal mods necessary to enable the heat engine

to orchestrate software components.  The observation is that not much

has to change, since the heat engine is already in the business of calling

out to things and passing values around.  I see a little bit of a

difference, maybe because I am too new to already know why it is not an

issue.  In today's heat engine, the calls are made to fixed services

to do CRUD operations on virtual resources in the cloud, using credentials

managed implicitly; the services have fixed endpoints, even as the virtual

resources come and go.  Software components have no fixed service

endpoints; the service endpoints come and go as the host Compute instances

come and go; I did not notice a story about authorization for the software

component calls.</font>

<br>

<br><font size=2 face="sans-serif">Interestingly, Steve Baker's proposal

reminds me a lot of Chef.  If you just rename Steve's "component"

to "recipe", the alignment gets real obvious; I am sure it is

no accident.  I am not saying it is isomorphic --- clearly Steve Baker's

proposal has more going on, with its cross-VM data dependencies and synchronization.

 But let me emphasize that we can start to see a different way of

thinking here.  Rather than focusing on a centrally-run workflow,

think of each VM as independently running its own series of recipes ---

with the recipe invocations now able to communicate and synchronize between

VMs as well as within VMs.</font>

<br>

<br><font size=2 face="sans-serif">Steve Baker's proposal uses two forms

of communication and synchronization between VMs: (1) get_attr and (2)

wait conditions and handles (sugar coated or not).  The implementation

of (1) is part of the way the heat engine invokes components; the implementation

of (2) is independent of the heat engine.</font>

<br>

<br><font size=2 face="sans-serif">Using the heat engine for orchestration

is limited to the kinds of logic that the heat engine can run.  This

may be one reason people are suggesting using a general workflow engine.

 However, the recipes (components) running in the VMs can do general

computation; if we allow general cross-VM communication and synchronization

as part of those general computations, we clearly have a more expressive

system than the heat engine.</font>

<br>

<br><font size=2 face="sans-serif">Of course, a general distributed computation

can get itself into trouble (e.g., deadlock, livelock).  If we structure

that computation as a set of components (recipe invocations) with a DAG

of dependencies then we avoid those troubles.  And the kind of orchestration

that the heat engine does is sufficient to invoke such components.</font>
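<br>
<br><font size=2 face="sans-serif">As a rough illustration of the DAG point (a minimal Python sketch, not code from any proposal in this thread): if each component declares the components it depends on, a topological pass either yields a safe invocation order or reports a cycle before anything runs.  The function name and the component names "db", "app" and "web" are made up for the example.</font>

```python
from collections import deque


def invocation_order(deps):
    """Return an order in which components can run; raise ValueError on a cycle.

    deps maps each component name to the set of components it depends on.
    """
    # Include components that appear only as somebody else's dependency.
    nodes = set(deps)
    for needed in deps.values():
        nodes.update(needed)

    remaining = {n: set(deps.get(n, ())) for n in nodes}
    dependents = {n: [] for n in nodes}
    for n, needed in remaining.items():
        for d in needed:
            dependents[d].append(n)

    # Kahn's algorithm: repeatedly run whatever has no unmet dependencies.
    ready = deque(sorted(n for n in nodes if not remaining[n]))
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in sorted(dependents[n]):
            remaining[m].discard(n)
            if not remaining[m]:
                ready.append(m)

    # Anything left over is stuck waiting on a cycle.
    if len(order) != len(nodes):
        raise ValueError("component dependencies contain a cycle")
    return order


print(invocation_order({"db": set(), "app": {"db"}, "web": {"app", "db"}}))
# prints ['db', 'app', 'web']
```

<font size=2 face="sans-serif">A cyclic input such as "a" depending on "b" and "b" depending on "a" raises ValueError instead of deadlocking at run time.</font>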

<br>

<br><font size=2 face="sans-serif">Structuring software orchestration as

a DAG of components also gives us a leg up on UPDATE.  Rather than

asking the user to write a workflow for each different update, or a general

meta-workflow that does introspection to decide what work needs to be done,

we ask the thing that invokes the components to run through the components

in the way that today's heat engine runs through resources for an UPDATE.</font>
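<br>
<br><font size=2 face="sans-serif">One way to picture this, as a simplified sketch only (it is an assumption about how such an invoker might decide what to redo, not a description of what the heat engine actually does on UPDATE): compare old and new component definitions, then mark every changed component and everything downstream of it in the DAG.  All names here are hypothetical.</font>

```python
def components_to_rerun(deps, old_defs, new_defs):
    """Return the set of components to re-invoke for an update.

    deps maps a component to the set of components it depends on.
    old_defs / new_defs map a component to some comparable definition.
    """
    # Components that are new or whose definition differs.
    changed = {name for name in new_defs if old_defs.get(name) != new_defs[name]}

    # Invert the dependency map so we can walk downstream.
    dependents = {}
    for name, needed in deps.items():
        for d in needed:
            dependents.setdefault(d, set()).add(name)

    # Everything reachable from a changed component is stale too.
    stale = set()
    stack = list(changed)
    while stack:
        name = stack.pop()
        if name in stale:
            continue
        stale.add(name)
        stack.extend(dependents.get(name, ()))
    return stale


deps = {"db": set(), "app": {"db"}, "web": {"app"}, "monitor": set()}
old = {"db": "v1", "app": "v1", "web": "v1", "monitor": "v1"}
new = {"db": "v1", "app": "v2", "web": "v1", "monitor": "v1"}
print(sorted(components_to_rerun(deps, old, new)))
# prints ['app', 'web']
```

<font size=2 face="sans-serif">The user only edits the declarative description; no per-update workflow has to be written.</font>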

<br>

<br><font size=2 face="sans-serif">Lakshmi has been working on a software

orchestration technique that is also centered on the idea of a DAG of components.

 It was created before we got real interested in Heat.  It is

implemented as a pre-processor that runs upstream of where today's heat

engine goes, emitting fairly minimal userdata needed for bootstrapping.

 The dependencies between recipe invocations are handled very smoothly

in the recipes, which are written in Chef.  No hackery is needed in

the recipe text at all (thanks to Ruby metaprogramming); what is needed

is only an additional declaration of the cross-VM inputs and outputs

of each recipe.  The propagation of data and synchronization between

VMs is handled, under the covers, via simple usage of ZooKeeper (other

implementations are reasonable too).  But the idea of heat-independent

propagation of data and synchronization among a DAG of components is not

limited to Chef-based components, and can appear fairly smooth in any recipe

language.</font>
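<br>
<br><font size=2 face="sans-serif">To make the declared-inputs-and-outputs idea concrete, here is a minimal Python sketch.  It is not Lakshmi's implementation: the Blackboard class is an in-process stand-in for a coordination service such as ZooKeeper, threads stand in for VMs, and every name and value (including the host name) is invented for illustration.  The recipe bodies contain no synchronization code; the wrapper blocks until the declared inputs exist and publishes the declared outputs afterwards.</font>

```python
import threading


class Blackboard:
    """In-process stand-in for a coordination service (assumption, not ZooKeeper)."""

    def __init__(self):
        self._values = {}
        self._cond = threading.Condition()

    def publish(self, key, value):
        with self._cond:
            self._values[key] = value
            self._cond.notify_all()

    def wait_for(self, key, timeout=5.0):
        # Block until some component has published the key.
        with self._cond:
            if not self._cond.wait_for(lambda: key in self._values, timeout):
                raise TimeoutError(key)
            return self._values[key]


def run_component(board, inputs, outputs, body):
    """Wait for declared inputs, run the recipe body, publish declared outputs."""
    args = {name: board.wait_for(name) for name in inputs}
    results = body(**args)
    for name in outputs:
        board.publish(name, results[name])


def db_recipe():
    return {"db_url": "db.example.internal:5432"}


def app_recipe(db_url):
    return {"app_config": "connect to " + db_url}


board = Blackboard()
workers = [
    # Started first on purpose: it simply waits until db_url appears.
    threading.Thread(target=run_component,
                     args=(board, ["db_url"], ["app_config"], app_recipe)),
    threading.Thread(target=run_component,
                     args=(board, [], ["db_url"], db_recipe)),
]
for w in workers:
    w.start()
for w in workers:
    w.join()
result = board.wait_for("app_config")
print(result)  # prints connect to db.example.internal:5432
```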

<br>

<br><font size=2 face="sans-serif">A value of making software orchestration

independent of today's heat engine is that it enables the four-stage pipeline

that I have sketched at </font><a href="https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U"><font size=2 face="sans-serif">https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U</font></a><font size=2 face="sans-serif">

and whose ordering of functionality has been experimentally vetted with

some non-trivial examples.  The first big one we did, setting up an

IBM collaboration product called "Connections", is even more

complicated than the SQL server example with which this thread started.

 The pipeline starts with modular input (a Ruby program, in fact);

running that program produces a monolithic integrated (i.e., describing

both software and infrastructure) model.  In our running code today

that model is in memory in a process that proceeds with the next stage;

in the future those could be separate processes connected by an extended

heat template.  The second stage is the pre-processor I mentioned,

which takes the monolithic integrated model and: (1) checks it for various

things, such as the fact that the component dependencies really do form

a DAG; (2) prepares the znodes in ZooKeeper that the components will use

to exchange data and synchronize; and (3) emits a stripped-down template

that (a) requires no particular understanding of software orchestration

from the downstream functions and (b) has no "user-level" dependencies

among the resources.  The next stage does holistic infrastructure

scheduling, which I think is quite valuable and interesting in its own

right.  Downstream from that is infrastructure orchestration.  Once

VMs start running their userdata they bootstrap the recipe invocations

that do the actual software orchestration (including cross-VM stuff as

discussed).  </font>
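<br>
<br><font size=2 face="sans-serif">The three jobs of the second stage can be sketched roughly as follows.  This is a hedged illustration in Python, not the running code described above: the shape of the model dictionary, the "depends_on" and "outputs" keys, and the "/orchestration/..." path layout are all assumptions made for the example, and the sketch only computes the paths rather than talking to ZooKeeper.  It relies on graphlib from the Python 3.9+ standard library.</font>

```python
from graphlib import CycleError, TopologicalSorter


def preprocess(model):
    """Check the component DAG, list coordination paths, emit a stripped template."""
    components = model["components"]

    # (1) Verify that component dependencies really form a DAG.
    deps = {name: set(c.get("depends_on", ())) for name, c in components.items()}
    try:
        order = list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        raise ValueError("component dependencies do not form a DAG") from exc

    # (2) One coordination path per declared cross-VM output (hypothetical layout).
    znodes = sorted(
        "/orchestration/" + name + "/" + out
        for name, c in components.items()
        for out in c.get("outputs", ())
    )

    # (3) Template with the user-level dependencies among resources removed.
    template = {
        "resources": {
            rname: {k: v for k, v in res.items() if k != "depends_on"}
            for rname, res in model["resources"].items()
        }
    }
    return order, znodes, template


model = {
    "components": {
        "db": {"outputs": ["db_url"]},
        "app": {"depends_on": ["db"], "outputs": ["app_url"]},
    },
    "resources": {
        "vm0": {"type": "server"},
        "vm1": {"type": "server", "depends_on": ["vm0"]},
    },
}
order, znodes, template = preprocess(model)
print(order)
print(znodes)
print(template)
```

<font size=2 face="sans-serif">Downstream stages then see only the plain template; everything specific to software orchestration lives in the coordination paths and the userdata.</font>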

<br>

<br><font size=2 face="sans-serif">OTOH, we could swap the order of the

two middle stages in the pipeline to get an alternate pipeline that I sketched

at </font><a href="https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g"><font size=2 face="sans-serif">https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g</font></a><font size=2 face="sans-serif">

.  I favor the order we have running today because I think it does

a better job of grouping related things together: the first two stages

concern software, and the last two stages concern infrastructure.</font>

<br>

<br><font size=2 face="sans-serif">Regards,</font>

<br><font size=2 face="sans-serif">Mike</font>