The threading in the archive includes this discussion under the "HOT Software
orchestration proposal for workflows" heading, and the overall ordering in the
archive looks very mixed up to me. I am going to reply here, hoping that the
new subject line will suffer less strange ordering in the archive; this is
really a continuation of the overall discussion, not just Steve Baker's
proposal.

What is the difference between what today's heat engine does and a workflow?
I am interested to hear what you experts think; I hope it will be clarifying.
I presume the answers will touch on things like error handling, state
tracking, and updates.

I see the essence of Steve Baker's proposal as doing the minimal modifications
necessary to enable the heat engine to orchestrate software components. The
observation is that not much has to change, since the heat engine is already
in the business of calling out to things and passing values around. I do see
a bit of a difference, maybe because I am too new to know why it is not an
issue. In today's heat engine, the calls go to fixed services that do CRUD
operations on virtual resources in the cloud, using credentials managed
implicitly; the services have fixed endpoints, even as the virtual resources
come and go. Software components have no fixed service endpoints; their
endpoints come and go as the host Compute instances come and go, and I did
not notice a story about authorization for the software component calls.

Interestingly, Steve Baker's proposal reminds me a lot of Chef. If you just
rename Steve's "component" to "recipe", the alignment becomes obvious; I am
sure that is no accident. I am not saying it is isomorphic --- clearly Steve
Baker's proposal has more going on, with its cross-VM data dependencies and
synchronization. But let me emphasize that we can start to see a different
way of thinking here. Rather than focusing on a centrally-run workflow, think
of each VM as independently running its own series of recipes --- with the
recipe invocations now able to communicate and synchronize between VMs as
well as within VMs.

Steve Baker's proposal uses two forms of communication and synchronization
between VMs: (1) get_attr and (2) wait conditions and handles (sugar-coated
or not). The implementation of (1) is part of the way the heat engine invokes
components; the implementation of (2) is independent of the heat engine.
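For concreteness, (2) from inside a VM is just an HTTP call. Below is a
minimal sketch, assuming a CFN-style wait condition handle (a pre-signed URL
that the instance signals with a small JSON status document); the URL and
payload values are made up for illustration, not taken from Steve's proposal.

```python
import json
import urllib.request

# Pre-signed URL handed to the instance via userdata; placeholder value here.
HANDLE_URL = "https://heat.example.com/v1/waitcondition/SIGNED-TOKEN"

# CFN-style status document reporting that this VM's recipe finished.
body = json.dumps({
    "Status": "SUCCESS",
    "Reason": "mysql recipe finished",
    "UniqueId": "db-server-1",
    "Data": "db endpoint is 10.0.0.5:3306",
}).encode()

req = urllib.request.Request(HANDLE_URL, data=body, method="PUT",
                             headers={"Content-Type": "application/json"})
urllib.request.urlopen(req)
```

Nothing heat-specific has to run in the instance to raise that signal; that is
what I mean by the implementation of (2) being independent of the heat engine.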
Using the heat engine for orchestration is limited to the kinds of logic that
the heat engine can run. This may be one reason people are suggesting using a
general workflow engine. However, the recipes (components) running in the VMs
can do general computation; if we allow general cross-VM communication and
synchronization as part of those computations, we clearly have a more
expressive system than the heat engine.

Of course, a general distributed computation can get itself into trouble
(e.g., deadlock, livelock). If we structure that computation as a set of
components (recipe invocations) with a DAG of dependencies, then we avoid
those troubles. And the kind of orchestration that the heat engine does is
sufficient to invoke such components.
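To make "the kind of orchestration that the heat engine does" concrete for
software components: once the dependencies form a DAG, invoking the components
is just a topological walk. A minimal Python sketch, with made-up component
names and a stub in place of the real invocation:

```python
from graphlib import TopologicalSorter, CycleError

# component -> set of components it depends on (illustrative names)
deps = {
    "mysql":      set(),
    "app_server": {"mysql"},
    "load_bal":   {"app_server"},
}

def run(component):
    print(f"invoking {component}")   # stand-in for the real invocation

try:
    # static_order() yields each component only after its dependencies,
    # and raises CycleError if the dependencies are not a DAG.
    for component in TopologicalSorter(deps).static_order():
        run(component)
except CycleError as err:
    raise SystemExit(f"dependencies are not a DAG: {err}")
```

A real invoker would also run independent components in parallel (the same
class supports that), but the ordering discipline is the same one the heat
engine already applies to resources.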
Structuring software orchestration as a DAG of components also gives us a leg
up on UPDATE. Rather than asking the user to write a workflow for each
different update, or a general meta-workflow that does introspection to decide
what work needs to be done, we ask the thing that invokes the components to
run through the components the way today's heat engine runs through resources
for an UPDATE.
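Here is a rough sketch of what that walk could look like for UPDATE, assuming
each component's definition can be summarized into something comparable (the
definitions and names are invented); a component is re-run only if its own
definition changed or something it depends on was re-run:

```python
from graphlib import TopologicalSorter

def update(old_defs, new_defs, deps):
    """old_defs/new_defs: component -> definition (any comparable summary);
    deps: component -> set of components it depends on."""
    rerun = set()
    for comp in TopologicalSorter(deps).static_order():
        changed = old_defs.get(comp) != new_defs.get(comp)
        upstream_changed = bool(deps.get(comp, set()) & rerun)
        if changed or upstream_changed:
            print(f"re-running {comp}")    # stand-in for the real invocation
            rerun.add(comp)
        else:
            print(f"keeping {comp} as-is")
    return rerun
```

The point is that the user writes no per-update workflow; the invoker derives
the work from the DAG and the diff.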
Lakshmi has been working on a software orchestration technique that is also
centered on the idea of a DAG of components. It was created before we got
really interested in Heat. It is implemented as a pre-processor that runs
upstream of where today's heat engine goes, emitting the fairly minimal
userdata needed for bootstrapping. The dependencies between recipe invocations
are handled very smoothly in the recipes, which are written in Chef. No
hackery is needed in the recipe text at all (thanks to Ruby metaprogramming);
all that is needed is an additional declaration of each recipe's cross-VM
inputs and outputs. The propagation of data and synchronization between VMs is
handled, under the covers, via simple usage of ZooKeeper (other
implementations are reasonable too). But the idea of heat-independent
propagation of data and synchronization among a DAG of components is not
limited to Chef-based components, and can be made fairly smooth in any recipe
language.
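To make the "under the covers" part concrete, here is roughly the kind of
ZooKeeper usage involved, sketched in Python with the kazoo client rather than
the Ruby/Chef code we actually use; the paths, values, and polling loop are
illustrative only (watches would avoid the polling):

```python
import time
from kazoo.client import KazooClient

zk = KazooClient(hosts="zk.example.com:2181")
zk.start()

# --- on the VM running the database recipe --------------------------------
# Publish a cross-VM output as a zNode once the recipe has produced it.
zk.ensure_path("/stack1/outputs")
zk.create("/stack1/outputs/db_endpoint", b"10.0.0.5:3306")

# --- on the VM running the app-server recipe -------------------------------
# Block until the declared cross-VM input exists, then read it.
while not zk.exists("/stack1/outputs/db_endpoint"):
    time.sleep(5)
value, _stat = zk.get("/stack1/outputs/db_endpoint")
print("configuring app server against", value.decode())

zk.stop()
```

The producing and consuming halves would of course run on different VMs; they
rendezvous only on the zNode path.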
A value of making software orchestration independent of today's heat engine is
that it enables the four-stage pipeline I have sketched at
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U
and whose ordering of functionality has been experimentally vetted with some
non-trivial examples. The first big one we did, setting up an IBM
collaboration product called "Connections", is even more complicated than the
SQL Server example with which this thread started. The pipeline starts with
modular input (a Ruby program, in fact); running that program produces a
monolithic integrated model (i.e., one covering both software and
infrastructure). In our running code today that model is in memory in a
process that proceeds with the next stage; in the future those could be
separate processes connected by an extended heat template. The second stage is
the pre-processor I mentioned, which takes the monolithic integrated model
and: (1) checks it for various things, such as the fact that the component
dependencies really do form a DAG; (2) prepares the zNodes in ZooKeeper that
will be used by the components to exchange data and synchronization; and (3)
emits a stripped-down template that (a) requires no particular understanding
of software orchestration from the downstream functions and (b) has no
"user-level" dependencies among the resources. The next stage does holistic
infrastructure scheduling, which I think is quite valuable and interesting in
its own right. Downstream from that is infrastructure orchestration. Once VMs
start running their userdata, they bootstrap the recipe invocations that do
the actual software orchestration (including the cross-VM work discussed
above).
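For illustration only (this is a toy sketch, not our actual code), the
pre-processor stage amounts to something like the following, with invented
shapes for the integrated model and the emitted template:

```python
from graphlib import TopologicalSorter, CycleError
from kazoo.client import KazooClient

def preprocess(model, zk_hosts="zk.example.com:2181"):
    """model: integrated software+infrastructure model (invented shape)."""
    # (1) check that the component dependencies really do form a DAG
    deps = {c["name"]: set(c.get("depends_on", ()))
            for c in model["components"]}
    try:
        order = list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        raise ValueError(f"component dependencies are not a DAG: {err}")

    # (2) prepare the zNodes the components will use to exchange data
    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    for comp in order:
        zk.ensure_path(f"/{model['stack']}/{comp}/outputs")
    zk.stop()

    # (3) emit a stripped-down template: infrastructure resources only, with
    #     no "user-level" dependencies among them
    return {
        "resources": {
            r["name"]: {k: v for k, v in r.items() if k != "depends_on"}
            for r in model["resources"]
        }
    }
```

Everything downstream then sees an ordinary template plus bootstrap userdata,
with no knowledge of software orchestration required.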
OTOH, we could swap the order of the two middle stages in the pipeline to get
an alternate pipeline that I sketched at
https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g
. I favor the order we have running today because I think it does a better job
of grouping related things together: the first two stages concern software,
and the last two stages concern infrastructure.

Regards,
Mike