Open Stack

Thu Apr 24 21:23:38 UTC 2014

On 23/04/14 20:45, Robert Collins wrote:
> Hi, we've got this summit session planned -
> http://summit.openstack.org/cfp/details/428 which is really about
> https://etherpad.openstack.org/p/heat-workflow-vs-convergence
>
> We'd love feedback and questions - this is a significant amount of
> work, but work I (and many others based on responses so far) believe
> it is needed to really take Heat to users and ops teams.
>
> Right now we're looking for both high and low level design and input.
>
> If you're an operator/user/developer of/with/around heat - please take
> a couple of minutes to look - feedback inline in the etherpad, or here
> on the list - whatever suits you.
>
> The basic idea is:
>   - no changes needed to the heat template language etc

+1 for this part, definitely :)

>   - take a holistic view and fix the system's emergent properties by
> using a different baseline architecture within it
>   - ???
>   - profit!

Thanks for writing this up, Rob. This is certainly a more ambitious scale 
of application to deploy than we ever envisioned in the early days of 
Heat ;) But I firmly believe that what is good for TripleO will be great 
for the rest of our users too. All of the observed issues mentioned are 
things we definitely want to address.

I have a few questions about the specific architecture being proposed. 
It's not clear to me what you mean by "call-stack style" in referring to 
the current paradigm. Maybe you could elaborate on how the current style 
and the "convergence style" differ.

Specifically, I am not clear on whether 'convergence' means:
  (a) Heat continues to respect the dependency graph but does not stop 
after one traversal, instead repeatedly processing it until (and even 
after) the stack is complete; or
  (b) Heat ignores the dependency graph and just throws everything 
against the wall, repeating until it has all stuck.
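
To make the two readings concrete, here is a minimal Python sketch. Nothing in it is Heat code: `DEPS`, `FakeCloud`, `converge_a` and `converge_b` are hypothetical names invented for illustration, and the "cloud" is a toy that rejects a create when a prerequisite is missing and lets a resource marked flaky fail once.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical template: resource name -> names it depends on.
DEPS = {"network": set(), "server": {"network"}, "floating_ip": {"server"}}


class FakeCloud:
    """Toy stand-in for the real APIs (an assumption, not Heat code).

    A create fails if a prerequisite is missing, and each resource named
    in `flaky` fails once before it can succeed.
    """

    def __init__(self, deps, flaky=()):
        self.deps = deps
        self.flaky = set(flaky)
        self.existing = set()
        self.calls = 0

    def try_create(self, name):
        self.calls += 1
        if name in self.flaky:
            self.flaky.discard(name)  # transient error, fine on the next try
            return False
        if not self.deps[name] <= self.existing:
            return False  # prerequisite missing, so the call is rejected
        self.existing.add(name)
        return True


def converge_a(cloud, deps, max_passes=10):
    """Reading (a): respect the graph, but keep traversing it."""
    order = list(TopologicalSorter(deps).static_order())
    for _ in range(max_passes):
        for name in order:
            ready = deps[name] <= cloud.existing
            if name not in cloud.existing and ready:
                cloud.try_create(name)
        if cloud.existing == set(deps):
            return True
    return False


def converge_b(cloud, names, max_passes=10):
    """Reading (b): ignore the graph and retry everything until it sticks."""
    for _ in range(max_passes):
        for name in names:
            if name not in cloud.existing:
                cloud.try_create(name)
        if cloud.existing == set(names):
            return True
    return False
```

In this toy both loops reach the same end state. The difference is that (a) only attempts a resource once its prerequisites exist, while (b) relies on the backend rejecting premature attempts and pays for each rejection with an extra call.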

I also have doubts about the principle "Users should only need to 
intervene with a stack when there is no right action that Heat can take 
to deliver the current template+parameters". That sounds good in theory, 
but in practice it's very hard to know when there is a right action Heat 
can take and when there isn't. For example, there are innumerable ways to 
create a template that can _never_ actually converge, and I don't believe 
there's a general way we can detect that, only the hard way: one error 
type at a time, for every single resource type. Offering users a way to 
control how and when that happens allows them to make the best decisions 
for their particular circumstances - and hopefully a future WFaaS like 
Mistral will make it easy to set up continuous monitoring for those who 
require it. (Not incidentally, it also gives cloud operators an 
opportunity to charge their users in proportion to their actual 
requirements.)

> This can be contrasted with many other existing attempts to design
> solutions which relied on keeping the basic internals of heat as-is
> and just tweaking things - an approach we don't believe will work -
> the issues arise from the current architecture, not the quality of the
> code (which is fine).

Some of the ideas that have been proposed in the past:

- Moving execution of operations on individual resources to a 
distributed execution system using taskflow. (This should address the 
scalability issue.)
- Updating the stored template in real time during stack updates - this 
is happening in Juno btw. (This will solve the problem of inability to 
ever recover from an update failure. In theory, it would also make it 
possible to interrupt a running update and make changes.)
- Implementing a 'stack converge' operation that the user can trigger to 
compare the actual state of the stack with the model and bring it back 
into spec.
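
As a rough illustration of the last idea, a user-triggered converge could start by diffing the model against the observed state. The sketch below is an assumption about the shape of that comparison, not an existing Heat interface; `plan_convergence` and the dictionary layout (resource name mapped to its properties) are hypothetical.

```python
def plan_convergence(desired, actual):
    """Compare the model (`desired`) with the observed state (`actual`).

    Both arguments map a resource name to a dict of its properties.
    Returns the names to create, update and delete to get back into spec.
    """
    create = sorted(desired.keys() - actual.keys())
    delete = sorted(actual.keys() - desired.keys())
    update = sorted(
        name
        for name in desired.keys() & actual.keys()
        if desired[name] != actual[name]
    )
    return {"create": create, "update": update, "delete": delete}


# Hypothetical example data: one drifted resource, one missing, one stray.
desired = {"server": {"flavor": "m1.small"}, "volume": {"size": 10}}
actual = {"server": {"flavor": "m1.tiny"}, "old_ip": {}}
plan = plan_convergence(desired, actual)
```

The hard part is not the diff itself but deciding, per resource type, whether the resulting actions can ever succeed.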

It would be interesting to see some analysis on exactly how these 
existing attempts fall down in trying to fulfil the goals, as well as 
the specific points at which the proposed implementation differs.

Depending on the answers to the above questions, this proposal could be 
anything between a modest reworking of those existing ideas and a 
complete re-imagining of the entire concept of Heat. I'd very much like 
to find out where along that spectrum it lies :)

BTW, it appears that the schedule you're suggesting involves assigning a 
bunch of people unfamiliar with the current code base and having them 
complete a ground-up rearchitecting of the whole engine, all within the 
Juno development cycle (about 3.5 months). This is simply not consistent 
with reality as I have observed it up to this point.

cheers,
Zane.

[openstack-dev] [Heat][Summit] Input wanted - real world heat spec
