[openstack-dev] [Heat][Summit] Input wanted - real world heat spec

Qiming Teng tengqim at linux.vnet.ibm.com
Fri Apr 25 02:31:15 UTC 2014


> > Specifically, I am not clear on whether 'convergence' means:
> >   (a) Heat continues to respect the dependency graph but does not stop 
> > after one traversal, instead repeatedly processing it until (and even 
> > after) the stack is complete; or
> >   (b) Heat ignores the dependency graph and just throws everything 
> > against the wall, repeating until it has all stuck.
> > 
> 
> I think (c). We still have the graph driving "what to do next" so that
> things are more likely to stick. Also we don't want to do 10,000
> instance creations if the database they need isn't going to become
> available.
> 
> But we decouple "I need to do something" from "The user asked for
> something" by allowing the convergence engine to act on notifications
> from the observer engine. In addition to allowing more automated actions,
> it should allow us to use finer grained locking because no individual
> task will need to depend on the whole graph or stack. If an operator
> comes along and changes templates or parameters, we can still complete
> our outdated action. Eventually convergence will arrive at a state which
> matches the desired stack.

Finer-grained locking could introduce live-locks or deadlocks. We need a
guiding design to avoid them now, before they become too difficult to debug.
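To make the concern concrete, here is a rough sketch (all names are
hypothetical, not actual Heat code) of one way a per-resource lock could
carry the traversal that owns it, so a newer traversal supersedes an older
one instead of the two spinning against each other:

    # Hypothetical sketch: a per-resource lock tagged with the traversal
    # (generation) that owns it, so a newer traversal can supersede an
    # older one rather than both retrying forever (a live-lock).
    class ResourceLock(object):
        def __init__(self):
            self.owner_traversal = None

        def try_acquire(self, traversal_id):
            # Rule: an older traversal may never take the lock back from
            # a newer one.  This single ordering rule breaks retry cycles.
            if (self.owner_traversal is None or
                    traversal_id >= self.owner_traversal):
                self.owner_traversal = traversal_id
                return True
            return False

Some rule of this kind seems necessary once no task holds the whole
stack lock.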

> > I also have doubts about the principle "Users should only need to 
> > intervene with a stack when there is no right action that Heat can take 
> > to deliver the current template+parameters". That sounds good in theory, 
> > but in practice it's very hard to know when there is a right action Heat 
> > can take and when there isn't. e.g. There are innumerable ways to create 
> > a template that can _never_ actually converge, and I don't believe 
> > there's a general way we can detect that, only the hard way: one error 
> > type at a time, for every single resource type. Offering users a way to 
> > control how and when that happens allows them to make the best decisions 
> > for their particular circumstances - and hopefully a future WFaaS like 
> > Mistral will make it easy to set up continuous monitoring for those who 
> > require it. (Not incidentally, it also gives cloud operators an 
> > opportunity to charge their users in proportion to their actual 
> > requirements.)
> > 
> 
> There are some obvious times where there _is_ a clear automated answer
> that does not require me to defer to a user's special workflow. 503 or
> 429 (I know, not ratified yet) status codes mean I should retry after
> maybe backing off a bit. If I get an ERROR state on a nova VM, I should
> retry a few times before giving up.

+1 on this.
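A minimal sketch of that kind of built-in policy, assuming a callable that
returns an HTTP status and a result (hypothetical helper, not actual Heat
code):

    # Hypothetical sketch of an automated retry policy: back off on
    # 503/429 responses, and give up after a bounded number of attempts.
    import random
    import time

    RETRYABLE_STATUS = (429, 503)

    def call_with_retry(request, max_attempts=5):
        for attempt in range(max_attempts):
            status, result = request()
            if status not in RETRYABLE_STATUS:
                return result
            # Exponential backoff with jitter before the next attempt.
            time.sleep((2 ** attempt) + random.random())
        raise RuntimeError("gave up after %d attempts" % max_attempts)

The same shape works for the ERROR-state case: retry a few times, then
surface the failure.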

> The point isn't that we have all the answers, it is that there are
> plenty of places where we do have good answers that will serve
> most users well.

Right. I would expect every resource in Heat to be encapsulated well
enough that it knows how to handle most events, though in some cases
additional hints will be needed from the event itself.  If a resource
does not know how to respond to an event, we provide a default
(well-defined) propagation path for the message.  Assuming this can be
done, we only have to deal with the macro-level cases where an external
workflow is needed.
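Roughly the kind of default propagation I have in mind (names are made up
purely for illustration):

    # Hypothetical sketch: a resource handles the event types it knows
    # about and propagates everything else along a well-defined default
    # path, e.g. up to its enclosing stack.
    class Resource(object):
        def __init__(self, name, parent_stack=None):
            self.name = name
            self.parent_stack = parent_stack
            self.handlers = {}          # event type -> handler function

        def handle_event(self, event_type, payload):
            handler = self.handlers.get(event_type)
            if handler is not None:
                return handler(payload)
            if self.parent_stack is not None:
                # Default propagation path: defer to the enclosing stack,
                # which may in turn hand off to an external workflow.
                return self.parent_stack.handle_event(event_type, payload)
            raise NotImplementedError(
                "no handler for %s on %s" % (event_type, self.name))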
 
> This obsoletes that. We don't need to keep track if we adopt a convergence
> model. The template that the user has asked for, is the template we
> converge on. The diff between that and reality dictates the changes we
> need to make. Wherever we're at with the convergence step that was last
> triggered can just be cancelled by the new one.

It seems we need a protocol for cancelling an in-flight operation then ...
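For example, one simple cooperative form of such a protocol (hypothetical,
just to make the question concrete) is for every convergence task to carry
the traversal id it was started for and to check it before each step:

    # Hypothetical sketch of cooperative cancellation: the task checks
    # that its traversal is still the current one before each step and
    # abandons its work otherwise.
    class CancelledError(Exception):
        pass

    def converge_resource(resource_steps, my_traversal, current_traversal):
        for step in resource_steps:
            if current_traversal() != my_traversal:
                # A newer template/parameter update started a new
                # traversal; abandon this outdated action.
                raise CancelledError()
            step()

The open questions are where those checkpoints go and how in-flight calls
to other services get unwound.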
