[openstack-dev] [Heat][Summit] Input wanted - real world heat spec
Qiming Teng
tengqim at linux.vnet.ibm.com
Fri Apr 25 02:31:15 UTC 2014
> > Specifically, I am not clear on whether 'convergence' means:
> > (a) Heat continues to respect the dependency graph but does not stop
> > after one traversal, instead repeatedly processing it until (and even
> > after) the stack is complete; or
> > (b) Heat ignores the dependency graph and just throws everything
> > against the wall, repeating until it has all stuck.
> >
>
> I think (c). We still have the graph driving "what to do next" so that
> things are more likely to stick. Also we don't want to do 10,000
> instance creations if the database they need isn't going to become
> available.
>
> But we decouple "I need to do something" from "The user asked for
> something" by allowing the convergence engine to act on notifications
> from the observer engine. In addition to allowing more automated actions,
> it should allow us to use finer grained locking because no individual
> task will need to depend on the whole graph or stack. If an operator
> comes along and changes templates or parameters, we can still complete
> our outdated action. Eventually convergence will arrive at a state which
> matches the desired stack.
There could be livelocks or deadlocks once the locking granularity becomes
finer. We need a sound design to avoid them up front, before they become too
difficult to debug.
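As a rough illustration only (not Heat code; all names are invented), one
classic way to keep fine-grained per-resource locks deadlock-free is to
always acquire them in a single global order, e.g. sorted by resource id,
so no two workers can ever wait on each other in a cycle:

    import threading

    class ResourceLocks(object):
        def __init__(self):
            self._locks = {}            # resource_id -> threading.Lock
            self._guard = threading.Lock()

        def _lock_for(self, resource_id):
            with self._guard:
                return self._locks.setdefault(resource_id, threading.Lock())

        def acquire_all(self, resource_ids):
            # Sorting gives every worker the same acquisition order,
            # which rules out circular wait (a deadlock precondition).
            acquired = []
            try:
                for rid in sorted(resource_ids):
                    lock = self._lock_for(rid)
                    lock.acquire()
                    acquired.append(lock)
                return acquired
            except Exception:
                for lock in reversed(acquired):
                    lock.release()
                raise

        def release_all(self, locks):
            for lock in reversed(locks):
                lock.release()

Whatever scheme we pick, the point is that the ordering (or timeout/retry)
rule has to be designed in from the start rather than bolted on later.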
> > I also have doubts about the principle "Users should only need to
> > intervene with a stack when there is no right action that Heat can take
> > to deliver the current template+parameters". That sounds good in theory,
> > but in practice it's very hard to know when there is a right action Heat
> > can take and when there isn't. e.g. There are innumerable ways to create
> > a template that can _never_ actually converge, and I don't believe
> > there's a general way we can detect that, only the hard way: one error
> > type at a time, for every single resource type. Offering users a way to
> > control how and when that happens allows them to make the best decisions
> > for their particular circumstances - and hopefully a future WFaaS like
> > Mistral will make it easy to set up continuous monitoring for those who
> > require it. (Not incidentally, it also gives cloud operators an
> > opportunity to charge their users in proportion to their actual
> > requirements.)
> >
>
> There are some obvious times where there _is_ a clear automated answer
> that does not require me to defer to a user's special workflow. 503 or
> 429 (I know, not ratified yet) status codes mean I should retry after
> maybe backing off a bit. If I get an ERROR state on a nova VM, I should
> retry a few times before giving up.
+1 on this.
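Just to make the idea concrete (placeholder names and exception type, not
any actual client API), the retry-with-backoff policy described above could
be as simple as:

    import random
    import time

    class HTTPError(Exception):
        """Placeholder for a client exception carrying an HTTP status."""
        def __init__(self, status):
            super(HTTPError, self).__init__('HTTP %d' % status)
            self.status = status

    RETRYABLE = (429, 503)   # "try again later" style status codes

    def call_with_backoff(func, max_attempts=5, base_delay=1.0):
        for attempt in range(1, max_attempts + 1):
            try:
                return func()
            except HTTPError as exc:
                if exc.status not in RETRYABLE or attempt == max_attempts:
                    raise
                # Exponential backoff with jitter before the next attempt.
                time.sleep(base_delay * 2 ** (attempt - 1) +
                           random.uniform(0, 1))

The same shape works for a nova VM landing in ERROR: retry the create a
bounded number of times before giving up and surfacing the failure.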
> The point isn't that we have all the answers, it is that there are
> plenty of places where we do have good answers that will serve
> most users well.
Right. I would expect every resource in Heat to be encapsulated well enough
that it knows how to handle most events; in some cases additional hints
from the events will be needed. If a resource doesn't know how to respond
to an event, we provide a default (well-defined) propagation path for the
message. Assuming this can be done, we only have to deal with the
macro-level complexities where an external workflow is needed.
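A minimal sketch of that idea, with invented class and event names: a
resource handles the events it understands, and anything unhandled follows
a default propagation path up to its parent (the stack, and ultimately an
external workflow if nothing in the chain can act):

    class BaseResource(object):
        def __init__(self, name, parent=None):
            self.name = name
            self.parent = parent

        def handle_event(self, event):
            handler = getattr(self, 'handle_%s' % event['type'], None)
            if handler is not None:
                return handler(event)
            return self.propagate(event)

        def propagate(self, event):
            # Default propagation path: pass unhandled events up the chain.
            if self.parent is not None:
                return self.parent.handle_event(event)
            raise NotImplementedError('No handler for %s' % event['type'])

    class ServerResource(BaseResource):
        def handle_error(self, event):
            # Resource-specific recovery, e.g. rebuild or replace the server.
            return 'rebuild %s' % self.name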
> This obsoletes that. We don't need to keep track if we adopt a convergence
> model. The template that the user has asked for, is the template we
> converge on. The diff between that and reality dictates the changes we
> need to make. Whatever convergence step was last triggered can simply be
> cancelled by the new one.
It seems we then need a protocol for cancelling an in-flight operation ...
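One possible shape for such a protocol (purely a sketch, not an agreed
design) is a per-stack traversal id: every new convergence run bumps the
id, and workers abandon any step whose id has gone stale, so a newer
template/parameter update implicitly cancels outdated work without any
explicit messaging:

    import itertools

    class StackState(object):
        _ids = itertools.count(1)

        def __init__(self):
            self.current_traversal = next(self._ids)

        def new_traversal(self):
            self.current_traversal = next(self._ids)
            return self.current_traversal

    def converge_resource(stack, my_traversal, resource, do_work):
        if stack.current_traversal != my_traversal:
            return 'cancelled'    # a newer traversal superseded this one
        do_work(resource)
        # Re-check after the (possibly long) operation before recording
        # success, so stale results are never written back.
        if stack.current_traversal != my_traversal:
            return 'cancelled'
        return 'done'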