[openstack-dev] [Heat] Stack breakpoint

Zane Bitter zbitter at redhat.com
Mon Mar 17 23:03:25 UTC 2014

On 17/03/14 17:03, Ton Ngo wrote:
> (reposting as new thread)
> I would like to revisit with more details an idea that was mentioned in the
> last design summit and hopefully get some feedback.
> The scenario is troubleshooting a failed template.
> Currently we can stop on the point of failure by disabling rollback:  this
> works well for stack-create; stack-update requires some more work but
> that's a different thread.  In many cases however, the point of failure may
> be too late or too hard to debug because the condition causing the failure
> may not be obvious or may have been changed.  If we can pause the stack at
> a point before the failure, then we can check whether the state of the
> environment and the stack is what we expect.
> The analogy with program debugging is breakpoint/step, so it may be useful
> to introduce this same concept in a stack.
> The usage would be something like:
> -Run stack-create (or stack-update once it can handle failure) with one or
> more resource names specified as breakpoints
> -As the engine traverses down the dependency graph, it would stop at the
> breakpoint resource and all dependent resources.  Other resources with no
> dependency will proceed to completion.
> -After debugging, continue the stack by:
>      -Stepping: remove current breakpoint, set breakpoint for next
> resource(s) in dependency graph, resume stack-create (or stack-update)
>      -Running to completion: remove current breakpoint, resume stack-create
> (or stack-update)
> Some other possible uses for this breakpoint:
> - While developing new template or resource type, bring up a stack to a
> point before the new code is to be executed
> - Introduce human process: pause the partial stack so the user can get the
> stack info and perform some tasks before continuing
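
For reference, the traversal described above can be sketched roughly as
follows. This is only an illustration, not Heat code: the function name,
the shape of the dependency map and the resource names are all made up
for the example, and it uses Python's standard graphlib module (3.9+)
rather than Heat's own dependency graph.

```python
from graphlib import TopologicalSorter


def create_with_breakpoints(deps, breakpoints):
    """Split resources into those that proceed and those that are held.

    deps maps each (hypothetical) resource name to the set of resources
    it depends on. A resource is held if it is a breakpoint itself or if
    anything it depends on is already held.
    """
    created, held = [], set()
    # static_order() yields each resource only after all its dependencies.
    for name in TopologicalSorter(deps).static_order():
        if name in breakpoints or held & set(deps.get(name, ())):
            held.add(name)        # stop here; nothing is created
        else:
            created.append(name)  # unaffected by any breakpoint
    return created, held


# Hypothetical template: attachment needs server and volume, and so on.
deps = {
    "network": set(),
    "volume": set(),
    "server": {"network"},
    "floating_ip": {"network"},
    "attachment": {"server", "volume"},
}
created, held = create_with_breakpoints(deps, breakpoints={"server"})
```

With a breakpoint on "server", the server and the attachment that
depends on it are held, while the network, volume and floating IP run to
completion.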

I would like to see this solved with some sort of notify/callback 
mechanism. There are a bunch of use cases which IMHO could all be solved 
with a single feature:

- Debugging template operations by pausing and stepping to allow a user 
to debug
- Adding a manual task into the stack creation process
- Automatically augmenting the stack creation process by inserting tasks 
into the workflow
- Providing a hook to the Autoscaling engine (when it is separated out 
into a separate process) to allow it to update the load-balancer (or, 
more generically, any shared resource) at the appropriate times
- Providing a hook for e.g. Trove to confirm resizes of Nova servers.

(I think I counted 6 use cases in a previous thread, and iirc that 
didn't include the debugging one.)

A feature that:
1) Optionally notifies the user before or after performing some 
operation on a resource, and
2) After sending such a notification, waits for confirmation before
proceeding,
should be able to solve all of the use cases above. (This is my 
favourite kind of feature ;)
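
A minimal sketch of what such a notify-then-wait hook might look like,
purely to illustrate the two points above. Nothing here is an existing
Heat API: the class, the "pre"/"post" phase names and the in-process
queue used as the confirmation channel are all assumptions made for the
example.

```python
import queue


class HookedWorkflow:
    """Runs operations on resources, pausing at any configured hooks."""

    def __init__(self, notify, hooks):
        self.notify = notify    # callable(resource, phase); channel is assumed
        self.hooks = set(hooks)  # (resource, phase) pairs that should pause
        self._confirmations = queue.Queue()

    def confirm(self, resource, phase):
        """Called by the user (or another service) to let the workflow go on."""
        self._confirmations.put((resource, phase))

    def _checkpoint(self, resource, phase, timeout=None):
        if (resource, phase) not in self.hooks:
            return
        self.notify(resource, phase)                 # 1) notify
        # 2) wait for the matching confirmation; others are discarded
        while self._confirmations.get(timeout=timeout) != (resource, phase):
            pass

    def run(self, resource, operation):
        self._checkpoint(resource, "pre")
        result = operation()
        self._checkpoint(resource, "post")
        return result


log = []


def auto_confirm(resource, phase):
    # Stand-in for a real consumer: records the notification and confirms
    # straight away, as an automated task would once its work is done.
    log.append(("notified", resource, phase))
    workflow.confirm(resource, phase)


def create_server():
    log.append(("created", "server"))
    return "CREATE_COMPLETE"


workflow = HookedWorkflow(auto_confirm, hooks={("server", "pre")})
result = workflow.run("server", create_server)
```

Note that the pause lives in the workflow object and the resource itself
carries no extra state. A debugger, a manual task and an automated
service would differ only in who receives the notification and when they
call confirm().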

The details of how to implement that (in particular, what channel do you 
send notifications through?) are more tricky to figure out. There was 
some discussion already in the "Rolling Updates spec re-written. RFC" 
thread. Start here and keep going:


> Some issues to consider (with some initial feedback from shardy):
> - Granularity of stepping:  resource level or internal steps within a
> resource

Before and after a resource is processed, and at any logical steps 
during, such as the CONFIRM step when resizing a Nova server.

> - How to specify breakpoints:  CLI argument or coded in template or both

I think I'd vote for allowing them in both the template and the
environment.

> - How to handle resources with timer, e.g. wait condition:  pause/resume
> timer value

Handle it by only allowing pauses before and after. In most cases I'm 
not sure what it would mean to pause _during_.

> - New state for a resource:  PAUSED

It's the workflow that's paused, not the resource, so I don't see the 
need for a new state.

