[openstack-dev] [heat][tripleo] User Initiated Rollback

Steven Hardy shardy at redhat.com
Wed Dec 2 16:02:06 UTC 2015


So, chatting with Giulio today about https://bugs.launchpad.net/heat/+bug/1521944
has me thinking about $subject.

The root cause of that issue is essentially a corner case of a stack-update,
combined with some coupling within the Neutron API which prevents the
update traversal from working.

But it raises the broader question of what a "rollback" actually is, and
how a user can potentially use it to get out of the kind of mess described
in that bug (where, otherwise, your only option is to delete the entire
stack).

Currently, we treat rollback as a special type of update, where, if an
in-progress update fails, we then try to update again, to the previous
stack definition[1], but as Giulio has discovered, there are times when
that doesn't work, because what you actually want is to recover the
existing resource from the backup stack, not create a new one with the same
properties.
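
To make the distinction concrete, here is a toy sketch (not Heat code; every
name in it is made up for illustration). It assumes the failed update had
already replaced the resource, and it models the backup stack as a plain dict
that still holds the original one:

```python
import itertools
from dataclasses import dataclass

# Hypothetical stand-in for IDs handed out by the underlying service.
_physical_ids = itertools.count(1)


@dataclass
class Resource:
    name: str
    properties: dict
    physical_id: int


def create(name, properties):
    """Pretend to create a brand-new physical resource."""
    return Resource(name, dict(properties), next(_physical_ids))


def rollback_by_update(previous_definition):
    """Rollback as 'update again to the previous definition'.

    Every resource is re-created from its old properties, so the result
    matches the old definition but is a different physical resource.
    """
    return {name: create(name, props)
            for name, props in previous_definition.items()}


def rollback_by_recovery(backup_stack):
    """Rollback as 'take the existing resource back out of the backup stack'."""
    return dict(backup_stack)


previous_definition = {"port": {"network": "ctlplane"}}
original = create("port", previous_definition["port"])
backup_stack = {"port": original}  # original was moved aside during the update

replaced = rollback_by_update(previous_definition)
recovered = rollback_by_recovery(backup_stack)

# Same properties either way, but only recovery keeps the original resource.
print(replaced["port"].physical_id == original.physical_id)
print(recovered["port"].physical_id == original.physical_id)
```

Both results satisfy the previous stack definition; only the second one hands
back the resource the user actually had before the update.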

Then, looking at convergence, we have a different definition of rollback,
and it's not yet clear to me how this should behave in a similar scenario,
e.g. when the resource we want to roll back to failed to get deleted but
still exists (so, the Heat resource is FAILED, but the underlying physical
resource is fine)?

Finally, the interface to rollback - atm you have to know before something
fails that you'd like to enable rollback for a specific update.  This seems
suboptimal, since invariably by the time you know you need rollback, it's
too late.  Can we enable a user-initiated rollback from a FAILED state, via
one of:

 - Introduce a new heat API that allows an explicit heat stack-rollback?
 - (ab)use PATCH to trigger rollback on heat stack-update -x --rollback=True?

The former approach fits better with the current stack.Stack
implementation, because the ROLLBACK stack state already exists.  The
latter has the advantage that it doesn't need a new API so might be
backportable.
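
Whichever interface we pick, the behaviour I have in mind is roughly the
following. This is a toy sketch only: the Stack class and request_rollback()
are hypothetical, not the real stack.Stack, and the action/status names are
assumed to mirror the ones Heat uses:

```python
from dataclasses import dataclass


@dataclass
class Stack:
    """Hypothetical, minimal model of a stack's (action, status) state."""
    name: str
    action: str
    status: str

    @property
    def state(self):
        return (self.action, self.status)


def request_rollback(stack):
    """User-initiated rollback, decided after the failure has happened.

    Assumption: only a stack whose last update failed may be rolled back.
    """
    if stack.state != ("UPDATE", "FAILED"):
        raise ValueError(
            "cannot roll back %s from state %s_%s"
            % (stack.name, stack.action, stack.status))
    # Reuse the existing ROLLBACK action rather than starting a new UPDATE.
    stack.action, stack.status = "ROLLBACK", "IN_PROGRESS"
    return stack


failed = Stack("overcloud", "UPDATE", "FAILED")
request_rollback(failed)
print(failed.state)

healthy = Stack("overcloud", "UPDATE", "COMPLETE")
try:
    request_rollback(healthy)
except ValueError as exc:
    print(exc)
```

The point is just that the decision to roll back is made after the failure,
rather than being a flag you had to pass to the update up front.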

Any thoughts on how we might proceed to make this situation better, and
enable folks to roll back in the least destructive way possible when they
end up in a FAILED state?

Steve

[1] https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1331
[2] https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1143


