[openstack-dev] [heat][tripleo] User Initiated Rollback
zbitter at redhat.com
Wed Dec 2 16:51:55 UTC 2015
On 02/12/15 11:02, Steven Hardy wrote:
> So, chatting with Giulio today about https://bugs.launchpad.net/heat/+bug/1521944
> has me thinking about $subject.
> The root cause of that issue is essentially a corner case of a stack-update,
> combined with some coupling within the Neutron API which prevents the
> update traversal from working.
> But it raises the broader question of what a "rollback" actually is, and
> how a user can potentially use it to get out of the kind of mess described
> in that bug (where, otherwise, your only option is to delete the entire
I'm not sure it does raise that question; the same issue crops up
whether you try to roll back or roll forward.
> Currently, we treat rollback as a special type of update, where, if an
> in-progress update fails, we then try to update again, to the previous
> stack definition, but as Giulio has discovered, there are times when
> that doesn't work, because what you actually want is to recover the
> existing resource from the backup stack, not create a new one with the same
The rollback flow isn't the problem here. The problem is that the
resource is marked as DELETE_FAILED, and Heat has no mechanism in
general for knowing if that means it's still good and we can restore it
or if it is, as we say in New Zealand, completely munted.
Since Heat can't know, it assumes the latter and replaces the resource.
If we wanted to fix this, we'd need a mechanism to verify the health of
the resource - and obviously it would have to be resource-specific. We
already have an interface for that kind of mechanism in the form of
handle_check(), so there's a chance we could repurpose that to do this.
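A minimal sketch of that idea, not Heat's actual code: the Resource class and plan_for_failed_delete() below are hypothetical, and the handle_check field only stands in for a resource-specific handle_check() that raises when the underlying resource is unhealthy or gone. It shows how the check result could decide between restoring a DELETE_FAILED resource and replacing it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Resource:
    """Hypothetical, simplified stand-in for a Heat resource."""
    name: str
    action: str   # e.g. "CREATE", "UPDATE", "DELETE"
    status: str   # e.g. "COMPLETE", "FAILED"
    # Assumption: mimics a resource-specific handle_check() that raises
    # an exception if the underlying physical resource is not healthy.
    handle_check: Callable[[], None]


def plan_for_failed_delete(res: Resource) -> str:
    """Decide what to do with a resource during rollback.

    Returns "keep" for anything that is not DELETE_FAILED, "restore" if
    the health check passes, and "replace" if the check raises.
    """
    if (res.action, res.status) != ("DELETE", "FAILED"):
        return "keep"
    try:
        res.handle_check()
    except Exception:
        # The check failed, so assume the resource is unusable.
        return "replace"
    return "restore"


def _gone() -> None:
    raise RuntimeError("underlying resource not found")


healthy_subnet = Resource("subnet", "DELETE", "FAILED", lambda: None)
broken_subnet = Resource("subnet", "DELETE", "FAILED", _gone)
print(plan_for_failed_delete(healthy_subnet))
print(plan_for_failed_delete(broken_subnet))
```

Without such a check the only safe default is the "replace" branch, which is the behaviour described above.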
> Then, looking at convergence, we have a different definition of rollback, and
> it's not yet clear to me how this should behave in a similar scenario, e.g.
> when the resource we want to roll back to failed to get deleted but still
> exists (so, the resource is FAILED, but the underlying resource is fine)?
It's essentially the same. Convergence behaves a bit better when
multiple failed versions of the same resource start stacking up, but it
won't solve the problem.
> Finally, the interface to rollback - atm you have to know before something
> fails that you'd like to enable rollback for a specific update. This seems
> suboptimal, since invariably by the time you know you need rollback, it's
> too late. Can we enable a user-initiated rollback from a FAILED state, via
> one of:
> - Introduce a new heat API that allows an explicit heat stack-rollback?
> - (ab)use PATCH to trigger rollback on heat stack-update -x --rollback=True?
In convergence there's no distinction between a rollback and an update
using the previous template, so IMHO there's not much need for a
> The former approach fits better with the current stack.Stack
> implementation, because the ROLLBACK stack state already exists. The
> latter has the advantage that it doesn't need a new API so might be
Convergence does store a copy of the previous template (not 100% sure
when it deletes it at the moment - I suspect after the update succeeds),
so a rollback API would be feasible if we decided we needed it. I'd
prefer the first approach if so.
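To illustrate why a rollback API would be a thin layer in that model, here is a hedged sketch under stated assumptions: the Stack class, its fields, and the fail flag are all hypothetical and are not Heat's stack.Stack. It assumes the previous template is kept until an update succeeds, which is only suspected above, and treats rollback as an ordinary update to that stored template.

```python
class Stack:
    """Hypothetical toy model of a stack that remembers one old template."""

    def __init__(self, template):
        self.template = template
        self.prev_template = None
        self.status = "COMPLETE"

    def update(self, new_template, fail=False):
        """Apply new_template; `fail` simulates a failed update."""
        self.prev_template = self.template
        self.template = new_template
        if fail:
            self.status = "FAILED"
        else:
            self.status = "COMPLETE"
            # Assumption: the stored copy is dropped once an update succeeds.
            self.prev_template = None

    def rollback(self):
        """User-initiated rollback: just an update to the previous template."""
        if self.prev_template is None:
            raise ValueError("no previous template stored")
        self.update(self.prev_template)


stack = Stack({"version": 1})
stack.update({"version": 2}, fail=True)   # update fails, stack is FAILED
stack.rollback()                          # requested only after the failure
print(stack.status, stack.template)
```

The point of the sketch is that the user does not need to opt in to rollback before the update starts; the stored template is enough to request it afterwards.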
> Any thoughts on how we might proceed to make this situation better, and
> enable folks to roll back in the least destructive way possible when they
> end up in a FAILED state?
Note that the root cause of this problem is that Heat doesn't have a
global view of dependencies across stacks - if it did it would never
have tried to delete the subnet with ports still in it. For the benefit
of those who weren't at the design summit, we discussed potential fixes
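
As a rough illustration of what a global view of dependencies would allow, the sketch below is purely hypothetical: Heat has no such structure today, and the (stack, resource) keys, the depends_on mapping, and blockers() are invented for the example. It refuses to delete a subnet while a port in another stack still depends on it.

```python
def blockers(target, depends_on):
    """Return every resource, in any stack, that still depends on target."""
    return sorted(res for res, deps in depends_on.items() if target in deps)


# Hypothetical cross-stack dependency map: resource -> resources it needs.
depends_on = {
    ("stack_b", "port1"): {("stack_a", "subnet")},
    ("stack_a", "subnet"): {("stack_a", "net")},
    ("stack_a", "net"): set(),
}

subnet = ("stack_a", "subnet")
still_needed_by = blockers(subnet, depends_on)
if still_needed_by:
    print("refusing to delete", subnet, "needed by", still_needed_by)
```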
>  https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1331
>  https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1143