[openstack-dev] [heat][tripleo] User Initiated Rollback

Clint Byrum clint at fewbar.com
Fri Dec 4 00:01:11 UTC 2015

Zane, I want to echo your sentiments below exactly. I basically agree with
all of it.

The only thing I'd add is that no matter how good you make Heat's rollback
API, it will never be as good as git. So I would suggest that you just
have people roll forward from Heat's perspective, and let version control
systems handle history tracking. What might help with that is a few
key/value pairs that people could set on a stack, something like this:

remote=https://github.com/myorg/mytemplates gitref=tags/1.2.9

That way, anything interrogating Heat can tell whether the latest reference
has been applied.
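For illustration, a tool doing that interrogation might look something like the minimal sketch below. It assumes the pairs come back from Heat as one space-separated `key=value` string exactly as written above; how Heat would store and expose them is left open, working out which ref is "latest" is left to the caller, and `parse_ref_tags` and `latest_ref_applied` are hypothetical names, not part of any Heat API.

```python
def parse_ref_tags(text: str) -> dict[str, str]:
    """Parse space-separated key=value pairs such as 'remote=... gitref=...'.

    Hypothetical helper: assumes the stack's pairs are available as one string.
    """
    pairs = {}
    for token in text.split():
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that have no '='
            pairs[key] = value
    return pairs


def latest_ref_applied(stored: str, remote: str, latest_ref: str) -> bool:
    """True if the stack records the same remote and the latest gitref."""
    tags = parse_ref_tags(stored)
    return tags.get("remote") == remote and tags.get("gitref") == latest_ref


stored = "remote=https://github.com/myorg/mytemplates gitref=tags/1.2.9"
remote = "https://github.com/myorg/mytemplates"
print(latest_ref_applied(stored, remote, "tags/1.2.9"))  # True
print(latest_ref_applied(stored, remote, "tags/1.3.0"))  # False
```

Keeping the check this dumb is deliberate: Heat only has to remember what was applied, while git remains the source of truth for history.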

Excerpts from Zane Bitter's message of 2015-12-02 08:51:55 -0800:
> On 02/12/15 11:02, Steven Hardy wrote:
> > So, chatting with Giulio today about https://bugs.launchpad.net/heat/+bug/1521944
> > has me thinking about $subject.
> >
> > The root cause of that issue is essentially a corner case of a stack-update,
> > combined with some coupling within the Neutron API which prevents the
> > update traversal from working.
> >
> > But it raises the broader question of what a "rollback" actually is, and
> > how a user can potentially use it to get out of the kind of mess described
> > in that bug (where, otherwise, your only option is to delete the entire
> > stack).
> I'm not sure it does raise that question; the same issue crops up 
> whether you try to roll back or roll forward.
> > Currently, we treat rollback as a special type of update, where, if an
> > in-progress update fails, we then try to update again, to the previous
> > stack definition[1], but as Giulio has discovered, there are times when
> > that doesn't work, because what you actually want is to recover the
> > existing resource from the backup stack, not create a new one with the same
> > properties.
> The rollback flow isn't the problem here. The problem is that the 
> resource is marked as DELETE_FAILED, and Heat has no mechanism in 
> general for knowing if that means it's still good and we can restore it 
> or if it is, as we say in New Zealand, completely munted[1].
> Since Heat can't know, it assumes the latter and replaces the resource. 
> If we wanted to fix this, we'd need a mechanism to verify the health of 
> the resource - and obviously it would have to be resource-specific. We 
> already have an interface for that kind of mechanism in the form of 
> handle_check(), so there's a chance we could repurpose that to do this.
> [1] http://dictionary.reference.com/browse/munted?s=t
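Roughly, the restore-or-replace decision described above might look like the following sketch. It is plain Python, not Heat's resource plugin interface: `rollback_action` is a hypothetical helper, and `health_check` is an assumed, resource-specific callable standing in for what a repurposed handle_check() could provide.

```python
from typing import Callable


def rollback_action(status: str, health_check: Callable[[], bool]) -> str:
    """Hypothetical: decide whether to restore or replace a backup resource.

    Only covers the DELETE_FAILED case discussed above; health_check is an
    assumed resource-specific probe (e.g. a repurposed handle_check()).
    """
    if status != "DELETE_FAILED":
        raise ValueError("this sketch only handles DELETE_FAILED resources")
    try:
        healthy = health_check()
    except Exception:
        # If the probe itself fails we learn nothing, so stay conservative.
        healthy = False
    # Without positive evidence of health, keep today's behaviour: replace it.
    return "restore" if healthy else "replace"


def unreachable() -> bool:
    raise RuntimeError("API unreachable")


print(rollback_action("DELETE_FAILED", lambda: True))  # restore
print(rollback_action("DELETE_FAILED", unreachable))   # replace
```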
> > Then, looking at convergence, we have a different definition of rollback,
> > and it's not yet clear to me how it should behave in a similar scenario,
> > e.g. when the resource we want to roll back to failed to get deleted but
> > still exists (so the resource is FAILED, but the underlying resource is fine).
> It's essentially the same. Convergence behaves a bit better when 
> multiple failed versions of the same resource start stacking up, but it 
> won't solve the problem.
> > Finally, the interface to rollback: at the moment you have to know before
> > something fails that you'd like to enable rollback for a specific update.
> > This seems suboptimal, since invariably by the time you know you need
> > rollback, it's too late. Can we enable a user-initiated rollback from a
> > FAILED state, via one of:
> >
> >   - Introduce a new heat API that allows an explicit heat stack-rollback?
> >   - (ab)use PATCH to trigger rollback on heat stack-update -x --rollback=True?
> In convergence there's no distinction between a rollback and an update 
> using the previous template, so IMHO there's not much need for a 
> separate API.
> > The former approach fits better with the current stack.Stack
> > implementation, because the ROLLBACK stack state already exists.  The
> > latter has the advantage that it doesn't need a new API so might be
> > backportable.
> Convergence does store a copy of the previous template (not 100% sure 
> when it deletes it at the moment - I suspect after the update succeeds), 
> so a rollback API would be feasible if we decided we needed it. I'd 
> prefer the first approach if so.
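The point that a convergence rollback is just an update back to the stored previous template can be sketched like this. `StackRecord`, `start_update` and `user_rollback` are hypothetical simplifications, not Heat's real data model, and the sketch simply keeps the previous template around rather than guessing when convergence deletes it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StackRecord:
    """Hypothetical, much-simplified view of a stack's stored templates."""
    current_template: dict
    previous_template: Optional[dict] = None
    status: str = "UPDATE_COMPLETE"


def start_update(stack: StackRecord, new_template: dict) -> None:
    """Begin an update, keeping the old template so we can return to it."""
    stack.previous_template = stack.current_template
    stack.current_template = new_template
    stack.status = "UPDATE_IN_PROGRESS"


def user_rollback(stack: StackRecord) -> None:
    """Rollback is just an update whose target is the stored previous template."""
    if stack.previous_template is None:
        raise RuntimeError("no previous template stored; nothing to roll back to")
    start_update(stack, stack.previous_template)


# Usage: an update fails, then the user asks to go back.
stack = StackRecord(current_template={"version": 1})
start_update(stack, {"version": 2})
stack.status = "UPDATE_FAILED"  # simulate the update failing
user_rollback(stack)
print(stack.current_template, stack.status)  # {'version': 1} UPDATE_IN_PROGRESS
```

One consequence of treating rollback as an ordinary update: in this sketch a second rollback would switch back to the failed template, because the rollback itself replaced the stored copy.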
> > Any thoughts on how we might proceed to make this situation better, and
> > enable folks to roll back in the least destructive way possible when they
> > end up in a FAILED state?
> Note that the root cause of this problem is that Heat doesn't have a 
> global view of dependencies across stacks - if it did, it would never 
> have tried to delete the subnet with ports still in it. For the benefit 
> of those who weren't at the design summit, we discussed potential fixes 
> there:
> https://etherpad.openstack.org/p/mitaka-heat-break-stack-barrier
> cheers,
> Zane.
> > Steve
> >
> > [1] https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1331
> > [2] https://github.com/openstack/heat/blob/master/heat/engine/stack.py#L1143
> >
> > __________________________________________________________________________
> > OpenStack Development Mailing List (not for usage questions)
> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> >
