[openstack-dev] [Heat] Convergence proof-of-concept showdown
Clint Byrum
clint at fewbar.com
Sat Jan 10 00:28:48 UTC 2015
Excerpts from Zane Bitter's message of 2015-01-09 14:57:21 -0800:
> On 08/01/15 05:39, Anant Patil wrote:
> > 1. The stack was failing when there were single disjoint resources or
> > just one resource in template. The graph did not include this resource
> > due to a minor bug in dependency_names(). I have added a test case and
> > fix here:
> > https://github.com/anantpatil/heat-convergence-prototype/commit/b58abd77cf596475ecf3f19ed38adf8ad3bb6b3b
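The body of dependency_names() is not shown in this thread. The sketch below only illustrates why a graph must seed a node for every resource, including ones with no edges, so that a single-resource or disjoint template is still traversed. `build_dependency_graph` and `forward_order` are hypothetical names, and the input format (name -> list of dependency names) is an assumption.

```python
def build_dependency_graph(resources):
    """Map each resource name to the set of names it depends on.

    `resources` is an assumed input format: name -> list of dependency names.
    Every resource gets a node even if it has no edges, so isolated
    resources are not silently dropped from the traversal.
    """
    graph = {name: set() for name in resources}  # seed isolated nodes first
    for name, deps in resources.items():
        for dep in deps:
            graph[name].add(dep)
            graph.setdefault(dep, set())  # dependency not declared as a key
    return graph


def forward_order(graph):
    """Topological order: dependencies come before their dependents."""
    order, done = [], set()

    def visit(node, stack=()):
        if node in done:
            return
        if node in stack:
            raise ValueError(f"dependency cycle at {node}")
        for dep in sorted(graph[node]):
            visit(dep, stack + (node,))
        done.add(node)
        order.append(node)

    for node in sorted(graph):
        visit(node)
    return order
```

For example, `forward_order(build_dependency_graph({"server": []}))` yields `["server"]`; without the seeding step, that single resource would have no node and would never be visited.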
>
> Thanks, sorry about that! I will push a patch to fix it up.
>
> > 2. The resource graph is created with keys in both forward order
> > traversal and reverse order traversal and the update will finish the
> > forward order and attempt the reverse order. If this is the case, then
> > the update-replaced resources will be deleted before the update is
> > complete and if the update fails, the old resource is not available for
> > roll-back; a new resource has to be created then. I have added a test
> > case at the above mentioned location.
> >
> > In our PoC, the updates (concurrent updates) won't remove an
> > update-replaced resource until all the resources are updated and
> > the resource clean-up phase has started.
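A minimal sketch of the deferred-cleanup ordering described above, assuming a toy model where a resource is just a name and a version number. `Resource`, `converge` and the `fail_on` failure injection are hypothetical and are not taken from either prototype: replacements are created during the forward pass, old resources are deleted only once every update has succeeded, and on failure only the new replacements are removed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    name: str
    version: int


def converge(current, desired, fail_on=None):
    """Hypothetical two-phase update with deferred cleanup.

    current: dict name -> Resource (what is live now)
    desired: dict name -> target version
    fail_on: optional name whose update raises, to simulate a failure

    Returns (live resources, list of Resources deleted).
    """
    live = dict(current)
    replaced = []  # old Resources kept around so rollback can reuse them
    created = []   # replacements made during this update
    try:
        # Phase 1: forward traversal creates replacements, deletes nothing.
        for name, version in desired.items():
            if name == fail_on:
                raise RuntimeError(f"update of {name} failed")
            old = current.get(name)
            if old is None or old.version != version:
                live[name] = Resource(name, version)
                created.append(live[name])
                if old is not None:
                    replaced.append(old)
    except RuntimeError:
        # Rollback: old resources still exist, so only the new ones go.
        return dict(current), created
    # Phase 2 (cleanup): every update succeeded, so drop the old resources.
    return live, replaced
```

The trade-off Zane raises below is visible here: holding every old resource until the end keeps rollback exact, but it means old and new copies coexist for the whole update, which is not what a rolling update of an autoscaling group wants.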
>
> Hmmm, this is a really interesting question actually. That's certainly
> not how Heat works at the moment; we've always assumed that rollback is
> "best-effort" at recovering the exact resources you had before. It would
> be great to have users weigh in on how they expect this to behave. I'm
> curious now what CloudFormation does.
>
> I'm reluctant to change it though because I'm pretty sure this is
> definitely *not* how you would want e.g. a rolling update of an
> autoscaling group to happen.
>
> > It is unacceptable to remove the old
> > resource to be rolled-back to since it may have changes which the user
> > doesn't want to lose;
>
> If they didn't want to lose it they shouldn't have tried an update that
> would replace it. If an update causes a replacement or an interruption
> to service then I consider the same fair game for the rollback - the
> user has already given us permission for that kind of change. (Whether
> the user's consent was informed is a separate question, addressed by
> Ryan's update-preview work.)
>
In the original vision we had for using scaled groups to manage, say,
nova-compute nodes, you definitely can't "create" new servers, so you
can't just create all the new instances without de-allocating some.
That said, that's why we are using in-place methods like rebuild.

I think it would be acceptable to have cleanup run asynchronously,
and to have rollback re-create anything that has already been cleaned up.
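A rough sketch of the asynchronous-cleanup idea, under a toy model where each resource is a name mapped to a version number. The `Stack` class and its methods are hypothetical; a real engine would run cleanup from workers rather than from an explicit `cleanup_step()` call. Updates queue the old resource for deletion without waiting. Rollback restores the old versions, cancelling any deletion still pending and reporting which ones must be re-created because cleanup already removed them.

```python
from collections import deque


class Stack:
    """Hypothetical model: cleanup runs whenever a worker gets to it, and
    rollback re-creates anything that has already been cleaned up."""

    def __init__(self, resources):
        self.live = dict(resources)   # name -> live version
        self.previous = {}            # name -> version before the update
        self.cleanup_queue = deque()  # (name, version) awaiting deletion
        self.cleaned = set()          # (name, version) already deleted

    def update(self, name, version):
        old = self.live[name]
        self.previous.setdefault(name, old)
        self.live[name] = version
        # The old resource is queued for deletion; the update does not wait.
        self.cleanup_queue.append((name, old))

    def cleanup_step(self):
        """Delete one queued resource, as a background worker might."""
        if not self.cleanup_queue:
            return False
        self.cleaned.add(self.cleanup_queue.popleft())
        return True

    def rollback(self):
        """Restore pre-update versions; return names that had to be re-created."""
        recreated = []
        for name, old in self.previous.items():
            if (name, old) in self.cleaned:
                self.cleaned.discard((name, old))
                recreated.append(name)  # old resource was gone: fresh create
            else:
                # Old resource still exists; cancel its pending deletion.
                self.cleanup_queue.remove((name, old))
            # The replacement made by the update now becomes the garbage.
            self.cleanup_queue.append((name, self.live[name]))
            self.live[name] = old
        self.previous.clear()
        return recreated
```

With this split, cleanup never blocks the update, and rollback degrades to "best-effort" only for resources the cleanup worker has already reached.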