[openstack-dev] [Heat] Convergence: Backing up template instead of stack

Zane Bitter zbitter at redhat.com
Tue Sep 23 18:32:50 UTC 2014


On 23/09/14 09:44, Anant Patil wrote:
> On 23-Sep-14 09:42, Clint Byrum wrote:
>> Excerpts from Angus Salkeld's message of 2014-09-22 20:15:43 -0700:
>>> On Tue, Sep 23, 2014 at 1:09 AM, Anant Patil <anant.patil at hp.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> One of the steps in the direction of convergence is to enable Heat
>>>> engine to handle concurrent stack operations. The main convergence spec
>>>> talks about it. Resource versioning would be needed to handle concurrent
>>>> stack operations.
>>>>
>>>> As of now, while updating a stack, a backup stack is created with a new
>>>> ID and only one update runs at a time. If we keep the raw_template
>>>> linked to its previous completed template, i.e. keep a backup of the
>>>> template instead of the stack, we avoid having a backup stack.
>>>>
>>>> Since there won't be a backup stack and only one stack_id to be dealt
>>>> with, resources and their versions can be queried for a stack with that
>>>> single ID. The idea is to identify resources for a stack by using stack
>>>> id and version. Please let me know your thoughts.
>>>>
>>>
>>> Hi Anant,
>>>
>>> This seems more complex than it needs to be.
>>>
>>> I could be wrong, but I thought the aim was to simply update the goal state.
>>> The backup stack is just the last working stack. So if you update and there
>>> is already an update you don't need to touch the backup stack.
>>>
>>> Anyone else that was at the meetup want to fill us in?
>>>
>>
>> The backup stack is a device used to collect items to operate on after
>> the current action is complete. It is entirely an implementation detail.
>>
>> Resources that can be updated in place will have their resource record
>> superseded, but retain their physical resource ID.
>>
>> This is one area where the resource plugin API is particularly sticky,
>> as resources are allowed to raise the "replace me" exception if in-place
>> updates fail. That is o-k though, at that point we will just comply by
>> creating a replacement resource as if we never tried the in-place update.
>>
>> In order to facilitate this, we must expand the resource data model to
>> include a version. Replacement resources will be marked as "current" and
>> to-be-removed resources marked for deletion. We can also keep all
>> current - 1 resources around to facilitate rollback until the stack
>> reaches a "complete" state again. Once that is done, we can remove the
>> backup stack.
>>
>
> Backup stack is a good way to take care of rollbacks or cleanups after
> the stack action is complete. By cleanup I mean the deletion of
> resources that are no longer needed after the new update. It works very
> well when one engine is processing the stack request and the stacks are
> in memory.

It's actually a fairly terrible hack (I wrote it ;)

It doesn't work very well because in practice during an update there are 
dependencies that cross between the real stack and the backup stack (due 
to some resources remaining the same or being updated in place, while 
others are moved to the backup stack ready for replacement). So in the 
event of a failure that we don't completely roll back on the spot, we 
lose some dependency information.

> As a step towards distributing the stack request processing and making
> it fault-tolerant, we need to persist the dependency task graph. The
> backup stack can also be persisted along with the new graph, but then
> the engine has to traverse both the graphs to proceed with the operation
> and later identify the resources to be cleaned-up or rolled back using
> the stack id. There would be many resources for the same stack but
> different stack ids.

Right, yeah this would be a mistake because in reality there is only one 
graph, so that's how we need to model it internally.

> In contrast, when we store the current dependency task graph (from the
> latest request) in the DB, and version the resources, we can identify those
> resources that need to be rolled back or cleaned up after the stack
> operation is done, by comparing their versions. With versioning of
> resources and template, we can avoid creating a deep stack of backup
> stacks. The processing of stack operation can happen from multiple
> engines, and IMHO, it is simpler when all the engines just see one stack
> and versions of resources, instead of seeing many stacks with many
> resources for each stack.

Bingo.

I think all you need to do is record in the resource the particular 
template and set of parameters it was tied to (maybe just generate a 
UUID for each update... or perhaps a SHA hash of the actual data for 
better rollbacks?). Then any resource that isn't part of the latest 
template should get deleted during the cleanup phase of the dependency 
graph traversal.
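
To make that concrete, here is a rough sketch of the tagging idea, not
actual Heat code. The names template_key, ResourceRecord and
resources_to_clean_up are all made up for illustration, and it assumes
that any resource kept by an update (unchanged or updated in place) gets
re-tagged with the new key before cleanup runs.

```python
import hashlib
import json
from dataclasses import dataclass


def template_key(template: dict, params: dict) -> str:
    """Hypothetical helper: a stable SHA-256 over template plus parameters.

    sort_keys=True makes the hash independent of dict ordering, so the
    same data always yields the same key.
    """
    payload = json.dumps({"template": template, "params": params},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class ResourceRecord:
    # Illustrative stand-in for a resource row; not the real Heat model.
    id: int
    name: str
    template_key: str


def resources_to_clean_up(resources, current_key):
    """Return every resource that is not tied to the latest template."""
    return [r for r in resources if r.template_key != current_key]
```

Using a hash of the data rather than a random UUID means that rolling
back to an earlier template and parameter set produces the same key
again, which is the "better rollbacks" angle.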

As you mentioned above, we'll also need to store the dependency graph of 
the stack in the database somewhere. Right now we generate it afresh 
from the template by assuming that each resource name corresponds to one 
entry in the DB. Since that will no longer be true, we'll need it to be 
a graph of resource IDs that we store.
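
A minimal sketch of what storing a graph of resource IDs could look like,
assuming the graph is kept as a JSON blob mapping each resource ID to the
IDs it depends on. The function names and the storage format are
assumptions for illustration only; nothing here is an existing Heat API.
It uses graphlib from the Python standard library (3.9+).

```python
import json
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def dump_graph(edges):
    """Serialise {resource_id: {ids it depends on}} to a JSON string."""
    return json.dumps({str(k): sorted(v) for k, v in edges.items()},
                      sort_keys=True)


def load_graph(blob):
    """Inverse of dump_graph: rebuild the dict with integer IDs."""
    return {int(k): set(v) for k, v in json.loads(blob).items()}


def traversal_order(edges):
    """Return resource IDs with every dependency before its dependents."""
    return list(TopologicalSorter(edges).static_order())


# Two rows may share the resource name "server" during a replacement
# (IDs 2 and 5 here), so the graph is keyed on IDs, not names.
graph = {1: set(), 2: {1}, 5: {1}, 7: {5}}
restored = load_graph(dump_graph(graph))
order = traversal_order(restored)
```

Because the keys are IDs, the old and the replacement "server" can both
sit in the one graph at the same time without ambiguity.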

cheers,
Zane.


