[openstack-dev] [Heat] Convergence proof-of-concept showdown

Zane Bitter zbitter at redhat.com
Thu Dec 18 02:12:22 UTC 2014

On 17/12/14 13:05, Gurjar, Unmesh wrote:
>> I'm storing a tuple of its name and database ID. The data structure is
>> resource.GraphKey. I was originally using the name for something, but I
>> suspect I could probably drop it now and just store the database ID, but I
>> haven't tried it yet. (Having the name in there definitely makes debugging
>> more pleasant though ;)
> I agree, having name might come in handy while debugging!
>> When I build the traversal graph each node is a tuple of the GraphKey and a
>> boolean to indicate whether it corresponds to an update or a cleanup
>> operation (both can appear for a single resource in the same graph).
> Just to confirm my understanding, cleanup operation takes care of both:
> 1. resources which are deleted as a part of update and
> 2. previous versioned resource which was updated by replacing with a new
> resource (UpdateReplace scenario)

Yes, correct. Also:

3. resource versions which failed to delete for whatever reason on a 
previous traversal

> Also, the cleanup operation is performed after the update completes successfully.

NO! They are not separate things!


>>> If I am correct, you are updating all resources on update regardless
>>> of their change which will be inefficient if stack contains a million resource.
>> I'm calling update() on all resources regardless of change, but update() will
>> only call handle_update() if something has changed (unless the plugin has
>> overridden Resource._needs_update()).
>> There's no way to know whether a resource needs to be updated before
>> you're ready to update it, so I don't think of this as 'inefficient', just 'correct'.
>>> We have similar questions regarding other areas in your
>>> implementation, which we believe if we understand the outline of your
>>> implementation. It is difficult to get a hold on your approach just by looking
>>> at code. Docs strings / Etherpad will help.
>>> About streams, Yes in a million resource stack, the data will be huge, but
>>> less than template.
>> No way, it's O(n^3) (cubed!) in the worst case to store streams for each
>> resource.
>>> Also this stream is stored
>>> only In IN_PROGRESS resources.
>> Now I'm really confused. Where does it come from if the resource doesn't
>> get it until it's already in progress? And how will that information help it?
> When an operation on stack is initiated, the stream will be identified.

OK, this may be one of the things I was getting confused about - I 
thought a 'stream' belonged to one particular resource and just contained 
all of the paths to reaching that resource. But here it seems like 
you're saying that a 'stream' is a representation of the entire graph? 
So it's essentially just a gratuitously bloated NIH serialisation of the 
Dependencies graph?

> To begin
> the operation, the action is initiated on the leaf (or root) resource(s) and the
> stream is stored (only) in this/these IN_PROGRESS resource(s).

How does that work? Does it get deleted again when the resource moves to 

> The stream should then keep getting passed to the next/previous level of resource(s) as
> and when the dependencies for the next/previous level of resource(s) are met.

That sounds... identical to the way it's implemented in my prototype 
(passing a serialisation of the graph down through the notification 
triggers), except for the part about storing it in the Resource table. 
Why would we persist to the database data that we only need for the 
duration that we already have it in memory anyway?

If we're going to persist it we should do so once, in the Stack table, 
at the time that we're preparing to start the traversal.
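As a rough illustration of persisting the graph once (a sketch only, not the prototype's code), the dependency graph can be flattened to a JSON string that could be written to a single column and passed along with each notification. The function names and the (name, database ID) node shape are assumptions made for this example.

```python
import json


def serialise(graph):
    """Flatten {node: set(required nodes)} to a JSON string.

    Nodes are assumed to be (name, db_id) tuples. JSON has no tuples
    or sets, so both are written out as sorted lists.
    """
    return json.dumps(sorted(
        [list(node), sorted(list(req) for req in reqs)]
        for node, reqs in graph.items()
    ))


def deserialise(data):
    """Rebuild the {node: set(required nodes)} mapping from JSON."""
    return {
        tuple(node): {tuple(req) for req in reqs}
        for node, reqs in json.loads(data)
    }


# Hypothetical two-resource stack: the server requires the network.
graph = {
    ('net', 1): set(),
    ('server', 2): {('net', 1)},
}
stored = serialise(graph)
restored = deserialise(stored)
```

The round trip is lossless here, so the same string can be stored once and forwarded as-is.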

>>> The reason to have entire dependency list to reduce DB queries while a
>>> stack update.
>> But we never need to know that. We only need to know what just happened
>> and what to do next.
> As mentioned earlier, each level of resources in a graph pass on the dependency
> list/stream to their next/previous level of resources. This information should further
> be used to determine what is to be done next and drive the operation to completion.

Why would we store *and* forward?

>>> When you have a singular dependency on each resources similar to your
>>> implementation, then we will end up loading Dependencies one at a time and
>>> altering almost all resource's dependency regardless of their change.
>>> Regarding a 2 template approach for delete, it is not actually 2
>>> different templates. It's just that we have a delete stream to be taken up
>>> post update.
>> That would be a regression from Heat's current behaviour, where we start
>> cleaning up resources as soon as they have nothing depending on them.
>> There's not even a reason to make it worse than what we already have,
>> because it's actually a lot _easier_ to treat update and clean up as the same
>> kind of operation and throw both into the same big graph. The dual
>> implementations and all of the edge cases go away and you can just trust in
>> the graph traversal to do the Right Thing in the most parallel way possible.
>>> (Any post operation will be handled as an update) This approach is
>>> True when Rollback==True We can always fall back to regular stream
>>> (non-delete stream) if Rollback=False
>> I don't understand what you're saying here.
> Just to elaborate, in case of update with rollback, there will be 2 streams of
> operations:

There really should not be.

> 1. first is the create and update resource stream
> 2. second is the stream for deleting resources (which will be taken if the first stream
> completes successfully).

We don't want to break it up into discrete steps. We want to treat it as 
one single graph traversal - provided we set up the dependencies 
correctly then the most optimal behaviour just falls out of our graph 
traversal algorithm for free.

In the existing Heat code I linked above, we use actual 
heat.engine.resource.Resource objects as nodes in the dependency graph 
and rely on figuring out whether they came from the old or new stack to 
differentiate them. However, it's not possible (or desirable) to 
serialise a graph containing those objects and send it to another 
process, so in my convergence prototype I use (key, direction) tuples as 
the nodes so that the same key may appear twice in the graph with 
different 'directions' (forward=True for update, =False for cleanup - 
note that the direction is with respect to the template... as far as the 
graph is concerned it's a single traversal going in one direction).
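A minimal sketch of that node shape, for illustration only: GraphKey here is a stand-in namedtuple, build_graph() and traverse() are hypothetical helpers, and the edge rules (a cleanup waits for the update of the same key and for the cleanup of everything that used to depend on it) are assumptions, not necessarily what the prototype does.

```python
from collections import namedtuple

# Stand-in for resource.GraphKey: a resource name plus its database ID.
GraphKey = namedtuple('GraphKey', ['name', 'db_id'])


def build_graph(new_deps, old_deps):
    """Return {node: set(nodes it must wait for)}.

    Nodes are (key, forward) tuples: forward=True is an update,
    forward=False is a cleanup. Both dicts map key -> required keys.
    """
    graph = {}
    for key, reqs in new_deps.items():
        node = graph.setdefault((key, True), set())
        for req in reqs:
            graph.setdefault((req, True), set())
            node.add((req, True))       # update after its requirements
    for key, reqs in old_deps.items():
        node = graph.setdefault((key, False), set())
        if (key, True) in graph:
            node.add((key, True))       # cleanup after the same key's update
        for req in reqs:
            # a requirement is cleaned up only after its old dependents
            graph.setdefault((req, False), set()).add((key, False))
    return graph


def traverse(graph):
    """Yield batches of nodes whose requirements are all complete."""
    done, pending = set(), dict(graph)
    while pending:
        ready = sorted(n for n, reqs in pending.items() if reqs <= done)
        if not ready:
            raise ValueError('dependency cycle')
        yield ready
        done.update(ready)
        for node in ready:
            del pending[node]


net, server, vol = GraphKey('net', 1), GraphKey('server', 2), GraphKey('vol', 3)
graph = build_graph(new_deps={server: [net], net: []},
                    old_deps={server: [net], net: [], vol: []})
batches = list(traverse(graph))
```

In this example 'vol' has been dropped from the template and nothing depends on it, so its cleanup lands in the first batch alongside the update of 'net' rather than waiting for a separate phase.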

Artificially dividing things into separate update and cleanup phases is 
both more complicated code to maintain and a major step backwards for 
our users.

I want to be very clear about this: treating the updates and cleanups as 
separate, serial tasks is a -2 show stopper for any convergence design. 
We *MUST* NOT do that to our users.

> However, in case of an update without rollback, there will be a single stream of
> operation (no delete/cleanup stream required).

By 'update without rollback' I assume you mean when the user issues an 
update with disable_rollback=True?

Actually it doesn't matter what you mean, because there is no way of 
interpreting this that could make it correct. We *always* need to check 
all of the pre-existing resources for clean up. The only exception is on 
create, and then only because the set of pre-existing resources is empty.

If your plan for handling UpdateReplace when rollback is disabled is 
just to delete the old resource at the same time as creating the new 
one, then your plan won't work because the dependencies are backwards. 
And leaving the replaced resources around without even trying to clean 
them up would be outright unethical, given how much money it would cost 
our users. So -2 on 'no cleanup when rollback disabled' as well.
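One reading of the UpdateReplace ordering, sketched with the standard library's graphlib (Python 3.9+): the old version can only be cleaned up after everything that depended on it has been moved over to the replacement, which in turn can only happen after the replacement exists. The resource names, IDs and edges below are invented for illustration.

```python
from graphlib import TopologicalSorter

# Nodes are (name, db_id, forward); each maps to the nodes it waits for.
# Assumed scenario: 'net' (ID 1) is replaced by a new 'net' (ID 5), and
# 'server' (ID 2) depends on it.
graph = {
    ('net', 5, True): set(),                      # create the replacement
    ('server', 2, True): {('net', 5, True)},      # repoint server at it
    ('server', 2, False): {('server', 2, True)},  # nothing to delete here
    ('net', 1, False): {('server', 2, False)},    # only now delete old net
}

order = list(TopologicalSorter(graph).static_order())
```

Deleting ('net', 1) concurrently with creating ('net', 5) would violate the last edge, since 'server' would still be attached to the old network at that point.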

