[openstack-dev] [Heat] Convergence proof-of-concept showdown

Anant Patil anant.patil at hp.com
Thu Dec 11 06:14:19 UTC 2014

On 04-Dec-14 10:49, Zane Bitter wrote:
> On 01/12/14 02:02, Anant Patil wrote:
>> On GitHub:https://github.com/anantpatil/heat-convergence-poc
> I'm trying to review this code at the moment, and finding some stuff I 
> don't understand:
> https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916
> This appears to loop through all of the resources *prior* to kicking off 
> any actual updates to check if the resource will change. This is 
> impossible to do in general, since a resource may obtain a property 
> value from an attribute of another resource and there is no way to know 
> whether an update to said other resource would cause a change in the 
> attribute value.
> In addition, no attempt to catch UpdateReplace is made. Although that 
> looks like a simple fix, I'm now worried about the level to which this 
> code has been tested.
We were working on a new branch and, as we discussed on Skype, we have
handled all these cases. Please have a look at our current branch:

When a new resource is taken up for convergence, its children are loaded
and the resource definition is re-parsed. The frozen resource definition
will then have all the "get_attr" references resolved.
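
As a rough illustration of what "resolved" means here, the sketch below
walks a properties structure and substitutes each {"get_attr": [...]}
entry with a value already read from the referenced resource. The
function name resolve_get_attr and the attributes mapping are
hypothetical and not taken from the PoC; only the two-element
[resource, attribute] form is handled, and any further path elements
are ignored.

```python
def resolve_get_attr(value, attributes):
    """Return a copy of `value` with get_attr references substituted.

    attributes: hypothetical mapping {resource_name: {attr_name: value}}
    holding attribute values already read from the loaded resources.
    """
    if isinstance(value, dict):
        if set(value) == {"get_attr"}:
            # Only [resource, attribute] is used; deeper paths are ignored.
            resource_name, attr_name = value["get_attr"][:2]
            return attributes[resource_name][attr_name]
        return {k: resolve_get_attr(v, attributes) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve_get_attr(v, attributes) for v in value]
    return value


properties = {
    "address": {"get_attr": ["server", "first_address"]},
    "ports": [80, {"get_attr": ["lb", "port"]}],
}
attributes = {"server": {"first_address": "10.0.0.5"}, "lb": {"port": 8080}}
frozen = resolve_get_attr(properties, attributes)
```

The input structure is left untouched, so the unresolved definition can
still be re-parsed later if a referenced resource changes.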

> I'm also trying to wrap my head around how resources are cleaned up in 
> dependency order. If I understand correctly, you store in the 
> ResourceGraph table the dependencies between various resource names in 
> the current template (presumably there could also be some left around 
> from previous templates too?). For each resource name there may be a 
> number of rows in the Resource table, each with an incrementing version. 
> As far as I can tell though, there's nowhere that the dependency graph 
> for _previous_ templates is persisted? So if the dependency order 
> changes in the template we have no way of knowing the correct order to 
> clean up in any more? (There's not even a mechanism to associate a 
> resource version with a particular template, which might be one avenue 
> by which to recover the dependencies.)
> I think this is an important case we need to be able to handle, so I 
> added a scenario to my test framework to exercise it and discovered that 
> my implementation was also buggy. Here's the fix: 
> https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40

Thanks for pointing this out, Zane. We too had a buggy implementation for
handling inverted dependencies. I had a hard look at our algorithm, where
we were continuously merging the edges from the new template into the
edges from previous updates. It was an optimized way of traversing the
graph in both forward and reverse order without missing any resources.
But when the dependencies are inverted, this doesn't work.

We have changed our algorithm. The changes in edges are recorded in the
DB: only the delta of edges from the previous template is calculated and
kept. At any given point in time, the graph table has all the edges from
the current template and the deltas from previous templates. Each edge
has a template ID associated with it. For resource clean-up, we start
from the first template (the template which was completed and on top of
which updates were made, or the empty template otherwise) and move
towards the current template in the order in which the updates were
issued. For each template, the graph (the edges, if any are found for
that template) is traversed in reverse order and resources are cleaned
up. The process ends with the current template being traversed in
reverse order and its resources being cleaned up. All the
update-replaced resources from the older templates (older updates, in
the case of concurrent updates) are cleaned up in the order in which
they are supposed to be.
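
The clean-up order described above can be sketched roughly as follows.
This is a simplified model, not the PoC code: edges are assumed to be
(template_id, requirer, required) rows, the function name cleanup_order
is made up for illustration, and the delta bookkeeping is left out. It
uses graphlib from the Python 3.9+ standard library.

```python
from collections import defaultdict
from graphlib import TopologicalSorter  # Python 3.9+


def cleanup_order(edges, template_order):
    """Return (template_id, resource) pairs in clean-up visiting order.

    edges: iterable of (template_id, requirer, required) rows (assumed shape)
    template_order: template IDs, oldest first, current template last
    """
    by_template = defaultdict(list)
    for template_id, requirer, required in edges:
        by_template[template_id].append((requirer, required))

    visits = []
    for template_id in template_order:
        sorter = TopologicalSorter()
        for requirer, required in by_template.get(template_id, []):
            # Registering the requirer as a predecessor of the required
            # resource makes the sorter emit requirers first, which is
            # the reverse of creation order.
            sorter.add(required, requirer)
        visits.extend((template_id, name) for name in sorter.static_order())
    return visits


# Template 1: B requires A. Template 2 inverts it: A requires B.
order = cleanup_order([(1, "B", "A"), (2, "A", "B")], [1, 2])
```

Because each template keeps its own edges, the inverted dependency in
the second template does not disturb the clean-up order of the first.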

Resources are now tied to a template: they have a template_id instead of
a version. As we traverse the graph, we know which template we are
working on and can take the relevant action on each resource.

For rollback, another update is issued with the last completed template
(by design, an empty template is the last completed template by
default). Normally, the current template being worked on becomes the
predecessor of the new incoming template. In case of rollback, the last
completed template becomes the new incoming template, the current
template becomes its predecessor, and the successor of the last
completed template will have no predecessor. All these changes are
available in the 'graph-version' branch. (The branch name is a misnomer
though!)
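
One way to picture the predecessor bookkeeping is the minimal sketch
below. It is only a reading of the paragraph above, not the PoC code:
the predecessor dict, issue_update and traversal_order are hypothetical
names, and templates are identified by plain strings.

```python
def issue_update(predecessor, current, incoming):
    """Link `incoming` after `current` and return the new current template.

    predecessor: hypothetical mapping {template_id: predecessor_id or None}
    """
    if incoming == current:
        return current
    # Rollback case: `incoming` is already in the chain, so whatever
    # template followed it is left without a predecessor.
    for template_id, pred in list(predecessor.items()):
        if pred == incoming:
            predecessor[template_id] = None
    predecessor[incoming] = current
    return incoming


def traversal_order(predecessor, current):
    """Return template IDs from the first template up to the current one."""
    order = []
    node = current
    while node is not None:
        order.append(node)
        node = predecessor.get(node)
    order.reverse()
    return order


# t0 is the last completed template; t1 and t2 are updates on top of it.
predecessor = {"t0": None, "t1": "t0", "t2": "t1"}
current = issue_update(predecessor, "t2", "t0")  # roll back to t0
chain = traversal_order(predecessor, current)
```

After the rollback the chain reads t1, t2, t0, so clean-up still walks
from the oldest in-progress template towards the one being converged.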

I think it is simpler to reason about a stack and concurrent updates
when we associate resources and edges with a template, and a stack with
its current template and that template's predecessors (if any).

I also think that we should decouple Resource from Stack. This is really
a hindrance when workers work on individual resources. The resource
should be abstracted enough from stack for the worker to work on the
resource alone. The worker should load the required resource plug-in and
start converging.

The README.rst is really helpful for bringing up a minimal devstack
and testing the PoC. It also has some notes on design.

>> It was difficult, for me personally, to completely understand Zane's PoC
>> and how it would lay the foundation for aforementioned design goals. It
>> would be very helpful to have Zane's understanding here. I could
>> understand that there are ideas like async message passing and notifying
>> the parent which we also subscribe to.
> So I guess the thing to note is that there are essentially two parts to 
> my PoC:
> 1) A simulation framework that takes what will be in the final 
> implementation multiple tasks running in parallel in separate processes 
> and talking to a database, and replaces it with an event loop that runs 
> the tasks sequentially in a single process with an in-memory data store. 
> I could have built a more realistic simulator using Celery or something, 
> but I preferred this way as it offers deterministic tests.
> 2) A toy implementation of Heat on top of this framework.
> The files map roughly to Heat something like this:
> converge.engine       -> heat.engine.service
> converge.stack        -> heat.engine.stack
> converge.resource     -> heat.engine.resource
> converge.template     -> heat.engine.template
> converge.dependencies -> actually is heat.engine.dependencies
> converge.sync_point   -> no equivalent
> converge.converger    -> no equivalent (this is convergence "worker")
> converge.reality      -> represents the actual OpenStack services
> For convenience, I just use the @asynchronous decorator to turn an 
> ordinary method call into a simulated message.
> The concept is essentially as follows:
> At the start of a stack update (creates and deletes are also just 
> updates) we create any new resources in the DB and calculate the dependency 
> graph for the update from the data in the DB and template. This graph is 
> the same one used by updates in Heat currently, so it contains both the 
> forward and reverse (cleanup) dependencies. The stack update then kicks 
> off checks of all the leaf nodes, passing the pre-calculated dependency 
> graph.
> Each resource check may result in a call to the create(), update() or 
> delete() methods of a Resource plugin. The resource also reads any 
> attributes that will be required from it. Once this is complete, it 
> triggers any dependent resources that are ready, or updates a SyncPoint 
> in the database if there are dependent resources that have multiple 
> requirements. The message triggering the next resource will contain the 
> dependency graph again, as well as the RefIds and required attributes of 
> any resources it depends on.
> The new dependencies thus created are added to the resource itself in 
> the database at the time it is checked, allowing it to record the 
> changes caused by a requirement being unexpectedly replaced without 
> needing a global lock on anything.
> When cleaning up resources, we also endeavour to remove any that are 
> successfully deleted from the dependencies graph.
> Each traversal has a unique ID that is both stored in the stack and 
> passed down through the resource check triggers. (At present this is the 
> template ID, but it may make more sense to have a unique ID since old 
> template IDs can be resurrected in the case of a rollback.) As soon as 
> these fail to match the resource checks stop propagating, so only an 
> update of a single field is required (rather than locking an entire 
> table) before beginning a new stack update.
> Hopefully that helps a little. Please let me know if you have specific 
> questions. I'm *very* happy to incorporate other ideas into it, since 
> it's pretty quick to change, has tests to check for regressions, and is 
> intended to be thrown away anyhow (so I genuinely don't care if some 
> bits get thrown away earlier than others).
This is of tremendous help for us.
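
To check my understanding of the SyncPoint and traversal ID ideas quoted
above, here is a toy in-memory sketch. It is an assumption-laden model,
not Zane's converge.sync_point: the class layout, the satisfy() method
and the way stale traversals are ignored are all made up for
illustration.

```python
class SyncPoint:
    """Collects results for a resource that has several requirements."""

    def __init__(self, traversal_id, requirements):
        self.traversal_id = traversal_id
        self.waiting = set(requirements)
        self.inputs = {}

    def satisfy(self, traversal_id, requirement, data):
        """Record one finished requirement.

        Returns True only when every requirement has reported for the
        traversal this sync point belongs to.
        """
        if traversal_id != self.traversal_id:
            # A message from an older traversal: stop propagating.
            return False
        self.waiting.discard(requirement)
        self.inputs[requirement] = data
        return not self.waiting


sync = SyncPoint("traversal-2", ["A", "B"])
first = sync.satisfy("traversal-2", "A", {"ref_id": "a-1"})
stale = sync.satisfy("traversal-1", "B", {"ref_id": "b-old"})
ready = sync.satisfy("traversal-2", "B", {"ref_id": "b-1"})
```

The dependent resource would be triggered only when the last call
returns True, with the collected inputs passed along in the message.
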
>> In retrospect, we had to struggle a lot to understand the existing
>> Heat engine. We couldn't have done justice by just creating another
>> project in GitHub and without any concrete understanding of existing
>> state-of-affairs.
> I completely agree, and you guys did the right thing by starting out 
> looking at Heat. But remember, the valuable thing isn't the code, it's 
> what you learned. My concern is that now that you have Heat pretty well 
> figured out, you won't be able to continue to learn nearly as fast 
> trying to wrestle with the Heat codebase as you could with the 
> simulator. We don't want to fall into the trap of just shipping whatever 
> we have because it's too hard to explore the other options, we want to 
> identify a promising design and iterate it as quickly as possible.

I would have loved to, especially after the short tutorial given above
:). The framework is great! I am in the middle of using DB transactions
to replace the stack lock for critical sections. For that I need my
devstack setup with the actual DB running. I liked the test cases
(scenario tests) you have, and I am porting them so that we can run them
against our

> cheers,
> Zane.

Zane, I have a few questions:
1. Our current implementation is based on notifications from the worker,
so that the engine can take up the next set of tasks. I don't see this in
your case. I think we should be doing this. It gels well with the
observer notification mechanism: when the observer comes, it would send
a converge notification. Both the provisioning of the stack and the
continuous observation happen with notifications (async message
passing). I see that the workers in your case pick up the parent when/if
it is done, and schedule it or update the sync point.

2. The dependency graph travels everywhere. IMHO, we can keep the graph
in the DB, let the workers work on a resource, and let the engine decide
which one to schedule next by looking at the graph. There wouldn't be a
need for a lock here in the engine; the DB transactions should take
care of concurrent DB updates. Our new PoC follows this model.

3. The request ID is passed down to check_*_complete. Would the check
method be interrupted if a new request arrives? IMHO, the check method
should not get interrupted. It should return when the resource has
reached a concrete state, either failed or completed.

4. A lot of the synchronization issues which we faced in our PoC cannot
be encountered with the framework. How do we evaluate what happens when
synchronization issues are encountered (like the stack-lock kind of
issues, which we are replacing with DB transactions)?
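
To make the model in question 2 concrete, here is a minimal sketch of
how the engine could pick the next resources to schedule from a graph
kept in the DB, each time a worker notifies it. The function name
next_ready and the in-memory sets standing in for DB rows are
assumptions for illustration only.

```python
def next_ready(edges, all_resources, done, in_progress):
    """Return resources whose requirements are all done, sorted by name.

    edges: iterable of (requirer, required) pairs (assumed shape)
    done, in_progress: sets of resource names
    """
    requires = {name: set() for name in all_resources}
    for requirer, required in edges:
        requires[requirer].add(required)
    return sorted(
        name
        for name, needs in requires.items()
        if name not in done and name not in in_progress and needs <= done
    )


resources = ["A", "B", "C"]
edges = [("B", "A"), ("C", "A"), ("C", "B")]
step1 = next_ready(edges, resources, done=set(), in_progress=set())
step2 = next_ready(edges, resources, done={"A"}, in_progress=set())
step3 = next_ready(edges, resources, done={"A", "B"}, in_progress=set())
```

In a real engine the done and in_progress sets would be read and
updated inside a DB transaction, so two notifications arriving at once
could not schedule the same resource twice.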

- Anant
