[openstack-dev] [Heat] Convergence proof-of-concept showdown

Zane Bitter zbitter at redhat.com
Thu Dec 4 05:19:39 UTC 2014

On 01/12/14 02:02, Anant Patil wrote:
> On GitHub: https://github.com/anantpatil/heat-convergence-poc

I'm trying to review this code at the moment, and finding some stuff I 
don't understand:


This appears to loop through all of the resources *prior* to kicking off 
any actual updates to check if the resource will change. This is 
impossible to do in general, since a resource may obtain a property 
value from an attribute of another resource and there is no way to know 
whether an update to said other resource would cause a change in the 
attribute value.

In addition, no attempt to catch UpdateReplace is made. Although that 
looks like a simple fix, I'm now worried about the level to which this 
code has been tested.

I'm also trying to wrap my head around how resources are cleaned up in 
dependency order. If I understand correctly, you store in the 
ResourceGraph table the dependencies between various resource names in 
the current template (presumably there could also be some left around 
from previous templates too?). For each resource name there may be a 
number of rows in the Resource table, each with an incrementing version. 
As far as I can tell though, there's nowhere that the dependency graph 
for _previous_ templates is persisted? So if the dependency order 
changes in the template we have no way of knowing the correct order to 
clean up in any more? (There's not even a mechanism to associate a 
resource version with a particular template, which might be one avenue 
by which to recover the dependencies.)

I think this is an important case we need to be able to handle, so I 
added a scenario to my test framework to exercise it and discovered that 
my implementation was also buggy. Here's the fix: 

> It was difficult, for me personally, to completely understand Zane's PoC
> and how it would lay the foundation for aforementioned design goals. It
> would be very helpful to have Zane's understanding here. I could
> understand that there are ideas like async message passing and notifying
> the parent which we also subscribe to.

So I guess the thing to note is that there are essentially two parts to 
my PoC:
1) A simulation framework. In the final implementation there will be 
multiple tasks running in parallel in separate processes and talking to 
a database; the framework replaces that with an event loop that runs the 
tasks sequentially in a single process with an in-memory data store. 
I could have built a more realistic simulator using Celery or something, 
but I preferred this way as it offers deterministic tests.
2) A toy implementation of Heat on top of this framework.

The files map to Heat roughly like this:

converge.engine       -> heat.engine.service
converge.stack        -> heat.engine.stack
converge.resource     -> heat.engine.resource
converge.template     -> heat.engine.template
converge.dependencies -> is actually heat.engine.dependencies
converge.sync_point   -> no equivalent
converge.converger    -> no equivalent (this is the convergence "worker")
converge.reality      -> represents the actual OpenStack services

For convenience, I just use the @asynchronous decorator to turn an 
ordinary method call into a simulated message.
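
To give a feel for the idea, here is a minimal sketch of how such a 
decorator could work. This is not the code from the PoC: EventLoop, 
LOOP, Worker and the body of asynchronous below are hypothetical 
stand-ins I am using for illustration. The decorated call only queues 
a message; a single-process loop then runs the queued calls one at a 
time, which is what makes the tests deterministic.

```python
import collections
import functools


class EventLoop:
    """Runs queued 'messages' one at a time, in FIFO order."""

    def __init__(self):
        self._queue = collections.deque()

    def enqueue(self, func, *args, **kwargs):
        self._queue.append((func, args, kwargs))

    def run_until_idle(self):
        # Handlers may enqueue further messages while the loop is running.
        while self._queue:
            func, args, kwargs = self._queue.popleft()
            func(*args, **kwargs)


LOOP = EventLoop()  # hypothetical module-level loop


def asynchronous(method):
    """Queue the call on LOOP instead of running it immediately."""
    @functools.wraps(method)
    def send(self, *args, **kwargs):
        LOOP.enqueue(method, self, *args, **kwargs)
    return send


class Worker:
    def __init__(self):
        self.log = []

    @asynchronous
    def check_resource(self, name):
        self.log.append(name)


worker = Worker()
worker.check_resource('A')
worker.check_resource('B')
pending_before_run = list(worker.log)  # nothing has executed yet
LOOP.run_until_idle()                  # now both calls run, in order
```

In a real deployment the queue would be a message bus and the handlers 
would run in separate processes; the sketch only captures the ordering 
behaviour of the simulation.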

The concept is essentially as follows:
At the start of a stack update (creates and deletes are also just 
updates) we create any new resources in the DB and calculate the 
dependency graph for the update from the data in the DB and template. 
This graph is the same one used by updates in Heat currently, so it 
contains both the forward and reverse (cleanup) dependencies. The stack 
update then kicks off checks of all the leaf nodes, passing the 
pre-calculated dependency graph.
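
As a rough sketch of what a graph with both forward and cleanup edges 
might look like (the (name, is_update) node representation and the 
functions build_update_graph and leaves are assumptions made for this 
example, not taken from the PoC or from heat.engine.dependencies):

```python
def build_update_graph(new_deps, old_deps):
    """Return {node: set of nodes it must wait for}.

    new_deps / old_deps map a resource name to the names it requires.
    A node is (name, is_update): True for create/update, False for cleanup.
    """
    graph = {}
    for name, requires in new_deps.items():
        # Forward: a resource is updated after everything it requires.
        graph[(name, True)] = {(req, True) for req in requires}
    for name, requires in old_deps.items():
        graph.setdefault((name, False), set())
        for req in requires:
            # Reverse: a requirement is cleaned up after its dependents.
            graph.setdefault((req, False), set()).add((name, False))
    for name in old_deps:
        if name in new_deps:
            # Clean up the old version only once the new one is in place.
            graph[(name, False)].add((name, True))
    return graph


def leaves(graph):
    """Nodes with nothing to wait for; the traversal starts here."""
    return sorted(node for node, requires in graph.items() if not requires)


new_template = {'server': {'network'}, 'network': set()}
old_template = {'server': {'volume'}, 'volume': set()}
graph = build_update_graph(new_template, old_template)
start_nodes = leaves(graph)
```

Here the only leaf is the update of 'network': the server update waits 
for it, the server cleanup waits for the server update, and the volume 
cleanup waits for the server cleanup.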

Each resource check may result in a call to the create(), update() or 
delete() methods of a Resource plugin. The resource also reads any 
attributes that will be required from it. Once this is complete, it 
triggers any dependent resources that are ready, or updates a SyncPoint 
in the database if there are dependent resources that have multiple 
requirements. The message triggering the next resource will contain the 
dependency graph again, as well as the RefIds and required attributes of 
any resources it depends on.
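
A minimal sketch of that propagation step, under the assumption that 
everything lives in memory (the SyncPoint class, propagate and trigger 
here are hypothetical names for illustration and do not mirror 
converge.sync_point): a dependent with a single requirement is 
triggered directly, while one with several requirements is triggered 
only once its sync point has heard from all of them.

```python
class SyncPoint:
    """Tracks which requirements of a resource have reported in."""

    def __init__(self, requirements):
        self.waiting = set(requirements)
        self.inputs = {}

    def satisfy(self, requirement, data):
        """Record one requirement's data; return True when all are in."""
        self.waiting.discard(requirement)
        self.inputs[requirement] = data
        return not self.waiting


def propagate(done, data, requirements, sync_points, trigger):
    """After `done` has been checked, trigger each dependent that is ready."""
    for dependent, reqs in requirements.items():
        if done not in reqs:
            continue
        if len(reqs) == 1:
            # Only one requirement: no sync point needed.
            trigger(dependent, {done: data})
        elif sync_points[dependent].satisfy(done, data):
            # Last requirement to arrive triggers the dependent.
            trigger(dependent, dict(sync_points[dependent].inputs))


requirements = {'server': {'network', 'volume'}, 'port': {'network'}}
sync_points = {'server': SyncPoint(requirements['server'])}
triggered = []


def trigger(name, inputs):
    triggered.append((name, inputs))


propagate('network', {'ref': 'net-1'}, requirements, sync_points, trigger)
after_network = list(triggered)
propagate('volume', {'ref': 'vol-1'}, requirements, sync_points, trigger)
```

With real parallel workers the sync point update would have to be 
atomic in the database; the sketch sidesteps that because everything 
runs sequentially.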

The new dependencies thus created are added to the resource itself in 
the database at the time it is checked, allowing it to record the 
changes caused by a requirement being unexpectedly replaced without 
needing a global lock on anything.

When cleaning up resources, we also endeavour to remove from the 
dependency graph any that are successfully deleted.

Each traversal has a unique ID that is both stored in the stack and 
passed down through the resource check triggers. (At present this is the 
template ID, but it may make more sense to have a unique ID since old 
template IDs can be resurrected in the case of a rollback.) As soon as 
the ID in the stack and the ID in a trigger fail to match, the resource 
checks stop propagating, so only an update of a single field is required 
(rather than locking an entire table) before beginning a new stack update.
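
Sketched in code, assuming a separate unique ID per traversal rather 
than the template ID (Stack, start_update and check_resource are 
hypothetical names for this example only):

```python
import uuid


class Stack:
    def __init__(self):
        self.current_traversal = None

    def start_update(self):
        # Writing this single field is all it takes to cancel an old traversal.
        self.current_traversal = uuid.uuid4().hex
        return self.current_traversal


def check_resource(stack, name, traversal_id, checked):
    """Do the check only if the message belongs to the current traversal."""
    if traversal_id != stack.current_traversal:
        return False  # stale message: stop propagating
    checked.append(name)
    return True


stack = Stack()
checked = []
old_traversal = stack.start_update()
first = check_resource(stack, 'A', old_traversal, checked)

new_traversal = stack.start_update()  # a new update supersedes the old one
stale = check_resource(stack, 'B', old_traversal, checked)
fresh = check_resource(stack, 'B', new_traversal, checked)
```

Messages still in flight from the old traversal are simply dropped 
when they arrive, so nothing needs to be locked to start the new one.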

Hopefully that helps a little. Please let me know if you have specific 
questions. I'm *very* happy to incorporate other ideas into it, since 
it's pretty quick to change, has tests to check for regressions, and is 
intended to be thrown away anyhow (so I genuinely don't care if some 
bits get thrown away earlier than others).

> In retrospective, we had to struggle a lot to understand the existing
> Heat engine. We couldn't have done justice by just creating another
> project in GitHub and without any concrete understanding of existing
> state-of-affairs.

I completely agree, and you guys did the right thing by starting out 
looking at Heat. But remember, the valuable thing isn't the code, it's 
what you learned. My concern is that now that you have Heat pretty well 
figured out, you won't be able to continue to learn nearly as fast 
trying to wrestle with the Heat codebase as you could with the 
simulator. We don't want to fall into the trap of just shipping whatever 
we have because it's too hard to explore the other options; we want to 
identify a promising design and iterate on it as quickly as possible.

