[openstack-dev] [Heat] Convergence proof-of-concept showdown
Zane Bitter
zbitter at redhat.com
Thu Dec 4 05:19:39 UTC 2014
On 01/12/14 02:02, Anant Patil wrote:
> On GitHub: https://github.com/anantpatil/heat-convergence-poc
I'm trying to review this code at the moment, and finding some stuff I
don't understand:
https://github.com/anantpatil/heat-convergence-poc/blob/master/heat/engine/stack.py#L911-L916
This appears to loop through all of the resources *prior* to kicking off
any actual updates to check if the resource will change. This is
impossible to do in general, since a resource may obtain a property
value from an attribute of another resource and there is no way to know
whether an update to said other resource would cause a change in the
attribute value.
In addition, no attempt to catch UpdateReplace is made. Although that
looks like a simple fix, I'm now worried about the level to which this
code has been tested.
I'm also trying to wrap my head around how resources are cleaned up in
dependency order. If I understand correctly, you store in the
ResourceGraph table the dependencies between various resource names in
the current template (presumably there could also be some left around
from previous templates too?). For each resource name there may be a
number of rows in the Resource table, each with an incrementing version.
As far as I can tell though, there's nowhere that the dependency graph
for _previous_ templates is persisted? So if the dependency order
changes in the template we have no way of knowing the correct order to
clean up in any more? (There's not even a mechanism to associate a
resource version with a particular template, which might be one avenue
by which to recover the dependencies.)
I think this is an important case we need to be able to handle, so I
added a scenario to my test framework to exercise it and discovered that
my implementation was also buggy. Here's the fix:
https://github.com/zaneb/heat-convergence-prototype/commit/786f367210ca0acf9eb22bea78fd9d51941b0e40
> It was difficult, for me personally, to completely understand Zane's PoC
> and how it would lay the foundation for the aforementioned design goals. It
> would be very helpful to have Zane's understanding here. I could
> understand that there are ideas like async message passing and notifying
> the parent which we also subscribe to.
So I guess the thing to note is that there are essentially two parts to
my PoC:
1) A simulation framework that takes what will, in the final
implementation, be multiple tasks running in parallel in separate
processes and talking to a database, and replaces it with an event loop
that runs the tasks sequentially in a single process with an in-memory
data store.
I could have built a more realistic simulator using Celery or something,
but I preferred this way as it offers deterministic tests.
2) A toy implementation of Heat on top of this framework.
The files map onto Heat roughly like this:
converge.engine -> heat.engine.service
converge.stack -> heat.engine.stack
converge.resource -> heat.engine.resource
converge.template -> heat.engine.template
converge.dependencies -> actually is heat.engine.dependencies
converge.sync_point -> no equivalent
converge.converger -> no equivalent (this is convergence "worker")
converge.reality -> represents the actual OpenStack services
For convenience, I just use the @asynchronous decorator to turn an
ordinary method call into a simulated message.
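Something like this minimal sketch (invented names, not the PoC code
verbatim):

    import collections
    import functools

    # FIFO of pending "messages": (method, args, kwargs) tuples.
    call_queue = collections.deque()

    def asynchronous(method):
        """Defer a call so the event loop delivers it as a message."""
        @functools.wraps(method)
        def enqueue_call(*args, **kwargs):
            call_queue.append((method, args, kwargs))
        return enqueue_call

    def run_event_loop():
        # Deliver queued messages one at a time, hence the
        # deterministic ordering in tests.
        while call_queue:
            method, args, kwargs = call_queue.popleft()
            method(*args, **kwargs)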
The concept is essentially as follows:
At the start of a stack update (creates and deletes are also just
updates) we create any new resources in the DB and calculate the
dependency graph for the update from the data in the DB and template.
This graph is
the same one used by updates in Heat currently, so it contains both the
forward and reverse (cleanup) dependencies. The stack update then kicks
off checks of all the leaf nodes, passing the pre-calculated dependency
graph.
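In (made-up) sketch form, the kick-off looks something like:

    # Dependency graph as {resource: set of requirements}.
    graph = {
        'A': set(),
        'B': set(),
        'C': set(['A', 'B']),  # C depends on both A and B
    }

    def leaves(graph):
        # Leaf nodes have no outstanding requirements.
        return [node for node, reqs in graph.items() if not reqs]

    # The stack update only triggers the leaves; everything after that
    # is driven by the resource checks themselves.
    for node in leaves(graph):
        print('trigger check of %s, passing the graph along' % node)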
Each resource check may result in a call to the create(), update() or
delete() methods of a Resource plugin. The check also reads off any of
the resource's attributes that other resources will require. Once this
is complete, it
triggers any dependent resources that are ready, or updates a SyncPoint
in the database if there are dependent resources that have multiple
requirements. The message triggering the next resource will contain the
dependency graph again, as well as the RefIds and required attributes of
any resources it depends on.
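The SyncPoint bookkeeping amounts to something like this sketch
(invented names again, and a dict standing in for the database):

    graph = {'A': set(), 'B': set(), 'C': set(['A', 'B'])}
    sync_points = {}  # dependent -> {requirement: its RefId/attrs}

    def check_resource(name, input_data):
        # input_data carries the RefIds/attributes of our requirements.
        # The real check would call create()/update()/delete() on the
        # Resource plugin here and read off any required attributes.
        my_data = 'ref-for-%s' % name
        for dependent, reqs in graph.items():
            if name not in reqs:
                continue
            reported = sync_points.setdefault(dependent, {})
            reported[name] = my_data
            if set(reported) == reqs:
                # The last requirement has reported in: trigger the
                # dependent with everything it was waiting for.
                check_resource(dependent, reported)

    for leaf in [n for n, reqs in graph.items() if not reqs]:
        check_resource(leaf, {})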
The new dependencies thus created are added to the resource itself in
the database at the time it is checked, allowing it to record the
changes caused by a requirement being unexpectedly replaced without
needing a global lock on anything.
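i.e. roughly (sketch, with a dict standing in for the resource's row):

    # Each resource records its own requirements as it is checked, so
    # an unexpected replacement only touches this one row; no global
    # lock needed.
    resource = {'name': 'C', 'requires': set()}

    def record_requirement(resource, req_version_id):
        # In the real thing this is an UPDATE of just this row.
        resource['requires'].add(req_version_id)

    record_requirement(resource, 'A-v2')  # A was replaced under us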
When cleaning up resources, we also endeavour to remove from the
dependency graph any that are successfully deleted.
Each traversal has a unique ID that is both stored in the stack and
passed down through the resource check triggers. (At present this is the
template ID, but it may make more sense to have a unique ID since old
template IDs can be resurrected in the case of a rollback.) As soon as
these fail to match, the resource checks stop propagating, so only an
update of a single field is required (rather than locking an entire
table) before beginning a new stack update.
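In sketch form:

    import uuid

    class Stack(object):
        current_traversal = None

    stack = Stack()
    stack.current_traversal = str(uuid.uuid4())

    def check_resource(traversal_id, name):
        # Compare the ID carried by the trigger with the stack's; if a
        # newer update has started, just drop the message.
        if traversal_id != stack.current_traversal:
            return
        print('checking %s' % name)

    old_traversal = stack.current_traversal
    check_resource(old_traversal, 'server')      # proceeds
    stack.current_traversal = str(uuid.uuid4())  # a new update begins
    check_resource(old_traversal, 'server')      # silently dropped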
Hopefully that helps a little. Please let me know if you have specific
questions. I'm *very* happy to incorporate other ideas into it, since
it's pretty quick to change, has tests to check for regressions, and is
intended to be thrown away anyhow (so I genuinely don't care if some
bits get thrown away earlier than others).
> In retrospect, we had to struggle a lot to understand the existing
> Heat engine. We couldn't have done justice by just creating another
> project on GitHub without any concrete understanding of the existing
> state of affairs.
I completely agree, and you guys did the right thing by starting out
looking at Heat. But remember, the valuable thing isn't the code, it's
what you learned. My concern is that now that you have Heat pretty well
figured out, you won't be able to continue to learn nearly as fast
trying to wrestle with the Heat codebase as you could with the
simulator. We don't want to fall into the trap of just shipping whatever
we have because it's too hard to explore the other options; we want to
identify a promising design and iterate it as quickly as possible.
cheers,
Zane.