[openstack-dev] [Heat] Final steps toward a Convergence design
Zane Bitter
zbitter at redhat.com
Tue Jan 20 01:36:25 UTC 2015
Hi folks,
I'd like to come to agreement on the last major questions of the
convergence design. I'm well aware that I am the current bottleneck, as I
have been struggling to find enough time to make progress on it, but I
think we are now actually very close.
I believe the last remaining issue to be addressed is the question of
what to do when we want to update a resource that is still IN_PROGRESS
as the result of a previous (now cancelled, obviously) update.
There are, of course, a couple of trivial and wrong ways to handle it:
1) Throw UpdateReplace and make a new one
- This is obviously a terrible solution for the user
2) Poll the DB in a loop until the previous update finishes
- This is obviously horribly inefficient
So the preferred solution here needs to involve retriggering the
resource's task in the current update once the one from the previous
update is complete.
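To make the retriggering idea concrete, here is a minimal in-memory sketch. All of the names (Resource, check_resource, complete_resource) are illustrative, not the simulator's actual API:

```python
# Illustrative sketch only - not the simulator's real code. A trigger
# for a locked (IN_PROGRESS) resource is ignored; the running task
# retriggers the check for the current update when it completes.

class Resource:
    def __init__(self, name):
        self.name = name
        self.in_progress = False
        self.checked_in = []   # traversal IDs whose check actually ran

def check_resource(resource, traversal_id):
    """Start a check unless the resource is locked by a running task."""
    if resource.in_progress:
        # Ignore the trigger; the running task will retrigger us when done.
        return False
    resource.in_progress = True
    resource.checked_in.append(traversal_id)
    return True

def complete_resource(resource, traversal_id, current_traversal):
    """Finish a check; if our update was cancelled, retrigger the new one."""
    resource.in_progress = False
    if traversal_id != current_traversal:
        check_resource(resource, current_traversal)
```

So if update 1 is working on a resource when update 2's trigger arrives, the trigger is dropped, and the retrigger happens only when update 1's task finishes.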
I've implemented some changes in the simulator - although note that,
unlike the stuff I implemented previously, this is extremely poorly
tested (if at all), since the simulator runs the tasks serially and
therefore never hits this case. So code review would be appreciated. I committed
the changes on a new branch, "resumable":
https://github.com/zaneb/heat-convergence-prototype/commits/resumable
Here is a brief summary:
- The SyncPoints are now:
* created for every resource, regardless of how many dependencies it has.
* created at the beginning of an update and deleted before beginning
another update.
* contain only the list of satisfied dependencies (and their RefId
and attribute values).
- The graph is now stored in the Stack table, rather than passed through
the chain of trigger notifications.
- We'll use locks in the Resource table to ensure that only one action
at a time can happen on a Resource.
- When a trigger is received for a resource that is locked (i.e. status
is IN_PROGRESS and the engine owning it is still alive), the trigger is
ignored.
- When processing of a resource completes, it attempts to notify its
sync points (every resource has at least one, since the last resource in
each chain must notify the stack that it is complete). Failing to find
one of those sync points indicates that the update being processed has
been cancelled; in that case, a new check on the resource is triggered
with the data for the current update (retrieved from the Stack table),
provided its SyncPoint entry shows it is ready.
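The summary above can be sketched in memory as follows; sync_points, stack and retriggered are illustrative stand-ins for the database tables the prototype actually uses (the real sync points also carry RefIds and attribute values, omitted here):

```python
# Illustrative sketch only - the prototype keeps this state in the DB.
sync_points = {}   # (traversal_id, dependant) -> set of satisfied requirements
stack = {'current_traversal': None, 'graph': {}}
retriggered = []   # (traversal_id, resource) pairs retriggered after cancel

def start_traversal(traversal_id, graph):
    """Delete the old update's sync points, create one per resource
    (plus one for the stack itself), and store the graph on the stack."""
    sync_points.clear()
    stack['current_traversal'] = traversal_id
    stack['graph'] = graph
    for name in list(graph) + ['<stack>']:
        sync_points[(traversal_id, name)] = set()

def resource_complete(traversal_id, name):
    """Notify dependants; a resource with none notifies the stack itself.
    A missing sync point means this traversal was cancelled."""
    dependants = [d for d, reqs in stack['graph'].items() if name in reqs]
    for dep in dependants or ['<stack>']:
        key = (traversal_id, dep)
        if key not in sync_points:
            # Cancelled: retrigger the check with the current update's data.
            retriggered.append((stack['current_traversal'], name))
            return
        sync_points[key].add(name)
```

For example, if traversal 2 starts (wiping traversal 1's sync points) while a resource from traversal 1 is still running, that resource's completion finds its sync point missing and retriggers itself under traversal 2.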
I'm not 100% happy with the amount of extra load this puts on the
database, but I can't see a way to do significantly better and still
solve this locking issue. Suggestions are welcome. At least the common
case is considerably better than the worst case.
There are two races here that we need to satisfy ourselves we have
answers for (I think we do):
1) Since old SyncPoints are deleted before a new transition begins and
we only look for them after unlocking the resource being processed, I
don't believe that both the previous and the new update can fail to
trigger the check on the resource in the new update's traversal. (If
there are any DB experts out there, I'd be interested in their input on
this one.)
2) When both the previous and the new update end up triggering a check
on the resource in the new update's traversal, we'll only perform one
because one will succeed in locking the resource and the other will just
be ignored after it fails to acquire the lock. (This one is watertight,
since both processes are acting on the same lock.)
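The reason race (2) is watertight is that lock acquisition is a single atomic conditional write. Here is a sketch using SQLite as a stand-in for Heat's database; the table and column names are illustrative:

```python
# Sketch of race (2): an atomic conditional UPDATE means exactly one of
# two concurrent triggers acquires the resource lock. Schema is illustrative.
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE resource (name TEXT PRIMARY KEY, engine_id TEXT)")
conn.execute("INSERT INTO resource VALUES ('server', NULL)")

def try_lock(name, engine_id):
    """Acquire the lock only if nobody holds it; True on success."""
    cur = conn.execute(
        "UPDATE resource SET engine_id = ? "
        "WHERE name = ? AND engine_id IS NULL",
        (engine_id, name))
    conn.commit()
    return cur.rowcount == 1

# Two triggers race for the same resource: only one wins.
first = try_lock('server', 'engine-A')
second = try_lock('server', 'engine-B')
```

Whichever trigger loses the compare-and-swap simply drops out, which is exactly the "ignored after it fails to acquire the lock" behaviour described above.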
I believe that this model is very close to what Anant and his team are
proposing. Arguably this means I've been wasting everyone's time, but a
happier way to look at it is that two mostly independent design efforts
converging on a similar solution is something we can take a lot of
confidence from ;)
My next task is to start breaking this down into blueprints that folks
can start implementing. In the meantime, it would be great if we could
identify any remaining discrepancies between the two designs and
completely close those last gaps.
cheers,
Zane.