[openstack-dev] [Heat] Convergence proof-of-concept showdown

Murugan, Visnusaran visnusaran.murugan at hp.com
Wed Dec 10 11:42:25 UTC 2014



-----Original Message-----
From: Zane Bitter [mailto:zbitter at redhat.com] 
Sent: Tuesday, December 9, 2014 3:50 AM
To: openstack-dev at lists.openstack.org
Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept showdown

On 08/12/14 07:00, Murugan, Visnusaran wrote:
>
> Hi Zane & Michael,
>
> Please have a look @ 
> https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence
>
> I've updated it with a combined approach that requires neither persisting the graph nor removing the backup stack.

Well, we still have to persist the dependencies of each version of a resource _somehow_, because otherwise we can't know how to clean them up in the correct order. But what I think you meant to say is that this approach doesn't require it to be persisted in a separate table where the rows are marked as traversed as we work through the graph.

[Murugan, Visnusaran] 
In the case of a rollback, where we have to clean up earlier versions of resources, we could get the order from the old template. We'd prefer not to have a graph table.

> This approach reduces DB queries by waiting for a completion notification on a topic. The drawback I see is that the delete-stack stream will be huge, as it will contain the entire graph. As a workaround for the delete operation, we can always dump such data as JSON in ResourceLock.data and pass a simple flag "load_stream_from_db" to the converge RPC call.

This seems to be essentially equivalent to my 'SyncPoint' proposal[1], with the key difference that the data is stored in-memory in a Heat engine rather than the database.

I suspect it's probably a mistake to move it in-memory for similar reasons to the argument Clint made against synchronising the marking off of dependencies in-memory. The database can handle that and the problem of making the DB robust against failures of a single machine has already been solved by someone else. If we do it in-memory we are just creating a single point of failure for not much gain. (I guess you could argue it doesn't matter, since if any Heat engine dies during the traversal then we'll have to kick off another one anyway, but it does limit our options if that changes in the future.)
[Murugan, Visnusaran] When a resource completes, it removes itself from resource_lock and notifies the engine. The engine will acquire the parent lock and initiate the parent only if all of its children are satisfied (i.e. no child entry remains in resource_lock). This will take the place of the Aggregator.
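A minimal single-process sketch of the completion step described above, using an in-memory SQLite database. The resource_lock columns follow the Lock table proposed later in this thread; the separate dependency table, the function names and the "parent = resource that requires the child" convention are assumptions for illustration only. It does not model the parent lock, so with several engines running concurrently that lock (or an equivalent atomic update) would still be needed to stop two engines from both seeing zero remaining children.

```python
import sqlite3


def make_db(edges, resources):
    """Hypothetical schema: resource_lock rows mark resources still in progress;
    dependency rows are (requirer, required) pairs, i.e. (parent, child)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE resource_lock ("
                 "name TEXT PRIMARY KEY, stack_id TEXT, engine_id TEXT, data TEXT)")
    conn.execute("CREATE TABLE dependency (requirer TEXT, required TEXT)")
    conn.executemany("INSERT INTO dependency VALUES (?, ?)", edges)
    conn.executemany("INSERT INTO resource_lock (name, stack_id) VALUES (?, 'stack-1')",
                     [(r,) for r in resources])
    conn.commit()
    return conn


def on_resource_complete(conn, name):
    """Remove the finished resource's row and return the parents that now
    have no children left in resource_lock (i.e. are ready to start)."""
    ready = []
    with conn:  # one transaction: delete our row, then inspect what remains
        conn.execute("DELETE FROM resource_lock WHERE name = ?", (name,))
        parents = [row[0] for row in conn.execute(
            "SELECT requirer FROM dependency WHERE required = ?", (name,))]
        for parent in parents:
            (pending,) = conn.execute(
                "SELECT COUNT(*) FROM dependency d "
                "JOIN resource_lock l ON l.name = d.required "
                "WHERE d.requirer = ?", (parent,)).fetchone()
            if pending == 0:
                ready.append(parent)
    return sorted(ready)
```

With the A/B/C graph from this thread (B requires A; C requires A and B), completing A readies only B, because C is still waiting on B; completing B then readies C.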

It's not clear to me how the 'streams' differ in practical terms from just passing a serialisation of the Dependencies object, other than being incomprehensible to me ;). The current Dependencies implementation:
(1) is a very generic implementation of a DAG,
(2) works and has plenty of unit tests,
(3) has, with I think one exception, a pretty straightforward API,
(4) has a very simple serialisation, returned by the edges() method, which can be passed back into the constructor to recreate it, and
(5) has an API that is to some extent relied upon by resources, and so won't likely be removed outright in any event.
Whatever code we need to handle dependencies ought to just build on this existing implementation.
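To illustrate point (4), here is a toy stand-in (hypothetical, not Heat's actual Dependencies class) whose edges() output is a plain list of (requirer, required) pairs. That list survives a JSON round trip, so it could travel in an RPC payload and be fed straight back into the constructor; iterating yields resources with their requirements first.

```python
import json


class ToyDependencies:
    """Hypothetical DAG keyed by resource name; edges are (requirer, required)."""

    def __init__(self, edges=()):
        self._requires = {}
        for requirer, required in edges:
            self._requires.setdefault(requirer, set()).add(required)
            self._requires.setdefault(required, set())

    def edges(self):
        """Serialisation: a sorted list of (requirer, required) tuples."""
        return sorted((r, q) for r, reqs in self._requires.items() for q in reqs)

    def __iter__(self):
        # Kahn's algorithm: yield a node only after all its requirements.
        remaining = {n: set(reqs) for n, reqs in self._requires.items()}
        while remaining:
            ready = sorted(n for n, reqs in remaining.items() if not reqs)
            if not ready:
                raise ValueError("circular dependency")
            for n in ready:
                del remaining[n]
            for reqs in remaining.values():
                reqs.difference_update(ready)
            yield from ready


deps = ToyDependencies([("B", "A"), ("C", "B"), ("C", "A")])
wire = json.dumps(deps.edges())          # what could be sent over RPC
rebuilt = ToyDependencies(json.loads(wire))
```

json.loads returns lists rather than tuples, but the constructor only unpacks each pair, so the rebuilt graph has the same edges and the same order (A, B, C).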
[Murugan, Visnusaran] Our thought was to reduce the payload size (template/graph), planning for the worst-case scenario (a million-resource stack). We could always dump them in ResourceLock.data to be loaded by the worker.

I think the difference may be that the streams only include the
*shortest* paths (there will often be more than one) to each resource. i.e.

      A <------- B <------- C
      ^                     |
      |                     |
      +---------------------+

can just be written as:

      A <------- B <------- C

because there's only one order in which that can execute anyway. (If we're going to do this though, we should just add a method to the dependencies.Graph class to delete redundant edges, not create a whole new data structure.) There is a big potential advantage here in that it reduces the theoretical maximum number of edges in the graph from O(n^2) to O(n). (In practice, though, real templates are unlikely to have such dense graphs.)
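Deleting redundant edges is a transitive reduction. A minimal standalone sketch, assuming the graph is a plain dict mapping each resource to the set of resources it directly requires (not Heat's dependencies.Graph) and that it is acyclic: an edge u -> v is dropped when v is still reachable from u through one of u's other requirements, so every ordering constraint is preserved.

```python
def reachable(graph, start):
    """All nodes reachable from `start` by following requirement edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def transitive_reduction(graph):
    """Return a copy of the DAG with every implied (redundant) edge removed."""
    reduced = {}
    for node, reqs in graph.items():
        # Anything reachable through some direct requirement is implied.
        # In a DAG, reachable(graph, r) never contains r itself.
        indirect = set()
        for r in reqs:
            indirect |= reachable(graph, r)
        reduced[node] = {r for r in reqs if r not in indirect}
    return reduced


# The diagram above: C requires B and A, B requires A.
example = {"C": {"B", "A"}, "B": {"A"}, "A": set()}
# Densest case: resource i requires every resource j < i.
dense = {i: set(range(i)) for i in range(6)}
```

For the example this drops C -> A and leaves A <- B <- C. In the dense chain, where resource i requires every earlier resource, n(n-1)/2 edges shrink to n-1 (15 to 5 for six resources).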

There's a downside to this too, though: say that A in the above diagram is replaced during an update. In that case not only B but also C will need to figure out what the latest version of A is. One option here is to pass that data along via B, but that will become very messy to implement in a non-trivial example. The other would be for C to search the database for resources with the same name as A and the current traversal_id marked as the latest. But that not only creates a concurrency problem we didn't have before (A could have been updated with a new traversal_id at some point after C had established that the current traversal was still valid but before it went looking for A), it also eliminates all of the performance gains from removing that edge in the first place.

[1]
https://github.com/zaneb/heat-convergence-prototype/blob/distributed-graph/converge/sync_point.py

> To stop the current stack operation, we will use your traversal_id-based approach.

+1 :)
[Murugan, Visnusaran] We had this idea already :)
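A minimal sketch of how a traversal_id check could stop an in-flight operation, assuming (hypothetically) a stack table with a current_traversal column. Starting a new update, rollback or stop writes a fresh id, and any worker still carrying the old id sees a mismatch and gives up. The table, column and function names are illustrative assumptions, not taken from the prototype.

```python
import sqlite3
import uuid


def make_stack_db(stack_id):
    """Hypothetical schema: one row per stack holding its current traversal id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE stack (id TEXT PRIMARY KEY, current_traversal TEXT)")
    conn.execute("INSERT INTO stack VALUES (?, NULL)", (stack_id,))
    conn.commit()
    return conn


def start_traversal(conn, stack_id):
    """Begin a new traversal (or stop the old one) by replacing the id."""
    new_id = uuid.uuid4().hex
    with conn:
        conn.execute("UPDATE stack SET current_traversal = ? WHERE id = ?",
                     (new_id, stack_id))
    return new_id


def still_current(conn, stack_id, traversal_id):
    """A worker checks this before doing more work for `traversal_id`."""
    row = conn.execute("SELECT current_traversal FROM stack WHERE id = ?",
                       (stack_id,)).fetchone()
    return row is not None and row[0] == traversal_id
```

Note that a worker which calls still_current() and then acts still has the check-then-act gap described above for C; putting the traversal_id into the WHERE clause of the worker's own UPDATE is one way to make the check and the write a single atomic step.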

> If you feel the Aggregator model creates more queues, then we
> might have to poll the DB to get resource status (which will impact
> performance adversely :) ).

For the reasons given above I would vote for doing this in the DB. I agree there will be a performance penalty for doing so, because we'll be paying for robustness.
[Murugan, Visnusaran]  +1

> Lock table: name (unique; the resource_id), stack_id, engine_id, data
> (JSON to store the stream dict)

Based on our call on Thursday, I think you're taking the idea of the Lock table too literally. The point of referring to locks is that we can use the same concepts as the Lock table relies on to do atomic updates on a particular row of the database, and we can use those atomic updates to prevent race conditions when implementing SyncPoints/Aggregators/whatever you want to call them. It's not that we'd actually use the Lock table itself, which implements a mutex and therefore offers only a much slower and more stateful way of doing what we want (lock mutex, change data, unlock mutex).
[Murugan, Visnusaran] Are you suggesting something like a SELECT ... FOR UPDATE on the resource table itself, without having a lock table?
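One way to get an atomic row update without a mutex is optimistic concurrency: read the row with a version number, compute the new state, then write it back with UPDATE ... WHERE version = <the version read>. If another writer got there first, no row matches and the writer simply re-reads and retries. The sketch below applies this to a hypothetical sync_point table (key, version, data) inspired by the SyncPoint idea; the schema and function names are assumptions for illustration.

```python
import json
import sqlite3


def make_sync_point(conn, key, required):
    """Create a hypothetical sync point waiting on the `required` inputs."""
    conn.execute("CREATE TABLE IF NOT EXISTS sync_point ("
                 "key TEXT PRIMARY KEY, version INTEGER NOT NULL, data TEXT NOT NULL)")
    state = {"required": sorted(required), "satisfied": []}
    with conn:
        conn.execute("INSERT INTO sync_point VALUES (?, 0, ?)",
                     (key, json.dumps(state)))


def mark_satisfied(conn, key, source, max_retries=10):
    """Record that `source` finished; return True only for the single update
    that completes the set, so exactly one caller triggers the next step."""
    for _ in range(max_retries):
        version, data = conn.execute(
            "SELECT version, data FROM sync_point WHERE key = ?", (key,)).fetchone()
        state = json.loads(data)
        required = set(state["required"])
        was_complete = required <= set(state["satisfied"])
        if source not in state["satisfied"]:
            state["satisfied"].append(source)
        with conn:
            # Compare-and-swap: matches only if nobody changed the row since our read.
            cur = conn.execute(
                "UPDATE sync_point SET version = version + 1, data = ? "
                "WHERE key = ? AND version = ?",
                (json.dumps(state), key, version))
        if cur.rowcount == 1:
            return not was_complete and required <= set(state["satisfied"])
        # Lost the race: another writer bumped the version, so re-read and retry.
    raise RuntimeError("too much contention on sync point %r" % key)
```

Compared with lock / change / unlock, each attempt is one read plus one conditional write and holds no lock between requests. SELECT ... FOR UPDATE (available in MySQL and PostgreSQL, but not SQLite) is the pessimistic alternative: it locks the row for the rest of the transaction instead of detecting conflicts afterwards.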

cheers,
Zane.

> Your thoughts.
> Vishnu (irc: ckmvishnu)
> Unmesh (irc: unmeshg)


_______________________________________________
OpenStack-dev mailing list
OpenStack-dev at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


