[openstack-dev] [Heat] Convergence proof-of-concept showdown

Murugan, Visnusaran visnusaran.murugan at hp.com
Fri Dec 12 11:40:03 UTC 2014


Hi zaneb,

Etherpad updated. 

https://etherpad.openstack.org/p/execution-stream-and-aggregator-based-convergence

> -----Original Message-----
> From: Murugan, Visnusaran
> Sent: Friday, December 12, 2014 4:00 PM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> showdown
> 
> 
> 
> > -----Original Message-----
> > From: Zane Bitter [mailto:zbitter at redhat.com]
> > Sent: Friday, December 12, 2014 6:37 AM
> > To: openstack-dev at lists.openstack.org
> > Subject: Re: [openstack-dev] [Heat] Convergence proof-of-concept
> > showdown
> >
> > On 11/12/14 08:26, Murugan, Visnusaran wrote:
> > >>> [Murugan, Visnusaran]
> > >>> In case of a rollback, where we have to clean up earlier versions of
> > >>> resources,
> > >> we could get the order from the old template. We'd prefer not to have a
> > >> graph table.
> > >>
> > >> In theory you could get it by keeping old templates around. But
> > >> that means keeping a lot of templates, and it will be hard to keep
> > >> track of when you want to delete them. It also means that when
> > >> starting an update you'll need to load every existing previous
> > >> version of the template in order to calculate the dependencies. It
> > >> also leaves the dependencies in an ambiguous state when a resource
> > >> fails, and although that can be worked around it will be a giant pain to
> implement.
> > >>
> > >
> > > Agree that looking at all templates for a delete is not good. But,
> > > complexity aside, we feel we could achieve it by having an update
> > > stream and a delete stream for a stack update operation. I will
> > > elaborate in detail in the etherpad sometime tomorrow :)
> > >
> > >> I agree that I'd prefer not to have a graph table. After trying a
> > >> couple of different things I decided to store the dependencies in
> > >> the Resource table, where we can read or write them virtually for
> > >> free because it turns out that we are always reading or updating
> > >> the Resource itself at exactly the same time anyway.
> > >>
> > >
> > > Not sure how this will work in an update scenario when a resource
> > > does not change and its dependencies do.
> >
> > We'll always update the requirements, even when the properties don't
> > change.
> >
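To make sure we're reading this the same way, here is a minimal sketch of how
I understand the "dependencies in the Resource table" idea; the table and
column names below are illustrative, not the actual Heat schema:

# Hypothetical sketch: the requirements ride along in the UPDATE we already
# issue for the resource, so storing them costs no extra query.
import json

import sqlalchemy as sa

UPDATE_RESOURCE = sa.text("""
    UPDATE resource
       SET status = :status,
           properties = :properties,
           requires = :requires
     WHERE id = :resource_id
""")


def store_resource(conn, resource_id, status, properties, requires):
    # 'requires' is written on every update, even when the properties are
    # unchanged, so the stored graph edges are always current.
    conn.execute(UPDATE_RESOURCE, {
        "resource_id": resource_id,
        "status": status,
        "properties": json.dumps(properties),
        "requires": json.dumps(sorted(requires)),
    })
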
> 
> Can you elaborate a bit on rollback? We had an approach with depends_on
> and needed_by columns in the Resource table, but dropped it when we figured
> out we had too many DB operations for update.
> 
> > > Also taking care of deleting resources in order will be an issue.
> >
> > It works fine.
> >
> > > This implies that there will be different versions of a resource,
> > > which will complicate things even further.
> >
> > No it doesn't, other than the different versions we already have due
> > to UpdateReplace.
> >
> > >>>> This approach reduces DB queries by waiting for completion
> > >>>> notification
> > >> on a topic. The drawback I see is that the delete stack stream will be
> > >> huge, as it will have the entire graph. We can always dump such data
> > >> in ResourceLock.data as JSON and pass a simple flag
> > >> "load_stream_from_db" to the converge RPC call as a workaround for
> > >> the delete operation.
> > >>>
> > >>> This seems to be essentially equivalent to my 'SyncPoint'
> > >>> proposal[1], with
> > >> the key difference that the data is stored in-memory in a Heat
> > >> engine rather than the database.
> > >>>
> > >>> I suspect it's probably a mistake to move it in-memory for similar
> > >>> reasons to the argument Clint made against synchronising the
> > >>> marking off
> > >> of dependencies in-memory. The database can handle that and the
> > >> problem of making the DB robust against failures of a single
> > >> machine has already been solved by someone else. If we do it
> > >> in-memory we are just creating a single point of failure for not
> > >> much gain. (I guess you could argue it doesn't matter, since if any
> > >> Heat engine dies during the traversal then we'll have to kick off
> > >> another one anyway, but it does limit our options if that changes
> > >> in the
> > >> future.) [Murugan, Visnusaran] A resource completes, removes itself
> > >> from resource_lock and notifies the engine. The engine will acquire the
> > >> parent lock and initiate the parent only if all of its children are
> > >> satisfied (no child entry in resource_lock).
> > >> This will take the place of the Aggregator.
> > >>
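To spell out the flow I described above, here is a rough sketch; the lock
table is an in-memory stand-in here, and in reality each step would be an
atomic DB operation:

class ResourceLock(object):

    def __init__(self, children_of):
        # children_of: {resource_name: [names of the children it waits on]}
        self.children_of = children_of
        self.incomplete = set(children_of)     # one "lock row" per resource

    def on_resource_complete(self, resource, parents_of, initiate):
        """The resource removes its own entry; any parent whose children are
        all done (no child row left in the table) gets initiated."""
        self.incomplete.discard(resource)
        for parent in parents_of.get(resource, []):
            children = self.children_of.get(parent, [])
            if not any(child in self.incomplete for child in children):
                initiate(parent)


def announce(resource):
    print('converging %s' % resource)

# Example: a server that waits on a port and a volume.
locks = ResourceLock({'server': ['port', 'volume'], 'port': [], 'volume': []})
parents_of = {'port': ['server'], 'volume': ['server']}
locks.on_resource_complete('port', parents_of, initiate=announce)    # nothing
locks.on_resource_complete('volume', parents_of, initiate=announce)  # server
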
> > >> Yep, if you s/resource_lock/SyncPoint/ that's more or less exactly
> > >> what I
> > did.
> > >> The three differences I can see are:
> > >>
> > >> 1) I think you are proposing to create all of the sync points at
> > >> the start of the traversal, rather than on an as-needed basis. This
> > >> is probably a good idea. I didn't consider it because of the way my
> > >> prototype evolved, but there's now no reason I can see not to do this.
> > >> If we could move the data to the Resource table itself then we
> > >> could even get it for free from an efficiency point of view.
> > >
> > > +1. But we will need engine_id to be stored somewhere for recovery
> > purposes (in an easily queried format).
> >
> > Yeah, so I'm starting to think you're right, maybe the/a Lock table is
> > the right thing to use there. We could probably do it within the
> > resource table using the same select-for-update to set the engine_id,
> > but I agree that we might be starting to jam too much into that one table.
> >
> 
> Yeah, it does mean unrelated values in the resource table. Upon resource
> completion we have to unset engine_id, as opposed to dropping a row from
> the resource lock table.
> Both approaches work, but having engine_id in the resource table will cut
> the DB operations in half. We should go with just the resource table along
> with engine_id.
> 
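Concretely, the claim I have in mind is a single conditional UPDATE, with the
release piggybacking on the status update we already make. The SQL below is
illustrative only, not the actual schema:

# Illustrative only: claiming a resource by setting engine_id on the resource
# row itself is one conditional UPDATE instead of an INSERT into a lock table;
# releasing it can ride along with the status update we already issue.
import sqlalchemy as sa

CLAIM = sa.text("""
    UPDATE resource
       SET engine_id = :engine_id
     WHERE id = :resource_id
       AND engine_id IS NULL
""")

RELEASE = sa.text("""
    UPDATE resource
       SET engine_id = NULL,
           status = :status
     WHERE id = :resource_id
       AND engine_id = :engine_id
""")


def try_claim(conn, resource_id, engine_id):
    # rowcount == 1 means we won the race; 0 means another engine holds it.
    result = conn.execute(CLAIM, {"resource_id": resource_id,
                                  "engine_id": engine_id})
    return result.rowcount == 1
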
> > > Sync points are created as needed. A single resource is enough to
> > > restart that entire stream.
> > > I think there is a disconnect in our understanding. I will detail it
> > > as well in
> > the etherpad.
> >
> > OK, that would be good.
> >
> > >> 2) You're using a single list from which items are removed, rather
> > >> than two lists (one static, and one to which items are added) that
> > >> get
> > compared.
> > >> Assuming (1) then this is probably a good idea too.
> > >
> > > Yeah. We have a single list per active stream, which works by removing
> > > complete/satisfied resources from it.
> >
> > I went to change this and then remembered why I did it this way: the
> > sync point is also storing data about the resources that are
> > triggering it. Part of this is the RefID and attributes, and we could
> > replace that by storing that data in the Resource itself and querying
> > it rather than having it passed in via the notification. But the other
> > part is the ID/key of those resources, which we _need_ to know in
> > order to update the requirements in case one of them has been replaced
> > and thus the graph doesn't reflect it yet. (Or, for that matter, we
> > need it to know where to go looking for the RefId and/or attributes if
> > they're in the
> > DB.) So we have to store some data, we can't just remove items from
> > the required list (although we could do that as well).
> >
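OK, I see. So per triggering resource the sync point would hold something
like the structure below (purely illustrative, not an actual Heat structure):

sync_point = {
    'target': ('server_group', 2),        # resource key being unblocked
    'satisfied_by': {
        # key of the triggering resource -> data the target will need
        ('port_1', 3): {'refid': 'a1b2c3', 'attrs': {'fixed_ip': '10.0.0.5'}},
        ('volume_1', 1): {'refid': 'd4e5f6', 'attrs': {}},
    },
    'still_required': [('subnet_1', 1)],  # keys that have not reported in yet
}
# The keys matter even if refid/attrs move to the Resource table: if port_1
# was replaced, its new key tells us where to update the target's requirements.
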
> > >> 3) You're suggesting to notify the engine unconditionally and let
> > >> the engine decide if the list is empty. That's probably not a good
> > >> idea - not only does it require extra reads, it introduces a race
> > >> condition that you then have to solve (it can be solved, it's just more
> work).
> > >> Since the update to remove a child from the list is atomic, it's
> > >> best to just trigger the engine only if the list is now empty.
> > >>
> > >
> > > No. Notify only if the stream has something to be processed. The newer
> > > approach, based on the DB lock, will be that the last resource
> > > initiates its parent.
> > > This is the opposite of what our Aggregator model had suggested.
> >
> > OK, I think we're on the same page on this one then.
> >
> 
> 
> Yeah.
> 
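For the record, the rule we are agreeing on is roughly the following (sketch
only; it assumes the removal runs under the same select-for-update that reads
the sync point row, so the emptiness check is atomic):

def satisfy(sync_point, child_key, notify):
    # Remove the child that just completed from the list of requirements.
    sync_point['still_required'] = [
        key for key in sync_point['still_required'] if key != child_key]
    if not sync_point['still_required']:
        # Exactly one engine observes the list become empty, so exactly one
        # notification is sent; no extra read and no race.
        notify(sync_point['target'])
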
> > >>> It's not clear to me how the 'streams' differ in practical terms
> > >>> from just passing a serialisation of the Dependencies object,
> > >>> other than being incomprehensible to me ;). The current
> > >>> Dependencies implementation
> > >>> (1) is a very generic implementation of a DAG, (2) works and has
> > >>> plenty of
> > >> unit tests, (3) has, with I think one exception, a pretty
> > >> straightforward API,
> > >> (4) has a very simple serialisation, returned by the edges()
> > >> method, which can be passed back into the constructor to recreate
> > >> it, and (5) has an API that is to some extent relied upon by
> > >> resources, and so won't likely be removed outright in any event.
> > >>> Whatever code we need to handle dependencies ought to just build
> > >>> on
> > >> this existing implementation.
> > >>> [Murugan, Visnusaran] Our thought was to reduce the payload size
> > >> (template/graph), just planning for the worst-case scenario (a
> > >> million-resource stack). We could always dump it in ResourceLock.data
> > >> to be loaded by the Worker.
> > >>
> > >> If there's a smaller representation of a graph than a list of edges
> > >> then I don't know what it is. The proposed stream structure
> > >> certainly isn't it, unless you mean as an alternative to storing
> > >> the entire graph once for each resource. A better alternative is to
> > >> store it once centrally - in my current implementation it is passed
> > >> down through the trigger messages, but since only one traversal can
> > >> be in progress at a time it could just as easily be stored in the
> > >> Stack table of the
> > database at the slight cost of an extra write.
> > >>
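Fair enough. For my own understanding, the round trip would look something
like this (a standalone sketch, not the real heat.engine.dependencies module):

# The whole graph serialises to a list of (requirer, required) pairs, small
# enough to keep in the Stack row as JSON.
import json

edges = [('server', 'port'), ('server', 'volume'), ('port', 'subnet')]

stored = json.dumps(edges)             # one extra write to the Stack table


def build_graph(edge_list):
    graph = {}
    for requirer, required in edge_list:
        graph.setdefault(requirer, set()).add(required)
        graph.setdefault(required, set())
    return graph


graph = build_graph(json.loads(stored))   # recreate the DAG on load
assert graph['server'] == set(['port', 'volume'])
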
> > >
> > > Agree that a list of edges is the smallest representation of a graph.
> > > But it does not give us a complete picture without doing a DB lookup.
> > > Our assumption was to store streams in the IN_PROGRESS
> > > resource_lock.data column. This could be in the resource table instead.
> >
> > That's true, but I think in practice at any point where we need to
> > look at this we will always have already loaded the Stack from the DB
> > for some other reason, so we actually can get it for free. (See
> > detailed discussion in my reply to Anant.)
> >
> 
> Aren't we planning to stop loading the stack with all resource objects in
> future, to address the scalability concerns we currently have?
> 
> > >> I'm not opposed to doing that, BTW. In fact, I'm really interested
> > >> in your input on how that might help make recovery from failure
> > >> more robust. I know Anant mentioned that not storing enough data to
> > >> recover when a node dies was his big concern with my current
> approach.
> > >>
> > >
> > > With streams, we feel recovery will be easier. All we need is a
> > > trigger :)
> > >
> > >> I can see that by both creating all the sync points at the start of
> > >> the traversal and storing the dependency graph in the database
> > >> instead of letting it flow through the RPC messages, we would be
> > >> able to resume a traversal where it left off, though I'm not sure
> > >> what that buys
> > us.
> > >>
> > >> And I guess what you're suggesting is that by having an explicit
> > >> lock with the engine ID specified, we can detect when a resource is
> > >> stuck in IN_PROGRESS due to an engine going down? That's actually
> > >> pretty
> > interesting.
> > >>
> > >
> > > Yeah :)
> > >
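To make that recovery idea concrete, this is the kind of query I picture
(illustrative names only):

# Illustrative only: with engine_id stored on the resource row, a "stuck"
# resource is simply one that is IN_PROGRESS but owned by an engine that is
# no longer alive (e.g. no longer listening on its RPC topic).
import sqlalchemy as sa

STUCK = sa.text("""
    SELECT id, engine_id
      FROM resource
     WHERE status = 'IN_PROGRESS'
       AND engine_id IS NOT NULL
""")


def find_stuck_resources(conn, live_engine_ids):
    rows = conn.execute(STUCK)
    # Each resource returned here can be re-triggered from its stream or
    # sync point by a surviving engine.
    return [row.id for row in rows if row.engine_id not in live_engine_ids]
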
> > >>> Based on our call on Thursday, I think you're taking the idea of
> > >>> the Lock
> > >> table too literally. The point of referring to locks is that we can
> > >> use the same concepts as the Lock table relies on to do atomic
> > >> updates on a particular row of the database, and we can use those
> > >> atomic updates to prevent race conditions when implementing
> > >> SyncPoints/Aggregators/whatever you want to call them. It's not
> > >> that we'd actually use the Lock table itself, which implements a
> > >> mutex and therefore offers only a much slower and more stateful way
> > >> of doing what we want (lock mutex, change data, unlock mutex).
> > >>> [Murugan, Visnusaran] Are you suggesting something like a
> > >> select-for-update in the resource table itself, without having a lock
> > >> table?
> > >>
> > >> Yes, that's exactly what I was suggesting.
> > >
> > > The DB is always good for sync, but we need to be careful not to
> > > overdo it.
> >
> > Yeah, I see what you mean now, it's starting to _feel_ like there'd be
> > too many things mixed together in the Resource table. Are you aware of
> > some concrete harm that might cause though? What happens if we overdo
> > it? Is select-for-update on a huge row more expensive than the whole
> > overhead of manipulating the Lock?
> >
> > Just trying to figure out if intuition is leading me astray here.
> >
> 
> You are right. There should be no difference apart from a little bump in
> memory usage, but I think it should be fine.
> 
> > > Will update the etherpad by tomorrow.
> >
> > OK, thanks.
> >
> > cheers,
> > Zane.
> >
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


