[openstack-dev] [nova] Distributed locking

Angus Lees guslees at gmail.com
Mon Jun 16 05:40:57 UTC 2014


On Fri, 13 Jun 2014 09:40:30 AM Matthew Booth wrote:
> On 12/06/14 21:38, Joshua Harlow wrote:
> > So just a few thoughts before going too far down this path,
> > 
> > Can we make sure we really really understand the use-case where we think
> > this is needed. I think it's fine that this use-case exists, but I just
> > want to make it very clear to others why it's needed and why distributed
> > locking is the only *correct* way.
> 
> An example use of this would be side-loading an image from another
> node's image cache rather than fetching it from glance, which would have
> very significant performance benefits in the VMware driver, and possibly
> other places. The copier must take a read lock on the image to prevent
> the owner from ageing it during the copy. Holding a read lock would also
> assure the copier that the image it is copying is complete.
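
The shared/exclusive semantics described above (copiers hold a read lock, the cache-ageing job needs the image exclusively) can be illustrated with a local reader-writer lock. This is a single-process sketch built on Python's `threading.Condition`, standing in for whatever distributed backend would actually provide the lock; the class and method names are hypothetical.

```python
import threading


class ReadWriteLock:
    """Many concurrent readers, or one exclusive writer (no writer priority)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            # Copiers only wait while the ageing job holds the image exclusively.
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            # Ageing needs the image to itself: no copiers and no other writer.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def try_acquire_write(self):
        """Non-blocking variant: the ageing job can simply skip a busy image."""
        with self._cond:
            if self._writer or self._readers > 0:
                return False
            self._writer = True
            return True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

While a copier holds `acquire_read()`, `try_acquire_write()` returns `False`, so the owner cannot age the image out mid-copy; once the copier calls `release_read()`, ageing can proceed.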

For this particular example, taking a lock every time seems expensive. An 
alternative would be to just try to read from another node and, if the result 
wasn't complete and valid for whatever reason, fall back to reading from 
glance.
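
A minimal sketch of this lock-free alternative, assuming the expected checksum of the image is known up front (for example from the image's metadata in glance). `read_from_peer` and `read_from_glance` are hypothetical callables standing in for the real transfer code.

```python
import hashlib
from typing import Callable, Optional


def fetch_image(
    image_id: str,
    expected_md5: str,
    read_from_peer: Callable[[str], Optional[bytes]],
    read_from_glance: Callable[[str], bytes],
) -> bytes:
    """Try a peer's image cache first; fall back to glance if the copy is unusable.

    No lock is taken. If the peer ages the image out mid-copy, or the copy is
    truncated or corrupt, the checksum comparison fails and we fetch from glance.
    """
    try:
        data = read_from_peer(image_id)
    except OSError:
        data = None  # peer unreachable, or the file vanished during the read
    if data is not None and hashlib.md5(data).hexdigest() == expected_md5:
        return data
    return read_from_glance(image_id)
```

The cost moves from every copy (taking a lock) to the rare failed copy (one wasted peer transfer plus a normal glance fetch).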

> > * What happens when a node goes down that owns the lock, how does the
> > software react to this?
> 
> This can be well defined according to the behaviour of the backend. For
> example, it is well defined in ZooKeeper when a node's session expires.
> If the lock holder is no longer a valid node, it would be fenced before
> deleting its lock, allowing other nodes to continue.
> 
> Without fencing it would not be possible to safely continue in this case.
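
The sequence described above (the holder's session expires, the stale holder is fenced, and only then is its lock deleted) can be sketched in-process. This is not ZooKeeper's API: `LeaseLockTable`, the `fence` callback and the injectable clock are hypothetical, and a real backend would supply session expiry itself.

```python
import time
from typing import Callable, Dict, Tuple


class LeaseLockTable:
    """In-memory stand-in for a lock service with lease/session expiry.

    Each lock records (holder, lease_expiry). An expired holder is fenced
    *before* its lock is deleted and handed to another node.
    """

    def __init__(self, lease_seconds: float,
                 fence: Callable[[str, str], None],
                 clock: Callable[[], float] = time.monotonic):
        self._lease = lease_seconds
        self._fence = fence  # fence(resource, stale_holder)
        self._clock = clock
        self._locks: Dict[str, Tuple[str, float]] = {}

    def acquire(self, resource: str, node: str) -> bool:
        now = self._clock()
        entry = self._locks.get(resource)
        if entry is not None:
            holder, expiry = entry
            if now < expiry:
                return False  # live holder: ordinary lock behaviour
            # Lease expired: fence the stale holder first. If fencing raises,
            # the stale entry stays in place and nobody else gets the lock.
            self._fence(resource, holder)
            del self._locks[resource]
        self._locks[resource] = (node, now + self._lease)
        return True

    def heartbeat(self, resource: str, node: str) -> bool:
        """Renew a live lease; False means the lease is lost and work must stop."""
        now = self._clock()
        entry = self._locks.get(resource)
        if entry is None or entry[0] != node or now >= entry[1]:
            return False
        self._locks[resource] = (node, now + self._lease)
        return True

    def release(self, resource: str, node: str) -> None:
        entry = self._locks.get(resource)
        if entry is not None and entry[0] == node:
            del self._locks[resource]
```

The only step beyond plain lease expiry is the `fence` call before the takeover; everything else is ordinary lock behaviour.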

So, I'm sorry for explaining myself poorly in my earlier post. I think you've 
just described waiting for the lock to expire before another node can take it, 
which is just regular lock behaviour. What additional steps do you want 
Fence() to perform at this point?

(I can see that if the resource provider had some form of fencing, it could do 
all sorts of additional things - but I gather your original use case is 
exactly where that *isn't* an option.)


"If the lock was allowed to go stale and not released cleanly, then we should 
forcibly reboot the stale instance before allowing the lock to be held again" 
shouldn't be too hard to add.

- Is just rebooting the instance sufficient for similar situations, or would 
we need configurable "actions"?
- Which bot do we trust to issue the reboot command?

From the locking service's point of view, I can think of several ways to 
implement this, so we probably want to export a high-level operation and allow 
the details to vary to suit the underlying locking implementation.
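
One way to read "export a high-level operation and allow the details to vary" is sketched below. Everything here is hypothetical: callers see a single `acquire()` that fences a stale holder before breaking its lock, the fencing action is a plain callable so it can be configured (forcibly rebooting the stale instance is just one option), and each backend supplies its own lock primitives. The in-memory backend takes liveness from an externally maintained set; which node is trusted to run the fencing action is left open.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Set

# A fencing action receives the stale holder's name, e.g. to reboot it.
FenceAction = Callable[[str], None]


class FencedLockService(ABC):
    """High-level operation exposed to callers; backends fill in the details."""

    def __init__(self, fence_action: FenceAction):
        self._fence_action = fence_action

    def acquire(self, resource: str, node: str) -> bool:
        if self._try_lock(resource, node):
            return True
        stale = self._stale_holder(resource)
        if stale is None:
            return False  # held by a live node: wait or retry as usual
        self._fence_action(stale)  # must succeed before the lock is broken
        self._break_lock(resource, stale)
        return self._try_lock(resource, node)

    @abstractmethod
    def _try_lock(self, resource: str, node: str) -> bool: ...

    @abstractmethod
    def _stale_holder(self, resource: str) -> Optional[str]: ...

    @abstractmethod
    def _break_lock(self, resource: str, holder: str) -> None: ...


class InMemoryBackend(FencedLockService):
    """Toy backend: a dict of owners plus a caller-maintained set of live nodes."""

    def __init__(self, fence_action: FenceAction, live_nodes: Set[str]):
        super().__init__(fence_action)
        self._owners: Dict[str, str] = {}
        self._live = live_nodes

    def _try_lock(self, resource: str, node: str) -> bool:
        if resource in self._owners:
            return False
        self._owners[resource] = node
        return True

    def _stale_holder(self, resource: str) -> Optional[str]:
        owner = self._owners.get(resource)
        return owner if owner is not None and owner not in self._live else None

    def _break_lock(self, resource: str, holder: str) -> None:
        if self._owners.get(resource) == holder:
            del self._owners[resource]
```

A ZooKeeper- or database-backed service would only replace the three primitives; the fence-then-break ordering in `acquire()` stays the same.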

-- 
 - Gus



More information about the OpenStack-dev mailing list