[openstack-dev] [nova] Distributed locking

Matthew Booth mbooth at redhat.com
Fri Jun 13 08:40:30 UTC 2014


On 12/06/14 21:38, Joshua Harlow wrote:
> So just a few thoughts before going too far down this path,
> 
> Can we make sure we really, really understand the use case where we think
> this is needed. I think it's fine that this use case exists, but I just
> want to make it very clear to others why it's needed and why distributed
> locking is the only *correct* way.

An example use of this would be side-loading an image from another
node's image cache rather than fetching it from glance, which would have
very significant performance benefits in the VMware driver, and possibly
other places. The copier must take a read lock on the image to prevent
the owner from ageing it during the copy. Holding a read lock would also
assure the copier that the image it is copying is complete.
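
To make this concrete, here's a minimal sketch of the pattern. The
read_lock() context manager and the cache paths are hypothetical,
standing in for whatever locking API we end up with:

import shutil
from contextlib import contextmanager

@contextmanager
def read_lock(name):
    """Hold a shared (read) lock on `name` for the block's duration.

    A real implementation would be backed by the distributed lock
    driver; this stub only illustrates the intended usage.
    """
    # acquire shared lock on `name` here
    try:
        yield
    finally:
        # release shared lock on `name` here
        pass

def sideload_image(image_id, peer_cache_path, local_cache_path):
    # The read lock stops the owner ageing the image out mid-copy,
    # and holding it assures us the image we're copying is complete.
    with read_lock('image-%s' % image_id):
        shutil.copyfile(peer_cache_path, local_cache_path)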

> This helps set a good precedent for others who may follow down this path:
> that they also clearly explain the situation, how distributed locking
> fixes it, and all the corner cases that now pop up with distributed locking.
> 
> Some of the questions that I can think of at the current moment:
> 
> * What happens when the node that owns the lock goes down? How does the
> software react to this?

This can be well defined according to the behaviour of the backend. For
example, it is well defined in zookeeper when a node's session expires.
If the lock holder is no longer a valid node, it would be fenced before
its lock is deleted, allowing other nodes to continue.

Without fencing it would not be possible to safely continue in this case.
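
As a rough sketch of that ordering with a zookeeper backend (using the
kazoo client; fence() and emergency_stop() stand in for a hypothetical
fencing driver, they're not existing APIs):

from kazoo.client import KazooClient
from kazoo.protocol.states import KazooState

zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')

def on_session_change(state):
    if state == KazooState.LOST:
        # Our session has expired: zookeeper drops our ephemeral lock
        # nodes, and we must assume we're about to be fenced. Stop
        # touching the protected resource immediately.
        emergency_stop()

zk.add_listener(on_session_change)
zk.start()

def take_over_lock(lock_path, holder_fence_info):
    # The previous holder's session has expired. Fence it *before*
    # taking the lock, so it can't touch the resource after we start
    # using it.
    fence(holder_fence_info)
    lock = zk.Lock(lock_path, identifier='this-node')
    lock.acquire()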

> * What resources are being locked; what is the lock target, what is its
> lifetime?

These are not questions for a locking implementation. A lock would be
held on a name, and it would be up to the API user to ensure that the
protected resource is only used while correctly locked, and that the
lock is not held longer than necessary.
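
In other words, something like the sketch below (all names are
illustrative; it just mirrors the lock(name, fence_info)/unlock(name)
API from my original mail):

import abc

class DistributedLockDriver(abc.ABC):

    @abc.abstractmethod
    def lock(self, name, fence_info):
        """Block until the lock on `name` is acquired. fence_info
        describes how a fencing driver could kill us if we die while
        holding the lock."""

    @abc.abstractmethod
    def unlock(self, name):
        """Release the lock on `name`."""

class FencingDriver(abc.ABC):

    @abc.abstractmethod
    def fence(self, fence_info):
        """Forcibly stop the (possibly dead) holder described by
        fence_info, e.g. by terminating its vSphere session."""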

> * What resiliency do you want this lock to provide (this becomes a
> critical question when considering memcached, since memcached is not
> really the best choice for a resilient distributing locking backend)?

What does resiliency mean in this context? We really just need the lock
to be correct.

> * What do entities that try to acquire a lock do when they can't acquire
> it?

Typically block, but if a use case emerged for trylock() it would be
simple to implement. For example, in the image side-loading case we
might decide that if we can't acquire the lock immediately it isn't
worth waiting, and just fetch the image from glance anyway.
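
Sketched with a hypothetical non-blocking trylock() (the helper
functions are illustrative too):

def get_image(driver, image_id, peer_cache_path):
    name = 'image-%s' % image_id
    if driver.trylock(name, fence_info=None):
        try:
            # Got the lock immediately: side-load from the peer cache.
            return copy_from_peer_cache(peer_cache_path)
        finally:
            driver.unlock(name)
    # Lock contended: not worth waiting, fetch from glance instead.
    return fetch_from_glance(image_id)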

> A useful thing I wrote up a while ago, might still be useful:
> 
> https://wiki.openstack.org/wiki/StructuredWorkflowLocks
> 
> Feel free to move that wiki if you find it useful (it's sort of a
> high-level doc on the different strategies and such).

Nice list of implementation pros/cons.

Matt

> 
> -Josh
> 
> -----Original Message-----
> From: Matthew Booth <mbooth at redhat.com>
> Organization: Red Hat
> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev at lists.openstack.org>
> Date: Thursday, June 12, 2014 at 7:30 AM
> To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev at lists.openstack.org>
> Subject: [openstack-dev] [nova] Distributed locking
> 
>> We have a need for a distributed lock in the VMware driver, which I
>> suspect isn't unique. Specifically it is possible for a VMware datastore
>> to be accessed via multiple nova nodes if it is shared between
>> clusters[1]. Unfortunately the vSphere API doesn't provide us with the
>> primitives to implement robust locking using the storage layer itself,
>> so we're looking elsewhere.
>>
>> The closest we seem to have in Nova currently are service groups, which
>> currently have 3 implementations: DB, Zookeeper and Memcached. The
>> service group api currently provides simple membership, but for locking
>> we'd be looking for something more.
>>
>> I think the api we'd be looking for would be something along the lines of:
>>
>> Foo.lock(name, fence_info)
>> Foo.unlock(name)
>>
>> Bar.fence(fence_info)
>>
>> Note that fencing would be required in this case. We believe we can
>> fence by terminating the other Nova's vSphere session, but other options
>> might include killing a Nova process, or STONITH. These would be
>> implemented as fencing drivers.
>>
>> Although I haven't worked through the detail, I believe lock and unlock
>> would be implementable in all 3 of the current service group drivers.
>> Fencing would be implemented separately.
>>
>> My questions:
>>
>> * Does this already exist, or does anybody have patches pending to do
>> something like this?
>> * Are there other users for this?
>> * Would service groups be an appropriate place, or a new distributed
>> locking class?
>> * How about if we just used zookeeper directly in the driver?
>>
>> Matt
>>
>> [1] Cluster ~= hypervisor
>> -- 
>> Matthew Booth
>> Red Hat Engineering, Virtualisation Team
>>
>> Phone: +442070094448 (UK)
>> GPG ID:  D33C3490
>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 


-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
GPG ID:  D33C3490
GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490


