[openstack-dev] [nova] Distributed locking
harlowja at yahoo-inc.com
Mon Jun 16 23:28:00 UTC 2014
So this is a reader/writer lock then?
I have seen https://github.com/python-zk/kazoo/pull/141 come up in
kazoo (the ZooKeeper Python library), but there was a lack of a maintainer for
that 'recipe'. Perhaps if we really find this is needed we can help get that
pull request 'sponsored' so that it can be used for this purpose?
As far as resiliency, the thing I was thinking about was how correct do you
want this lock to be?
If you go with memcached and a locking mechanism built on it, this will not
be correct, but it might work well enough under normal usage. So that's why
I was wondering what level of correctness you want and what you
want to happen if a server that is maintaining the lock record dies.
In memcached's case this will literally be one server, even if sharding is
being used, since a key hashes to exactly one server. So if that one server goes
down (or a network split happens) then it is possible for two entities to
believe they own the same lock (and if the network split recovers this
gets even weirder); so that's what I was wondering about when mentioning
resiliency and how much incorrectness you are willing to tolerate.
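The usual memcached-based locking pattern relies on the atomic `add` command with a TTL. Below is a minimal sketch of that pattern using an in-memory stand-in for the memcached client (all names here are illustrative, not from any proposed Nova API):

```python
import time
import uuid


class FakeMemcache:
    """In-memory stand-in for a memcached client's atomic add/get/delete."""

    def __init__(self):
        self._data = {}

    def add(self, key, value, expire):
        # memcached semantics: add succeeds only if the key is absent.
        now = time.time()
        entry = self._data.get(key)
        if entry and entry[1] > now:
            return False
        self._data[key] = (value, now + expire)
        return True

    def get(self, key):
        entry = self._data.get(key)
        if entry and entry[1] > time.time():
            return entry[0]
        return None

    def delete(self, key):
        self._data.pop(key, None)


def acquire(client, name, ttl=30):
    """Try to take the lock; return an owner token on success, None on failure."""
    token = str(uuid.uuid4())
    return token if client.add('lock/' + name, token, expire=ttl) else None


def release(client, name, token):
    # Unsafe window: the get/delete pair is not atomic, so a successor who
    # acquired the lock after our TTL lapsed can race us here.
    if client.get('lock/' + name) == token:
        client.delete('lock/' + name)
```

The release path illustrates the correctness gap: once the TTL lapses, or the one server holding the key dies or is partitioned away, two holders can coexist, which is exactly the failure mode described above.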
From: Matthew Booth <mbooth at redhat.com>
Organization: Red Hat
Date: Friday, June 13, 2014 at 1:40 AM
To: Joshua Harlow <harlowja at yahoo-inc.com>, "OpenStack Development Mailing
List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [nova] Distributed locking
>On 12/06/14 21:38, Joshua Harlow wrote:
>> So just a few thoughts before going too far down this path.
>> Can we make sure we really, really understand the use-case where we think
>> this is needed. I think it's fine that this use-case exists, but I just
>> want to make it very clear to others why it's needed and why distributed
>> locking is the only *correct* way.
>An example use of this would be side-loading an image from another
>node's image cache rather than fetching it from glance, which would have
>very significant performance benefits in the VMware driver, and possibly
>other places. The copier must take a read lock on the image to prevent
>the owner from ageing it during the copy. Holding a read lock would also
>assure the copier that the image it is copying is complete.
>> This helps set a good precedent for others that may follow down this path,
>> so that they also clearly explain the situation, how distributed locking
>> fixes it, and all the corner cases that now pop up with distributed locking.
>> Some of the questions that I can think of at the current moment:
>> * What happens when a node goes down that owns the lock, how does the
>> software react to this?
>This can be well defined according to the behaviour of the backend. For
>example, it is well defined in zookeeper when a node's session expires.
>If the lock holder is no longer a valid node, it would be fenced before
>deleting its lock, allowing other nodes to continue.
>Without fencing it would not be possible to safely continue in this case.
>> * What resources are being locked; what is the lock target, what is its name?
>These are not questions for a locking implementation. A lock would be
>held on a name, and it would be up to the api user to ensure that the
>protected resource is only used while correctly locked, and that the
>lock is not held longer than necessary.
>> * What resiliency do you want this lock to provide (this becomes a
>> critical question when considering memcached, since memcached is not
>> really the best choice for a resilient distributed locking backend)?
>What does resiliency mean in this context? We really just need the lock
>to be correct.
>> * What do entities that try to acquire a lock do when they can't acquire it?
>Typically block, but if a use case emerged for trylock() it would be
>simple to implement. For example, in the image side-loading case we may
>decide that if it isn't possible to immediately acquire the lock it
>isn't worth waiting, and we just fetch it from glance anyway.
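The trylock fallback described here can be sketched with a non-blocking acquire. In the sketch below a local threading.Lock stands in for the distributed lock, and side-loading/glance paths are hypothetical placeholders, not real Nova code:

```python
import threading

# Stand-in for a distributed lock handle; a real backend (e.g. ZooKeeper)
# would expose a similar non-blocking acquire.
image_lock = threading.Lock()


def fetch_image(image_id):
    """Prefer side-loading under the lock; fall back to glance otherwise."""
    if image_lock.acquire(blocking=False):
        try:
            # Hypothetical fast path: copy from another node's image cache.
            return 'side-loaded:' + image_id
        finally:
            image_lock.release()
    # Couldn't get the lock immediately: not worth waiting, go to glance.
    return 'glance:' + image_id
```

The point of the design is that lock contention never blocks the caller; the slower-but-always-available glance path keeps the system making progress.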
>> A useful thing I wrote up a while ago, might still be useful:
>> Feel free to move that wiki if you find it useful (it's sort of a high-level
>> doc on the different strategies and such).
>Nice list of implementation pros/cons.
>> -----Original Message-----
>> From: Matthew Booth <mbooth at redhat.com>
>> Organization: Red Hat
>> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev at lists.openstack.org>
>> Date: Thursday, June 12, 2014 at 7:30 AM
>> To: "OpenStack Development Mailing List (not for usage questions)"
>> <openstack-dev at lists.openstack.org>
>> Subject: [openstack-dev] [nova] Distributed locking
>>> We have a need for a distributed lock in the VMware driver, which I
>>> suspect isn't unique. Specifically, it is possible for a VMware datastore
>>> to be accessed via multiple nova nodes if it is shared between
>>> clusters. Unfortunately the vSphere API doesn't provide us with the
>>> primitives to implement robust locking using the storage layer itself,
>>> so we're looking elsewhere.
>>> The closest we seem to have in Nova currently are service groups, which
>>> currently have 3 implementations: DB, Zookeeper and Memcached. The
>>> service group api currently provides simple membership, but for locking
>>> we'd be looking for something more.
>>> I think the api we'd be looking for would be something along the lines of:
>>> Foo.lock(name, fence_info)
>>> Note that fencing would be required in this case. We believe we can
>>> fence by terminating the other Nova's vSphere session, but other options
>>> might include killing a Nova process, or STONITH. These would be
>>> implemented as fencing drivers.
>>> Although I haven't worked through the detail, I believe lock and unlock
>>> would be implementable in all 3 of the current service group drivers.
>>> Fencing would be implemented separately.
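A sketch of what such an interface might look like, with pluggable fencing drivers. All names below are hypothetical illustrations of the Foo.lock(name, fence_info) idea, not a proposed Nova API:

```python
import abc


class FencingDriver(abc.ABC):
    """Pluggable fencing: isolate a dead lock holder before lock takeover."""

    @abc.abstractmethod
    def fence(self, fence_info):
        """Fence the node described by fence_info; return True on success."""


class VSphereSessionFencer(FencingDriver):
    """Hypothetical driver: fence by terminating the holder's vSphere session."""

    def fence(self, fence_info):
        # A real driver would call the vSphere API here.
        return True


class DistributedLock(abc.ABC):
    """Backend-agnostic lock API; implementable over DB, ZooKeeper, memcached."""

    def __init__(self, fencer):
        self.fencer = fencer

    @abc.abstractmethod
    def lock(self, name, fence_info):
        """Block until the named lock is held; record fence_info so a
        successor can fence us if our session dies while holding it."""

    @abc.abstractmethod
    def unlock(self, name):
        """Release the named lock."""
```

The key design point is that fence_info is recorded alongside the lock, so that when a backend reports the holder's session as dead, the next waiter fences the holder before deleting the stale lock, as discussed above.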
>>> My questions:
>>> * Does this already exist, or does anybody have patches pending to do
>>> something like this?
>>> * Are there other users for this?
>>> * Would service groups be an appropriate place, or a new distributed
>>> locking class?
>>> * How about if we just used zookeeper directly in the driver?
>>>  Cluster ~= hypervisor
>>> Matthew Booth
>>> Red Hat Engineering, Virtualisation Team
>>> Phone: +442070094448 (UK)
>>> GPG ID: D33C3490
>>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490