[openstack-dev] [nova] Distributed locking
Doug Hellmann
doug.hellmann at dreamhost.com
Tue Jun 17 14:07:23 UTC 2014
On Tue, Jun 17, 2014 at 4:36 AM, Matthew Booth <mbooth at redhat.com> wrote:
> On 17/06/14 00:28, Joshua Harlow wrote:
>> So this is a reader/writer lock then?
>>
>> I have seen https://github.com/python-zk/kazoo/pull/141 come up in
>> kazoo (the zookeeper python library), but there was a lack of a maintainer
>> for that 'recipe'. Perhaps, if we really find this is needed, we can help
>> get that pull request 'sponsored' so that it can be used for this purpose?
>>
>>
>> As far as resiliency, the thing I was thinking about was: how correct do
>> you want this lock to be?
>>
>> If you go with memcached and a locking mechanism built on it, this will
>> not be correct, but it might work well enough under normal usage. That's
>> why I was wondering what level of correctness you want, and what you want
>> to happen if a server that is maintaining the lock record dies. In
>> memcached's case this will literally be 1 server, even if sharding is
>> being used, since a key hashes to one server. So if that one server goes
>> down (or a network split happens) then it is possible for two entities to
>> believe they own the same lock (and if the network split recovers this
>> gets even weirder); so that's what I was wondering about when mentioning
>> resiliency and how much incorrectness you are willing to tolerate.
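>>
>> To make that trade-off concrete: a naive memcached lock is usually built
>> on the atomic add() operation plus a TTL, roughly like the sketch below
>> (the client address, key prefix and TTL are illustrative only, not a
>> proposal):
>>
>> import time
>> import uuid
>>
>> import memcache  # python-memcached
>>
>> mc = memcache.Client(['memcached-host:11211'])
>>
>> def acquire(name, ttl=30, wait=0.5):
>>     """Spin until we win the atomic add() for the lock key."""
>>     owner = str(uuid.uuid4())
>>     key = 'lock/%s' % name
>>     while not mc.add(key, owner, time=ttl):
>>         time.sleep(wait)
>>     return key, owner
>>
>> def release(key, owner):
>>     # Not atomic: if our TTL expired someone else may already hold the
>>     # lock, and a dead or partitioned memcached server loses the record
>>     # entirely -- which is exactly the incorrectness described above.
>>     if mc.get(key) == owner:
>>         mc.delete(key)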
>
> From my POV, the most important things are:
>
> * 2 nodes must never believe they hold the same lock
> * A node must eventually get the lock
>
> I was expecting to implement locking on all three backends as long as
> they support it. I haven't looked closely at memcached, but if it can
> detect a split it should be able to have a fencing race with the
> possible lock holder before continuing. This is obviously undesirable,
> as you will probably be fencing an otherwise correctly functioning node,
> but it will be correct.
There's a team working on a pluggable library for distributed
coordination: http://git.openstack.org/cgit/stackforge/tooz
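With tooz, usage might look roughly like the sketch below (based on my
reading of the current tree; the backend URL, member id and lock name are
placeholders, and fencing would still be a separate concern):

from tooz import coordination

# The driver (zookeeper, memcached, ...) is selected purely by the URL.
coordinator = coordination.get_coordinator(
    'zookeeper://127.0.0.1:2181', b'nova-compute-host1')
coordinator.start()

lock = coordinator.get_lock(b'image-12345')
if lock.acquire():
    try:
        pass  # use the protected resource
    finally:
        lock.release()

coordinator.stop()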
Doug
>
> Matt
>
>>
>> -----Original Message-----
>> From: Matthew Booth <mbooth at redhat.com>
>> Organization: Red Hat
>> Date: Friday, June 13, 2014 at 1:40 AM
>> To: Joshua Harlow <harlowja at yahoo-inc.com>, "OpenStack Development Mailing
>> List (not for usage questions)" <openstack-dev at lists.openstack.org>
>> Subject: Re: [openstack-dev] [nova] Distributed locking
>>
>>> On 12/06/14 21:38, Joshua Harlow wrote:
>>>> So just a few thoughts before going too far down this path.
>>>>
>>>> Can we make sure we really, really understand the use-case where we
>>>> think this is needed? I think it's fine that this use-case exists, but
>>>> I just want to make it very clear to others why it's needed and why
>>>> distributed locking is the only *correct* way.
>>>
>>> An example use of this would be side-loading an image from another
>>> node's image cache rather than fetching it from glance, which would have
>>> very significant performance benefits in the VMware driver, and possibly
>>> other places. The copier must take a read lock on the image to prevent
>>> the owner from ageing it during the copy. Holding a read lock would also
>>> assure the copier that the image it is copying is complete.
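>>>
>>> For illustration, with a reader/writer lock recipe along the lines of the
>>> kazoo pull request mentioned above, the two sides might look roughly like
>>> this (the API names here are hypothetical, just to show the intended
>>> protocol):
>>>
>>> def copy_image_from_peer(locks, image_id, peer_cache):
>>>     # Shared (read) lock: many copiers may hold it concurrently, but it
>>>     # excludes the exclusive lock taken by the cache manager below.
>>>     with locks.read_lock('image-cache/%s' % image_id):
>>>         # The image cannot be aged out while we hold the read lock, and
>>>         # holding it also implies the cached copy is complete.
>>>         peer_cache.copy(image_id)
>>>
>>> def age_cached_image(locks, image_id, cache):
>>>     # Exclusive (write) lock: taken by the cache ager before deleting.
>>>     with locks.write_lock('image-cache/%s' % image_id):
>>>         cache.delete(image_id)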
>>>
>>>> This helps set a good precedent for others that may follow down this
>>>> path: that they also clearly explain the situation, how distributed
>>>> locking fixes it, and all the corner cases that now pop up with
>>>> distributed locking.
>>>>
>>>> Some of the questions that I can think of at the current moment:
>>>>
>>>> * What happens when a node goes down that owns the lock, how does the
>>>> software react to this?
>>>
>>> This can be well defined according to the behaviour of the backend. For
>>> example, it is well defined in zookeeper when a node's session expires.
>>> If the lock holder is no longer a valid node, it would be fenced before
>>> deleting its lock, allowing other nodes to continue.
>>>
>>> Without fencing it would not be possible to safely continue in this case.
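>>>
>>> As a rough sketch of what the zookeeper backend gives us via kazoo (the
>>> session listener shows where we learn our lock has evaporated; the
>>> helpers called from it are hypothetical):
>>>
>>> from kazoo.client import KazooClient, KazooState
>>>
>>> zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
>>>
>>> def session_listener(state):
>>>     # When our session expires zookeeper deletes our ephemeral lock
>>>     # node, so we must stop touching the protected resource at once.
>>>     # Other nodes should still fence us before trusting that deletion.
>>>     if state == KazooState.LOST:
>>>         stop_all_work_immediately()  # hypothetical local hook
>>>
>>> zk.add_listener(session_listener)
>>> zk.start()
>>>
>>> lock = zk.Lock('/nova/locks/image-12345', identifier='nova-compute-host1')
>>> with lock:  # blocks until acquired; released on exit
>>>     copy_image()  # hypothetical protected operation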
>>>
>>>> * What resources are being locked; what is the lock target, what is its
>>>> lifetime?
>>>
>>> These are not questions for a locking implementation. A lock would be
>>> held on a name, and it would be up to the api user to ensure that the
>>> protected resource is only used while correctly locked, and that the
>>> lock is not held longer than necessary.
>>>
>>>> * What resiliency do you want this lock to provide (this becomes a
>>>> critical question when considering memcached, since memcached is not
>>>> really the best choice for a resilient distributed locking backend)?
>>>
>>> What does resiliency mean in this context? We really just need the lock
>>> to be correct.
>>>
>>>> * What do entities that try to acquire a lock do when they can't acquire
>>>> it?
>>>
>>> Typically block, but if a use case emerged for trylock() it would be
>>> simple to implement. For example, in the image side-loading case we may
>>> decide that if it isn't possible to immediately acquire the lock it
>>> isn't worth waiting, and we just fetch it from glance anyway.
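>>>
>>> In lock-API terms that fallback is just a non-blocking acquire, something
>>> like this (sketch only; get_image_lock() and the fetch helpers are
>>> hypothetical):
>>>
>>> lock = get_image_lock(image_id)   # hypothetical: returns a lock object
>>> if lock.acquire(blocking=False):  # trylock: don't wait for the holder
>>>     try:
>>>         copy_image_from_peer_cache(image_id)
>>>     finally:
>>>         lock.release()
>>> else:
>>>     # Someone else holds it; not worth waiting, fetch from glance.
>>>     fetch_image_from_glance(image_id)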
>>>
>>>> Something I wrote up a while ago that might still be useful:
>>>>
>>>> https://wiki.openstack.org/wiki/StructuredWorkflowLocks
>>>>
>>>> Feel free to move that wiki if you find it useful (it's sort of a
>>>> high-level doc on the different strategies and such).
>>>
>>> Nice list of implementation pros/cons.
>>>
>>> Matt
>>>
>>>>
>>>> -Josh
>>>>
>>>> -----Original Message-----
>>>> From: Matthew Booth <mbooth at redhat.com>
>>>> Organization: Red Hat
>>>> Reply-To: "OpenStack Development Mailing List (not for usage questions)"
>>>> <openstack-dev at lists.openstack.org>
>>>> Date: Thursday, June 12, 2014 at 7:30 AM
>>>> To: "OpenStack Development Mailing List (not for usage questions)"
>>>> <openstack-dev at lists.openstack.org>
>>>> Subject: [openstack-dev] [nova] Distributed locking
>>>>
>>>>> We have a need for a distributed lock in the VMware driver, which I
>>>>> suspect isn't unique. Specifically it is possible for a VMware
>>>>> datastore
>>>>> to be accessed via multiple nova nodes if it is shared between
>>>>> clusters[1]. Unfortunately the vSphere API doesn't provide us with the
>>>>> primitives to implement robust locking using the storage layer itself,
>>>>> so we're looking elsewhere.
>>>>>
>>>>> The closest we seem to have in Nova are service groups, which
>>>>> currently have 3 implementations: DB, Zookeeper and Memcached. The
>>>>> service group api provides simple membership, but for locking we'd be
>>>>> looking for something more.
>>>>>
>>>>> I think the api we'd be looking for would be something along the lines
>>>>> of:
>>>>>
>>>>> Foo.lock(name, fence_info)
>>>>> Foo.unlock(name)
>>>>>
>>>>> Bar.fence(fence_info)
>>>>>
>>>>> Note that fencing would be required in this case. We believe we can
>>>>> fence by terminating the other Nova's vSphere session, but other
>>>>> options
>>>>> might include killing a Nova process, or STONITH. These would be
>>>>> implemented as fencing drivers.
>>>>>
>>>>> Although I haven't worked through the detail, I believe lock and unlock
>>>>> would be implementable in all 3 of the current service group drivers.
>>>>> Fencing would be implemented separately.
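>>>>>
>>>>> To make the shape of that concrete, the driver interface might look
>>>>> roughly like the sketch below (this is only an illustration of the
>>>>> proposal; the class and method names are placeholders, not existing
>>>>> Nova code):
>>>>>
>>>>> import abc
>>>>>
>>>>> class DistributedLockDriver(abc.ABC):
>>>>>     """Sketch of the proposed locking interface."""
>>>>>
>>>>>     @abc.abstractmethod
>>>>>     def lock(self, name, fence_info):
>>>>>         """Block until the named lock is held.
>>>>>
>>>>>         fence_info describes how to fence us if we die while holding
>>>>>         the lock, e.g. enough detail to terminate our vSphere session.
>>>>>         """
>>>>>
>>>>>     @abc.abstractmethod
>>>>>     def unlock(self, name):
>>>>>         """Release the named lock."""
>>>>>
>>>>>
>>>>> class FencingDriver(abc.ABC):
>>>>>     """Sketch of the separate fencing side of the proposal."""
>>>>>
>>>>>     @abc.abstractmethod
>>>>>     def fence(self, fence_info):
>>>>>         """Forcibly stop the (possibly dead) holder described by
>>>>>         fence_info before its lock is broken: terminate its vSphere
>>>>>         session, kill its nova process, STONITH, etc."""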
>>>>>
>>>>> My questions:
>>>>>
>>>>> * Does this already exist, or does anybody have patches pending to do
>>>>> something like this?
>>>>> * Are there other users for this?
>>>>> * Would service groups be an appropriate place, or a new distributed
>>>>> locking class?
>>>>> * How about if we just used zookeeper directly in the driver?
>>>>>
>>>>> Matt
>>>>>
>>>>> [1] Cluster ~= hypervisor
>>>>> --
>>>>> Matthew Booth
>>>>> Red Hat Engineering, Virtualisation Team
>>>>>
>>>>> Phone: +442070094448 (UK)
>>>>> GPG ID: D33C3490
>>>>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>>>>>
>>>>
>>>
>>>
>>> --
>>> Matthew Booth
>>> Red Hat Engineering, Virtualisation Team
>>>
>>> Phone: +442070094448 (UK)
>>> GPG ID: D33C3490
>>> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>>
>
>
> --
> Matthew Booth
> Red Hat Engineering, Virtualisation Team
>
> Phone: +442070094448 (UK)
> GPG ID: D33C3490
> GPG FPR: 3733 612D 2D05 5458 8A8A 1600 3441 EA19 D33C 3490
>