[openstack-dev] [nova] Distributed locking

Angus Lees guslees at gmail.com
Fri Jun 13 04:27:13 UTC 2014


On Thu, 12 Jun 2014 05:06:38 PM Julien Danjou wrote:
> On Thu, Jun 12 2014, Matthew Booth wrote:
> > This looks interesting. It doesn't have hooks for fencing, though.
> > 
> > What's the status of tooz? Would you be interested in adding fencing
> > hooks?
> 
> It's maintained and developed; we plan to use it in Ceilometer and
> other projects. Joshua also wants to use it for Taskflow.
> 
> We are blocked for now by https://review.openstack.org/#/c/93443/ and by
> the lack of resources to complete that request, obviously, so help
> appreciated. :)
> 
> As for fencing hooks, it sounds like a good idea.

As far as I understand these things, in distributed-locking-speak "fencing" 
just means "breaking someone else's lock".

I think your options here are (and apologies if I'm repeating things that are 
obvious):

1. Have a "force unlock" protocol (numerous alternatives exist).  Assume the 
lock holder implements it properly and stops accessing the shared resource 
when asked.

2. Kill the lock holder using some method unrelated to the locking service and 
wait for the locking protocol to realise the ex-holder is dead through the 
usual liveness tests.  Assume that being unable to hold the lock implies no 
longer being able to access the shared resource.
The "liveness test" usually involves the holder pinging the lock service 
periodically, and everyone has to wait for some agreed timeout before assuming 
a client is dead.
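
To make the liveness test concrete, here is a rough sketch of a lease-style 
lock in Python.  It is a toy in-memory model, not how tooz or zookeeper 
actually implement it; the class name, method names and the injectable clock 
are all made up for illustration.

```python
import time


class LeaseLock:
    """Toy in-memory lock whose holder must heartbeat to keep it.

    Hypothetical illustration only; not the API of any real lock service.
    """

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout      # agreed liveness timeout, in seconds
        self.clock = clock          # injectable so the sketch can be tested
        self.holder = None
        self.last_seen = None

    def _expire_if_dead(self):
        # The holder is presumed dead once it has been silent for `timeout`.
        if (self.holder is not None
                and self.clock() - self.last_seen >= self.timeout):
            self.holder = None

    def acquire(self, client):
        # Returns True if `client` holds the lock after this call.
        self._expire_if_dead()
        if self.holder is None:
            self.holder = client
            self.last_seen = self.clock()
        return self.holder == client

    def heartbeat(self, client):
        # The periodic "ping" from the holder to the lock service.
        self._expire_if_dead()
        if self.holder != client:
            return False            # lease lost: stop touching the resource
        self.last_seen = self.clock()
        return True
```

The point of the sketch is that a second client only gets the lock after the 
full timeout has passed without a heartbeat, and a holder whose heartbeat 
returns False has to assume it no longer owns the shared resource.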

(1) involves a lot of trust - and seems particularly bad if the reason you are 
breaking the lock is that the holder is misbehaving.

Assuming (2) is the only reasonable choice, I don't think the lock service 
needs explicit support for fencing, since the exact method for killing the 
holder is unrelated and relatively uninteresting (probably always going to be 
an instance delete in OpenStack).


Perhaps more interesting is exactly what conditions you require before 
attempting to kill the lock holder - you wouldn't want just any job deciding 
it was warranted, or else a misbehaving client would cause mayhem.  Again, I 
suggest your options here are:

1. Require human judgement.
i.e. provide monitoring for whatever is misbehaving and make it obvious that 
one mitigation is to nuke the apparent holder.

2. Require the lock breaker to be able to reach a majority of nodes as some 
proof of "I'm working, my opinion must be right".
In a Paxos system, reaching a majority of nodes basically becomes "hold a 
lock", so we end up back at "my liveness test is better than yours somehow", 
and I'm not sure how to resolve that without human judgement (but I'm not 
familiar with existing approaches).  Again, I don't think this needs 
additional support from the lock service, beyond a liveness test (which 
ZooKeeper, for example, has).
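
For what it's worth, the "reach a majority" precondition in option 2 could be 
sketched like this.  The function names and the shape of `probe_results` are 
assumptions of mine, and how the probes are actually performed is left out 
entirely.

```python
def has_majority(reachable, cluster_size):
    """True if `reachable` nodes form a strict majority of the cluster."""
    if cluster_size <= 0:
        raise ValueError("cluster_size must be positive")
    return reachable > cluster_size // 2


def may_break_lock(probe_results):
    """Decide whether this would-be lock breaker is allowed to act.

    `probe_results` (hypothetical) maps node name -> True if this breaker
    could reach that node.
    """
    reachable = sum(1 for ok in probe_results.values() if ok)
    return has_majority(reachable, len(probe_results))
```

Note that an exact half does not count as a majority, so two partitions of 
equal size both refuse to break the lock.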

tl;dr: I'm interested in what sort of automated fencing behaviour you'd like.

-- 
 - Gus
