[openstack-dev] [nova] Distributed locking
Angus Lees
guslees at gmail.com
Fri Jun 13 04:27:13 UTC 2014
On Thu, 12 Jun 2014 05:06:38 PM Julien Danjou wrote:
> On Thu, Jun 12 2014, Matthew Booth wrote:
> > This looks interesting. It doesn't have hooks for fencing, though.
> >
> > What's the status of tooz? Would you be interested in adding fencing
> > hooks?
>
> It's maintained and developed; we plan to use it in Ceilometer and
> other projects. Joshua also wants to use it for Taskflow.
>
> We are blocked for now by https://review.openstack.org/#/c/93443/ and by
> the lack of resources to complete that request, obviously, so help would be
> appreciated. :)
>
> As for fencing hooks, it sounds like a good idea.
As far as I understand these things, in distributed-locking-speak "fencing"
just means "breaking someone else's lock".
I think your options here are (and apologies if I'm repeating things that are
obvious):
1. Have a "force unlock" protocol (numerous alternatives exist). Assume the
lock holder implements it properly and stops accessing the shared resource
when asked.
2. Kill the lock holder using some method unrelated to the locking service and
wait for the locking protocol to realise the ex-holder is dead through the usual
liveness tests. Assume that not being able to hold the lock implies no longer
being able to access the shared resource.
The "liveness test" usually involves the holder pinging the lock service
periodically, and everyone has to wait for some agreed timeout before assuming
a client is dead.
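To make the liveness test concrete, here is a minimal in-memory sketch of a lock with a lease: the holder has to ping within an agreed timeout, and nobody else can take the lock until that timeout has passed. The class and method names (LeaseLock, acquire, ping) and the injectable clock are made up for illustration; this is not the API of tooz, ZooKeeper, or any other real lock service.

```python
import time


class LeaseLock:
    """Toy lock whose holder must ping within `timeout` seconds (hypothetical API)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the timeout can be simulated
        self.holder = None
        self.last_ping = None

    def _expire_if_dead(self):
        # The holder is presumed dead once it has been silent for the agreed timeout.
        if self.holder is not None and self.clock() - self.last_ping >= self.timeout:
            self.holder = None

    def acquire(self, client):
        """Take the lock if it is free or its previous holder has timed out."""
        self._expire_if_dead()
        if self.holder is None:
            self.holder = client
            self.last_ping = self.clock()
            return True
        return False

    def ping(self, client):
        """Heartbeat. Returns False if `client` no longer holds the lock."""
        self._expire_if_dead()
        if self.holder != client:
            return False
        self.last_ping = self.clock()
        return True
```

Under option (2), killing the holder simply stops its pings; a waiting client's acquire() keeps failing until the timeout has elapsed, which is the wait everyone has to accept.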
(1) involves a lot of trust - and seems particularly bad if the reason you are
breaking the lock is because the holder is misbehaving.
Assuming (2) is the only reasonable choice, I don't think the lock service
needs explicit support for fencing, since the exact method for killing the
holder is unrelated, and relatively uninteresting (probably always going to be
an instance delete in OpenStack).
Perhaps more interesting is exactly what conditions you require before
attempting to kill the lock holder - you wouldn't want just any job deciding
it was warranted, or else a misbehaving client would cause mayhem. Again, I
suggest your options here are:
1. Require human judgement.
i.e. provide monitoring for whatever is misbehaving and make it obvious that
one mitigation is to nuke the apparent holder.
2. Require the lock breaker to be able to reach a majority of nodes as some
proof of "I'm working, my opinion must be right".
In a paxos system, reaching a majority of nodes basically amounts to "holding a
lock", so we end up back at "my liveness test is better than yours somehow",
and I'm not sure how to resolve that without human judgement (but I'm not
familiar with existing approaches). Again, I don't think this needs
additional support from the lock service, beyond a liveness test (which
zookeeper, for example, has).
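As a rough sketch of the majority condition in option 2, the would-be lock breaker could probe every node and only proceed if it reached a strict majority. The function names and the probe_results mapping are hypothetical, and the sketch says nothing about how the probing itself is done.

```python
def has_majority(reachable, cluster_size):
    """True if `reachable` nodes form a strict majority of the cluster."""
    return reachable > cluster_size // 2


def may_break_lock(probe_results):
    """Decide whether this client may break the lock.

    probe_results maps node name -> True if this client could reach the node.
    """
    reachable = sum(1 for ok in probe_results.values() if ok)
    return has_majority(reachable, len(probe_results))
```

Note that a strict majority means 2 of 4 nodes is not enough, so two halves of a partitioned cluster can never both decide to break the lock.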
tl;dr: I'm interested in what sort of automated fencing behaviour you'd like.
--
- Gus