[openstack-dev] [Heat] Locking and ZooKeeper - a space odyssey

Robert Collins robertc at robertcollins.net
Wed Oct 30 18:25:03 UTC 2013


On 31 October 2013 06:42, Clint Byrum <clint at fewbar.com> wrote:
> So, recently we've had quite a long thread in gerrit regarding locking
> in Heat:
>
> https://review.openstack.org/#/c/49440/
>
> In the patch, there are two distributed lock drivers. One uses SQL,
> and suffers from all the problems you might imagine a SQL-based locking
> system would. It is extremely hard to detect lock holders that have died,
> so we end up with really long timeouts. The other is ZooKeeper.
>
> I'm on record as saying we're not using ZooKeeper. It is a little
> embarrassing to have taken such a position without really thinking things
> through. The main reason I feel this way though, is not because ZooKeeper
> wouldn't work for locking, but because I think locking is a mistake.

I agree with all your points:
 - that mutex-style locking here is a mistake
 - that we need a workaround in the short term
 - that SQL locking can be hard to get right

However, if this is a short-term workaround, who cares if SQL locking
has bad failure modes: it's short term and the failure we're replacing
(engines trampling on each other) is also bad.

On ZooKeeper: this would be the first Java service /required/ as part
of a deployment of OpenStack's integrated components. I think that
requires broad consensus - possibly even a TC vote - before adding it.
[NB: I'm not against Java, but it's not a social norm here]. Secondly,
but also importantly, I seem to recall ZooKeeper really not being
suitable for secure environments, but maybe that's just how it was used
in my previous interactions with it?

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


