[openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey

Joshua Harlow harlowja at yahoo-inc.com
Wed Oct 30 18:57:30 UTC 2013


As for the mutex and locking problem:

I would expect locking to become a necessity at some point for OpenStack.

Even if the state transitions are the locks themselves (that's still a
lock by another name imho), you need a reliable way to store and change
those state transitions (i.e. what the last one was and what the next one
is). A database is likely OK here, but it becomes less ideal as complexity
increases. The other part that I think ZooKeeper addresses is similar to
how it is used in nova, where it serves as a 'liveness' system: instead of
constantly updating a database with heartbeats, ZooKeeper itself
maintains that information without requiring constant updates to a DB
(which doesn't scale).
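
To make the "state transition as a lock" idea concrete, here is a minimal
sketch of a compare-and-swap style transition stored in a database. It uses
Python's stdlib sqlite3 purely for illustration; the `stacks` table, the
`try_transition` helper and the state names are hypothetical and are not
Heat's actual schema or API.

```python
import sqlite3


def try_transition(conn, stack_id, expected, new):
    """Move stack_id from `expected` to `new`; return True only if we won."""
    # The WHERE clause makes this a compare-and-swap: the row is updated
    # only if the stored state is still the one the caller expected.
    cur = conn.execute(
        "UPDATE stacks SET state = ? WHERE id = ? AND state = ?",
        (new, stack_id, expected),
    )
    conn.commit()
    return cur.rowcount == 1


# Hypothetical table holding the last known state of each stack.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stacks (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
conn.execute("INSERT INTO stacks VALUES ('stack-1', 'IDLE')")
conn.commit()

# Two engines race to start an update; only the first transition succeeds.
first = try_transition(conn, "stack-1", "IDLE", "UPDATING")
second = try_transition(conn, "stack-1", "IDLE", "UPDATING")
print(first, second)
```

Note what this does not cover: if the engine that won the transition dies
while the stack is in UPDATING, nothing in the table says so. That is the
'liveness' gap described above.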

There are fairly good reasons why ZooKeeper and Chubby (and similar
systems) exist :)

- http://research.google.com/archive/chubby.html
- http://labs.yahoo.com/publication/zookeeper-wait-free-coordination-for-internet-scale-systems/
- http://devo.ps/blog/2013/09/11/zookeeper-vs-doozer-vs-etcd.html


Of course the usage of such systems must be carefully discussed and
thought out, but that's nothing new to anyone here.

On 10/30/13 11:25 AM, "Robert Collins" <robertc at robertcollins.net> wrote:

>On 31 October 2013 06:42, Clint Byrum <clint at fewbar.com> wrote:
>> So, recently we've had quite a long thread in gerrit regarding locking
>> in Heat:
>>
>> https://review.openstack.org/#/c/49440/
>>
>> In the patch, there are two distributed lock drivers. One uses SQL,
>> and suffers from all the problems you might imagine a SQL based locking
>> system would. It is extremely hard to detect dead lock holders, so we
>> end up with really long timeouts. The other is ZooKeeper.
>>
>> I'm on record as saying we're not using ZooKeeper. It is a little
>> embarrassing to have taken such a position without really thinking things
>> through. The main reason I feel this way though, is not because ZooKeeper
>> wouldn't work for locking, but because I think locking is a mistake.
>
>I agree with all your points:
> - that mutex style locking here is a mistake
> - that we need a workaround in the short term
> - that sql locking can be hard to get right
>
>However if this is a short term workaround, who cares if SQL locking
>has bad failure modes: it's short term and the failure we're replacing
>(engines tramping on each other) is also bad.
>
>On Zookeeper: this would be the first Java service /required/ as part
>of a deployment of OpenStack's integrated components. I think that
>requires broad consensus - possibly even a TC vote - before adding it.
>[NB: I'm not against Java, but it's not a social norm here]. Secondly,
>but also importantly, I seem to recall Zookeeper really not being
>suitable for secure environments, but maybe that's just how it was used
>in my previous interactions with it?
>
>-Rob
>
>-- 
>Robert Collins <rbtcollins at hp.com>
>Distinguished Technologist
>HP Converged Cloud
>



