[openstack-dev] [Heat] Locking and ZooKeeper - a space odyssey
Monty Taylor
mordred at inaugust.com
Thu Oct 31 17:17:50 UTC 2013
Sigh.
Yay!!!! We've added more competing methods of complexity!!!
Seriously. We now think that rabbit and zookeeper and mysql are ALL needed?
Joshua Harlow <harlowja at yahoo-inc.com> wrote:
>I'm pretty sure the cat's out of the bag.
>
>https://github.com/openstack/requirements/blob/master/global-requirements.txt#L29
>
>https://kazoo.readthedocs.org/en/latest/
>
>-Josh
>
>On 10/31/13 7:43 AM, "Monty Taylor" <mordred at inaugust.com> wrote:
>
>>
>>
>>On 10/30/2013 10:42 AM, Clint Byrum wrote:
>>> So, recently we've had quite a long thread in gerrit regarding locking
>>> in Heat:
>>>
>>> https://review.openstack.org/#/c/49440/
>>>
>>> In the patch, there are two distributed lock drivers. One uses SQL,
>>> and suffers from all the problems you might imagine a SQL-based locking
>>> system would. It is extremely hard to detect lock holders that have died,
>>> so we end up with really long timeouts. The other is ZooKeeper.
>>>
>>> I'm on record as saying we're not using ZooKeeper. It is a little
>>> embarrassing to have taken such a position without really thinking
>>> things through. The main reason I feel this way, though, is not because
>>> ZooKeeper wouldn't work for locking, but because I think locking is a
>>> mistake.
>>>
>>> The current multi-engine paradigm has a race condition. If you have a
>>> stack action going on, the state is held in the engine itself, and not
>>> in the database, so if another engine starts working on another action,
>>> they will conflict.
>>>
>>> The locking paradigm is meant to prevent this. But I think this is a
>>> huge mistake.
>>>
>>> The engine should store _all_ of its state in a distributed data store
>>> of some kind. Any engine should be aware of what is already happening
>>> with the stack from this state and act accordingly. That includes the
>>> engine currently working on actions. When viewed through this lens,
>>> to me, locking is a poor excuse for serializing the state of the engine
>>> scheduler.
>>>
>>> It feels like TaskFlow is the answer, with an eye for making sure
>>> TaskFlow can be made to work with distributed state. I am not well
>>> versed in TaskFlow's details, though, so I may be wrong. It worries me
>>> that TaskFlow has existed a while and doesn't seem to be solving real
>>> problems, but maybe I'm wrong and it is actually in use already.
>>>
>>> Anyway, as a band-aid, we may _have_ to do locking. For that, ZooKeeper
>>> has some real advantages over using the database. But there is hesitance
>>> because it is not widely supported in OpenStack. What say you, OpenStack
>>> community? Should we keep ZooKeeper out of our.. zoo?
>>
>>Yes. I'm strongly opposed to ZooKeeper finding its way into the already
>>complex pile of things we use.
>>
>>_______________________________________________
>>OpenStack-dev mailing list
>>OpenStack-dev at lists.openstack.org
>>http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
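The SQL lock driver's weakness described in the quoted thread (a dead holder can only be detected once its timeout runs out) can be illustrated with a minimal lease-lock sketch. This is not the driver from https://review.openstack.org/#/c/49440/; the `stack_lock` table, its columns, and the `try_acquire`/`release` helpers are hypothetical, and SQLite stands in for MySQL so the example runs on its own.

```python
import sqlite3
import time

# Hypothetical schema: one row per stack that is currently locked.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stack_lock (
    stack_id   TEXT PRIMARY KEY,
    engine_id  TEXT NOT NULL,
    expires_at REAL NOT NULL
)
"""


def try_acquire(conn, stack_id, engine_id, ttl, now=None):
    """Take the lock (or steal an expired one); return True if engine_id now holds it.

    Not reentrant: a holder calling this again gets False.
    """
    now = time.time() if now is None else now
    with conn:  # commit on success, roll back on error
        # The only way to notice a dead holder is that its lease has run out.
        conn.execute(
            "DELETE FROM stack_lock WHERE stack_id = ? AND expires_at <= ?",
            (stack_id, now),
        )
        cur = conn.execute(
            "INSERT OR IGNORE INTO stack_lock (stack_id, engine_id, expires_at) "
            "VALUES (?, ?, ?)",
            (stack_id, engine_id, now + ttl),
        )
        return cur.rowcount == 1


def release(conn, stack_id, engine_id):
    """Drop the lock, but only if engine_id is still the recorded holder."""
    with conn:
        conn.execute(
            "DELETE FROM stack_lock WHERE stack_id = ? AND engine_id = ?",
            (stack_id, engine_id),
        )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    t0 = 1000.0
    print(try_acquire(conn, "stack-1", "engine-a", ttl=300, now=t0))        # True
    # engine-a dies without calling release(); engine-b is shut out...
    print(try_acquire(conn, "stack-1", "engine-b", ttl=300, now=t0 + 60))   # False
    # ...until the full lease has elapsed.
    print(try_acquire(conn, "stack-1", "engine-b", ttl=300, now=t0 + 301))  # True
```

Choosing the TTL is the dilemma the thread describes: a short lease risks stealing the lock from a slow but healthy engine, while a long lease leaves a stack blocked for that long after its holder dies. ZooKeeper sidesteps this by tying the lock to an ephemeral node that the server deletes when the holder's session expires, which is driven by client heartbeats rather than by how long the protected work takes; kazoo exposes this as its `Lock` recipe (`client.Lock(path, identifier)`).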
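Clint's alternative to locking, keeping all engine state in a shared store and having every engine act on what it reads there, is commonly implemented with optimistic concurrency control. Below is a minimal compare-and-swap sketch under that assumption: a hypothetical `stack_state` table carries a version number, and a write succeeds only if the version is unchanged since the engine read it. The table, `load`, `save`, and `ConflictError` are illustrative names, not how Heat or TaskFlow actually store state.

```python
import json
import sqlite3

# Hypothetical schema: the engine's full view of each stack, plus a version.
SCHEMA = """
CREATE TABLE IF NOT EXISTS stack_state (
    stack_id TEXT PRIMARY KEY,
    version  INTEGER NOT NULL,
    state    TEXT NOT NULL  -- JSON blob
)
"""


class ConflictError(Exception):
    """Another engine changed the stack since we last read it."""


def load(conn, stack_id):
    """Return (version, state) for a stack."""
    row = conn.execute(
        "SELECT version, state FROM stack_state WHERE stack_id = ?", (stack_id,)
    ).fetchone()
    if row is None:
        raise KeyError(stack_id)
    return row[0], json.loads(row[1])


def save(conn, stack_id, expected_version, state):
    """Compare-and-swap: write only if nobody has written since our read."""
    with conn:
        cur = conn.execute(
            "UPDATE stack_state SET state = ?, version = version + 1 "
            "WHERE stack_id = ? AND version = ?",
            (json.dumps(state), stack_id, expected_version),
        )
    if cur.rowcount != 1:
        raise ConflictError(stack_id)
    return expected_version + 1


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    with conn:
        conn.execute("INSERT INTO stack_state VALUES (?, ?, ?)",
                     ("stack-1", 0, json.dumps({"action": None})))
    va, _ = load(conn, "stack-1")  # engine A reads version 0
    vb, _ = load(conn, "stack-1")  # engine B also reads version 0
    save(conn, "stack-1", va, {"action": "UPDATE"})  # A's write lands
    try:
        save(conn, "stack-1", vb, {"action": "DELETE"})
    except ConflictError:
        print("engine B lost the race and must re-read the stack")
    print(load(conn, "stack-1"))  # (1, {'action': 'UPDATE'})
```

The losing engine gets a `ConflictError` instead of silently overwriting the winner's work; it can then re-read the stack and decide whether its action still makes sense, which is the "be aware of what is already happening" behavior the thread argues for.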