[openstack-dev] [Heat] Locking and ZooKeeper - a space odyssey

Jay Pipes jaypipes at gmail.com
Wed Oct 30 19:27:32 UTC 2013


Has anyone looked into using concoord for distributed locking?

https://pypi.python.org/pypi/concoord
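
At a quick glance, the pattern seems to be that you replicate one of its
coordination objects and talk to it through a generated proxy. A minimal
sketch, going off the concoord tutorial (untested, so treat the exact
module paths as an assumption):

    # replicas are started out-of-band, e.g.:
    #   concoord replica -o concoord.object.lock.Lock -a 127.0.0.1 -p 14000
    from concoord.proxy.lock import Lock

    lock = Lock("127.0.0.1:14000")  # address of a bootstrap replica

    lock.acquire()
    try:
        pass  # critical section, e.g. mutate the stack
    finally:
        lock.release()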

Best,
-jay

On 10/30/2013 02:39 PM, Joshua Harlow wrote:
> So my idea here was to break the abstraction for heat into 3 parts.
>
> Pardon my lack of heat terminology/knowledge if I miss something.
>
> 1. The thing that receives the API request (I would assume an API server
> here).
>
> I would expect #1 to parse the request into a known internal format. Whether
> this is tasks or jobs or something else is up to heat (this might have been
> my lack of understanding of heat concepts here), but usually an API request
> translates into some internal format. Maybe this is the parser or
> something else (not sure really).
>
> Let's assume for now that it parses the API request into some tasks + a flow
> (which is what taskflow provides).
>
> So then it becomes a question of what you do with those tasks & flows
> (what I call stage #2).
>
> - https://wiki.openstack.org/wiki/TaskFlow#Two_thousand_foot_view
>
> To me this is where taskflow 'shines', in that it has an engine concept
> which can run in various manners (the tasks and flow are not strongly
> associated with an engine). One of these engines is planned to be a
> distributed one (but it's not the only one), and with that engine type it
> would have to interact with some type of job management system (or it
> would have to provide that job management system - or a simple version
> itself). But the difference is that the tasks and flows (and the
> links/structure between them) are still disconnected from the actual engine
> that runs those tasks & flows. So this to me means that there is
> pluggability with regard to execution, which I think is pretty great.
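>
> To make that concrete, here is a rough sketch of declaring tasks + a flow
> and handing them to an engine (the task names are made up, not heat's
> actual ones):
>
>     from taskflow import engines
>     from taskflow import task
>     from taskflow.patterns import linear_flow
>
>     class CreateNetwork(task.Task):
>         def execute(self):
>             pass  # call out to the service that does the real work
>
>     class CreateServer(task.Task):
>         def execute(self):
>             pass
>
>     # the flow only declares the tasks and the structure/links between them
>     flow = linear_flow.Flow("stack-create")
>     flow.add(CreateNetwork(), CreateServer())
>
>     # which engine runs it (serial, threaded, someday distributed) is a
>     # separate choice made at run time
>     engines.run(flow)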
>
> If that requires rework of the heat model and its way of running, maybe
> it's for the better? I don't know.
>
> As taskflow is still newish, and most projects in openstack have their own
> distributed model (conductors, rpc process separation), we wanted to focus
> on having the basic principles down first. For the distributed engine there
> is the review https://review.openstack.org/#/c/47609/, and I am very
> grateful to Jessica for working her hardest to get it into a nearly-there
> state. So yes, taskflow will continue on the path/spirit of 47609, and
> contributions are of course welcome :-)
>
> Feel free to also jump on #openstack-state-management, since in the end it
> might be easier to just chat there with other interested parties.
>
> -Josh
>
> On 10/30/13 11:10 AM, "Steven Dake" <sdake at redhat.com> wrote:
>
>> On 10/30/2013 10:42 AM, Clint Byrum wrote:
>>> So, recently we've had quite a long thread in gerrit regarding locking
>>> in Heat:
>>>
>>> https://review.openstack.org/#/c/49440/
>>>
>>> In the patch, there are two distributed lock drivers. One uses SQL,
>>> and suffers from all the problems you might imagine a SQL-based locking
>>> system would. It is extremely hard to detect dead lock-holders, so we
>>> end up with really long timeouts. The other is ZooKeeper.
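>>>
>>> To illustrate, the usual SQL pattern is an atomic conditional UPDATE,
>>> and the only way to reap a dead holder is a timeout you have to guess
>>> at (hypothetical schema and helper, not the actual patch):
>>>
>>>     import time
>>>
>>>     LOCK_TIMEOUT = 3600  # guess how long a live engine might hold it
>>>
>>>     def try_acquire(conn, stack_id, engine_id):
>>>         now = time.time()
>>>         cur = conn.cursor()
>>>         # take the lock only if it is free, or its holder has been
>>>         # silent longer than the timeout (it may just be slow!)
>>>         cur.execute(
>>>             "UPDATE stack_lock SET engine_id = %s, acquired_at = %s"
>>>             " WHERE stack_id = %s AND (engine_id IS NULL"
>>>             " OR acquired_at < %s)",
>>>             (engine_id, now, stack_id, now - LOCK_TIMEOUT))
>>>         return cur.rowcount == 1  # 0 rows -> someone else holds it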
>>>
>>> I'm on record as saying we're not using ZooKeeper. It is a little
>>> embarrassing to have taken such a position without really thinking
>>> things
>>> through. The main reason I feel this way though, is not because
>>> ZooKeeper
>>> wouldn't work for locking, but because I think locking is a mistake.
>>>
>>> The current multi-engine paradigm has a race condition. If you have a
>>> stack action going on, the state is held in the engine itself, and not
>>> in the database, so if another engine starts working on another action,
>>> they will conflict.
>>>
>>> The locking paradigm is meant to prevent this. But I think this is a
>>> huge mistake.
>>>
>>> The engine should store _all_ of its state in a distributed data store
>>> of some kind. Any engine should be aware of what is already happening
>>> with the stack from this state and act accordingly. That includes the
>>> engine currently working on actions. When viewed through this lens,
>>> to me, locking is a poor excuse for serializing the state of the engine
>>> scheduler.
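>>>
>>> In other words (totally hypothetical names, just to show the shape):
>>>
>>>     # every state transition goes through the shared store, never
>>>     # engine memory, so any engine can see and resume the work
>>>     def run_action(store, stack_id, action):
>>>         current = store.get_stack_state(stack_id)
>>>         if current.conflicts_with(action):
>>>             raise ActionConflict(current, action)
>>>         for step in plan(action, current):
>>>             store.record_step_started(stack_id, step)
>>>             step.run()
>>>             store.record_step_done(stack_id, step)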
>>>
>>> It feels like TaskFlow is the answer, with an eye for making sure
>>> TaskFlow can be made to work with distributed state. I am not well
>>> versed in TaskFlow's details though, so I may be wrong. It worries me
>>> that TaskFlow has existed a while and doesn't seem to be solving real
>>> problems, but maybe I'm wrong and it is actually in use already.
>>>
>>> Anyway, as a band-aid, we may _have_ to do locking. For that, ZooKeeper
>>> has some real advantages over using the database. But there is hesitance
>>> because it is not widely supported in OpenStack. What say you, OpenStack
>>> community? Should we keep ZooKeeper out of our... zoo?
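>>>
>>> For the record, the kazoo version of the band-aid is only a few lines,
>>> and ZooKeeper's ephemeral nodes mean a dead engine's lock disappears
>>> with its session instead of needing a guessed timeout (the paths and
>>> ids here are made up):
>>>
>>>     from kazoo.client import KazooClient
>>>
>>>     zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
>>>     zk.start()
>>>
>>>     stack_id = "some-stack-uuid"  # illustrative
>>>     lock = zk.Lock("/heat/stack-locks/" + stack_id, "engine-1")
>>>     with lock:  # released automatically, even if this engine dies
>>>         pass    # perform the stack action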
>>
>> I will -2 any patch that adds zookeeper as a dependency to Heat.
>>
>> The rest of the idea sounds good though.  I spoke with Joshua about
>> TaskFlow Friday as a possibility for solving this problem, but TaskFlow
>> presently does not implement a distributed task flow. Joshua indicated
>> there was a celery-based review at https://review.openstack.org/#/c/47609/,
>> but this would introduce a different server dependency which suffers
>> from the same issues as ZooKeeper, not to mention incomplete AMQP server
>> support for various AMQP implementations.  Joshua indicated using a pure
>> AMQP implementation would be possible for this job but is not implemented.
>>
>> I did get into a discussion with him about breaking the
>> tasks in the flow into "jobs", which led to the suggestion that the
>> parser should be part of the API server process (then the engine could
>> be responsible for handling the various jobs Heat needs). Sounds like
>> poor abstraction, not to mention serious rework required.
>>
>> My take from our IRC discussion was that TaskFlow is not a job
>> distribution system (like Gearman) but an in-process workflow manager.
>> These two things are different.  I was unclear if TaskFlow could be made
>> to do both, while also operating on the already-supported AMQP server
>> infrastructure that all of OpenStack relies on currently.  If it could,
>> that would be fantastic, as we would only have to introduce a library
>> dependency vs a full-on server dependency with documentation, HA and
>> scalability concerns.
>>
>> Regards
>> -steve