[openstack-dev] [Heat] Locking and ZooKeeper - a space odyssey

Angus Salkeld asalkeld at redhat.com
Wed Oct 30 23:30:40 UTC 2013


On 30/10/13 15:27 -0400, Jay Pipes wrote:
>Has anyone looked into using concoord for distributed locking?
>
>https://pypi.python.org/pypi/concoord
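>
>A minimal sketch of what a concoord-backed lock could look like from
>the engine side, assuming its bundled lock object and proxy (the
>module paths and bootstrap address here follow the concoord
>tutorial's Counter example, so treat them as assumptions rather than
>a tested recipe):
>
>    # client side: a set of concoord replicas is assumed to already
>    # be running, started with the bundled lock object, e.g.:
>    #   concoord replica --objectname concoord.object.lock.Lock
>    from concoord.proxy.lock import Lock
>
>    stack_lock = Lock("127.0.0.1:14000")  # bootstrap replica address
>    stack_lock.acquire()
>    try:
>        pass  # mutate the stack here
>    finally:
>        stack_lock.release()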

Looks interesting!

-Angus

>
>Best,
>-jay
>
>On 10/30/2013 02:39 PM, Joshua Harlow wrote:
>>So my idea here was to break the abstraction for heat into 3 parts.
>>
>>Pardon my lack of heat terminology/knowledge if I miss something.
>>
>>1. The thing that receives the API request (I would assume an api server
>>here).
>>
>>I would expect #1 to parse the request into a known internal format. Whether
>>this is tasks or jobs or something else is up to heat (this might have been
>>my lack of understanding of heat concepts here), but usually an API request
>>translates into some internal format. Maybe this is the parser or
>>something else (not sure really).
>>
>>Let's assume for now that it parses the API request into some tasks + flow
>>(what taskflow provides).
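>>
>>Concretely, the parse step would spit out something like the
>>following (a minimal sketch with made-up task and flow names,
>>assuming a recent taskflow):
>>
>>    from taskflow import task
>>    from taskflow.patterns import linear_flow
>>
>>    class CreateServer(task.Task):
>>        def execute(self):
>>            pass  # provision the resource here
>>
>>    class AttachVolume(task.Task):
>>        def execute(self):
>>            pass
>>
>>    # the flow captures the structure/links between the tasks and
>>    # says nothing about *how* or *where* they will run
>>    flow = linear_flow.Flow("create_stack").add(
>>        CreateServer(),
>>        AttachVolume(),
>>    )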
>>
>>So then it becomes a question of what you do with those tasks & flows
>>(what I call stage #2).
>>
>>- https://wiki.openstack.org/wiki/TaskFlow#Two_thousand_foot_view
>>
>>To me this is where taskflow 'shines', in that it has an engine concept
>>which can run in various manners (the tasks and flow are not strongly
>>associated with an engine). One of these engines is planned to be a
>>distributed one (but it's not the only one), and with that engine type it
>>would have to interact with some type of job management system (or
>>provide that job management system itself - or a simple version of one),
>>but the difference is that the above tasks and flows (and the
>>links/structure between them) are still disconnected from the actual
>>engine that runs them. So this to me means that there is pluggability
>>with regard to execution, which I think is pretty great - see the
>>sketch below.
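>>
>>For example (a sketch, assuming a recent taskflow; the same flow
>>object is handed to differently-configured engines):
>>
>>    from taskflow import engines
>>
>>    # same tasks & flow, different execution strategy -- only the
>>    # engine selection changes
>>    engines.run(flow, engine="serial")
>>    engines.run(flow, engine="parallel")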
>>
>>If that requires rework of the heat model and its way of running, maybe
>>it's for the better? I don't know.
>>
>>As taskflow is still newish, and most projects in openstack have their own
>>distributed model (conductors, rpc process separation), we wanted to focus
>>on getting the basic principles down first. As for the review at
>>https://review.openstack.org/#/c/47609/, I am very grateful to Jessica for
>>working her hardest to get it into a nearly-there state. So yes, taskflow
>>will continue in the path/spirit of 47609, and contributions are welcome
>>of course :-)
>>
>>Feel free to also jump on #openstack-state-management since it might be
>>easier to just chat there in the end with other interested parties.
>>
>>-Josh
>>
>>On 10/30/13 11:10 AM, "Steven Dake" <sdake at redhat.com> wrote:
>>
>>>On 10/30/2013 10:42 AM, Clint Byrum wrote:
>>>>So, recently we've had quite a long thread in gerrit regarding locking
>>>>in Heat:
>>>>
>>>>https://review.openstack.org/#/c/49440/
>>>>
>>>>In the patch, there are two distributed lock drivers. One uses SQL,
>>>>and suffers from all the problems you might imagine a SQL based locking
>>>>system would. It is extremely hard to detect dead lock holders, so we
>>>>end up with really long timeouts. The other is ZooKeeper.
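>>>>
>>>>(The general shape of such a driver, sketched with SQLAlchemy and a
>>>>made-up table, just to show where the long timeout creeps in -- this
>>>>is not the code from the review:)
>>>>
>>>>    import time
>>>>    import sqlalchemy as sa
>>>>
>>>>    def try_acquire(conn, stack_id, engine_id, timeout=3600):
>>>>        # reap holders we *guess* are dead; with no liveness
>>>>        # signal the guess has to be very conservative, hence
>>>>        # the really long timeout
>>>>        conn.execute(sa.text(
>>>>            "DELETE FROM stack_lock "
>>>>            "WHERE stack_id = :sid AND created_at < :cutoff"),
>>>>            {"sid": stack_id, "cutoff": time.time() - timeout})
>>>>        try:
>>>>            # assumes stack_id is the primary key, so a second
>>>>            # engine's insert fails instead of double-locking
>>>>            conn.execute(sa.text(
>>>>                "INSERT INTO stack_lock (stack_id, engine_id, created_at) "
>>>>                "VALUES (:sid, :eid, :now)"),
>>>>                {"sid": stack_id, "eid": engine_id, "now": time.time()})
>>>>            return True
>>>>        except sa.exc.IntegrityError:
>>>>            return False  # someone else holds the lock row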
>>>>
>>>>I'm on record as saying we're not using ZooKeeper. It is a little
>>>>embarrassing to have taken such a position without really thinking
>>>>things
>>>>through. The main reason I feel this way though, is not because
>>>>ZooKeeper
>>>>wouldn't work for locking, but because I think locking is a mistake.
>>>>
>>>>The current multi-engine paradigm has a race condition. If you have a
>>>>stack action going on, the state is held in the engine itself, and not
>>>>in the database, so if another engine starts working on another action,
>>>>they will conflict.
>>>>
>>>>The locking paradigm is meant to prevent this. But I think this is a
>>>>huge mistake.
>>>>
>>>>The engine should store _all_ of its state in a distributed data store
>>>>of some kind. Any engine should be aware of what is already happening
>>>>with the stack from this state and act accordingly. That includes the
>>>>engine currently working on actions. When viewed through this lens,
>>>>to me, locking is a poor excuse for serializing the state of the engine
>>>>scheduler.
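>>>>
>>>>(One way to read that concretely, as a sketch with made-up columns
>>>>and states: every transition is a compare-and-swap against the
>>>>stored state, so a second engine's attempt simply loses the race
>>>>instead of conflicting:)
>>>>
>>>>    import sqlalchemy as sa
>>>>
>>>>    def claim_action(conn, stack_id, engine_id):
>>>>        result = conn.execute(sa.text(
>>>>            "UPDATE stack SET status = 'IN_PROGRESS', owner = :eid "
>>>>            "WHERE id = :sid AND status = 'IDLE'"),
>>>>            {"eid": engine_id, "sid": stack_id})
>>>>        return result.rowcount == 1  # exactly one winner per transition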
>>>>
>>>>It feels like TaskFlow is the answer, with an eye for making sure
>>>>TaskFlow can be made to work with distributed state. I am not well
>>>>versed on TaskFlow's details though, so I may be wrong. It worries me
>>>>that TaskFlow has existed a while and doesn't seem to be solving real
>>>>problems, but maybe I'm wrong and it is actually in use already.
>>>>
>>>>Anyway, as a band-aid, we may _have_ to do locking. For that, ZooKeeper
>>>>has some real advantages over using the database. But there is hesitance
>>>>because it is not widely supported in OpenStack. What say you, OpenStack
>>>>community? Should we keep ZooKeeper out of our... zoo?
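>>>>
>>>>(For reference, the band-aid with kazoo is only a few lines; the
>>>>znode path and identifier here are made up:)
>>>>
>>>>    from kazoo.client import KazooClient
>>>>
>>>>    zk = KazooClient(hosts="zk1:2181,zk2:2181,zk3:2181")
>>>>    zk.start()
>>>>    # kazoo's Lock recipe is built on ephemeral znodes: if the
>>>>    # holding engine dies, its session expires and the lock frees
>>>>    # itself, so no guessed timeouts are needed
>>>>    lock = zk.Lock("/heat/locks/some-stack-id", identifier="engine-1")
>>>>    with lock:
>>>>        pass  # act on the stack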
>>>
>>>I will -2 any patch that adds zookeeper as a dependency to Heat.
>>>
>>>The rest of the idea sounds good though.  I spoke with Joshua on Friday
>>>about TaskFlow as a possibility for solving this problem, but TaskFlow
>>>presently does not implement a distributed task flow. Joshua indicated
>>>there was a celery review at https://review.openstack.org/#/c/47609/,
>>>but this would introduce a different server dependency which suffers
>>>from the same issues as ZooKeeper, not to mention incomplete support
>>>for the various AMQP server implementations.  Joshua indicated that a
>>>pure AMQP implementation would be possible for this job but is not
>>>implemented.
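>>>
>>>(To make "pure AMQP" concrete -- this is not implemented in taskflow,
>>>just a sketch of the shape using kombu, which OpenStack already
>>>depends on; the queue name is made up:)
>>>
>>>    from kombu import Connection
>>>
>>>    with Connection("amqp://guest:guest@localhost//") as conn:
>>>        queue = conn.SimpleQueue("heat_jobs")
>>>        # a producer posts a job...
>>>        queue.put({"stack_id": "1234", "action": "UPDATE"})
>>>        # ...and any worker attached to the same queue can claim it
>>>        job = queue.get(block=True, timeout=5)
>>>        job.ack()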
>>>
>>>I did get into a discussion with him about the subject of breaking the
>>>tasks in the flow into "jobs", which led to the suggestion that the
>>>parser should be part of the API server process (then the engine could
>>>be responsible for handling the various jobs Heat needs). Sounds like
>>>poor abstraction, not to mention serious rework required.
>>>
>>>My take from our IRC discussion was that TaskFlow is not a job
>>>distribution system (like Gearman) but an in-process workflow manager.
>>>These two things are different.  I was unclear whether TaskFlow could be
>>>made to do both while also operating on the already-supported AMQP server
>>>infrastructure that all of OpenStack currently relies on.  If it could,
>>>that would be fantastic, as we would only have to introduce a library
>>>dependency vs. a full-on server dependency with documentation, HA and
>>>scalability concerns.
>>>
>>>Regards
>>>-steve
>>>