[openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey

Joshua Harlow harlowja at yahoo-inc.com
Wed Oct 30 18:39:06 UTC 2013


So my idea here was to break the abstraction for heat into 3 parts.

Pardon my lack of heat terminology/knowledge if I miss something.

1. The thing that receives the API request (I would assume an api server
here).

I would expect #1 to parse the request into a known internal format. Whether
this is tasks or jobs or something else is up to heat, so this might have
been my lack of understanding of heat concepts here, but usually an API
request translates into some internal format. Maybe this is the parser or
something else (not sure really).

Let's assume for now that it parses the API request into some tasks + flow
(what taskflow provides).
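
As a rough illustration of that parsing step (a sketch only: the request
shape and the names Task, Flow and parse_request are made up here and are
neither heat's nor taskflow's actual API), an API request could be turned
into a small tasks + flow structure like this:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    # Hypothetical task record: a name plus the names of tasks it depends on.
    name: str
    requires: tuple = ()


@dataclass
class Flow:
    # Hypothetical flow: a named collection of tasks, with no execution logic.
    name: str
    tasks: list = field(default_factory=list)


def parse_request(request):
    """Translate an (assumed) API request dict into a Flow of Tasks."""
    action = request["action"]
    flow = Flow(name="%s-%s" % (action, request["stack"]))
    for res in request["resources"]:
        deps = tuple("%s:%s" % (action, dep)
                     for dep in res.get("depends_on", ()))
        flow.tasks.append(Task(name="%s:%s" % (action, res["name"]),
                               requires=deps))
    return flow


# Assumed request format, for illustration only.
example_request = {
    "action": "create",
    "stack": "demo",
    "resources": [
        {"name": "network"},
        {"name": "server", "depends_on": ["network"]},
    ],
}
example_flow = parse_request(example_request)
```

The point is only that the output of #1 is plain data describing the work,
with nothing in it that says how or where the work gets run.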

So then it becomes a question of what you do with those tasks & flows
(what I call stage #2).

- https://wiki.openstack.org/wiki/TaskFlow#Two_thousand_foot_view

To me this is where taskflow 'shines', in that it has an engine concept
which can run in various manners (the tasks and flow are not strongly
associated with an engine). One of these engines is planned to be a
distributed one (but it's not the only one), and that engine type would
have to interact with some type of job management system (or it would have
to provide that job management system, or a simple version of it, itself).
The difference is that the definition of the tasks and flows (and the
links/structure between them) is still disconnected from the actual engine
that runs those tasks & flows. So to me this means that there is
pluggability with regard to execution, which I think is pretty great.
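
To make that separation concrete, here is a minimal sketch in plain Python.
It is not taskflow's real API: make_flow, SerialEngine, ThreadedEngine and
run_flow are hypothetical names invented for this example. The same flow
definition is handed to two different engines without being changed:

```python
from concurrent.futures import ThreadPoolExecutor


def make_flow():
    # A flow here is just an ordered list of (name, callable) pairs; it
    # knows nothing about how or where it will be executed.
    return [
        ("create_network", lambda: "network-ready"),
        ("create_server", lambda: "server-ready"),
    ]


class SerialEngine:
    """Hypothetical engine: runs each task in order in the calling thread."""

    def run(self, flow):
        return {name: func() for name, func in flow}


class ThreadedEngine:
    """Hypothetical engine: same ordering, but tasks run on worker threads."""

    def __init__(self, workers=2):
        self._workers = workers

    def run(self, flow):
        results = {}
        with ThreadPoolExecutor(max_workers=self._workers) as pool:
            for name, func in flow:
                # Wait for each task before starting the next, so the
                # linear ordering of the flow is preserved.
                results[name] = pool.submit(func).result()
        return results


def run_flow(flow, engine):
    # The caller chooses the engine; the flow definition is unchanged.
    return engine.run(flow)


serial_results = run_flow(make_flow(), SerialEngine())
threaded_results = run_flow(make_flow(), ThreadedEngine())
```

A distributed engine would be one more class with the same run() shape,
handing the tasks to some job management system instead of a local pool.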

If that requires rework of the heat model or its way of running, maybe it's
for the better? Idk.

As taskflow is still newish, and most projects in openstack have their own
distributed model (conductors, rpc process separation), we wanted to focus
on having the basic principles down first. As for the review
https://review.openstack.org/#/c/47609/, I am very grateful to jessica for
working her hardest to get that into a nearly-there state. So yes, taskflow
will continue on the path/spirit of 47609, and contributions are welcome
of course :-)

Feel free to also jump on #openstack-state-management since it might be
easier to just chat there in the end with other interested parties.

-Josh

On 10/30/13 11:10 AM, "Steven Dake" <sdake at redhat.com> wrote:

>On 10/30/2013 10:42 AM, Clint Byrum wrote:
>> So, recently we've had quite a long thread in gerrit regarding locking
>> in Heat:
>>
>> https://review.openstack.org/#/c/49440/
>>
>> In the patch, there are two distributed lock drivers. One uses SQL,
>> and suffers from all the problems you might imagine a SQL based locking
>> system would. It is extremely hard to detect dead lock holders, so we
>> end up with really long timeouts. The other is ZooKeeper.
>>
>> I'm on record as saying we're not using ZooKeeper. It is a little
>> embarrassing to have taken such a position without really thinking
>> things through. The main reason I feel this way though, is not because
>> ZooKeeper wouldn't work for locking, but because I think locking is a
>> mistake.
>>
>> The current multi-engine paradigm has a race condition. If you have a
>> stack action going on, the state is held in the engine itself, and not
>> in the database, so if another engine starts working on another action,
>> they will conflict.
>>
>> The locking paradigm is meant to prevent this. But I think this is a
>> huge mistake.
>>
>> The engine should store _all_ of its state in a distributed data store
>> of some kind. Any engine should be aware of what is already happening
>> with the stack from this state and act accordingly. That includes the
>> engine currently working on actions. When viewed through this lens,
>> to me, locking is a poor excuse for serializing the state of the engine
>> scheduler.
>>
>> It feels like TaskFlow is the answer, with an eye for making sure
>> TaskFlow can be made to work with distributed state. I am not well
>> versed on TaskFlow's details though, so I may be wrong. It worries me
>> that TaskFlow has existed a while and doesn't seem to be solving real
>> problems, but maybe I'm wrong and it is actually in use already.
>>
>> Anyway, as a band-aid, we may _have_ to do locking. For that, ZooKeeper
>> has some real advantages over using the database. But there is hesitance
>> because it is not widely supported in OpenStack. What say you, OpenStack
>> community? Should we keep ZooKeeper out of our.. zoo?
>
>I will -2 any patch that adds zookeeper as a dependency to Heat.
>
>The rest of the idea sounds good though.  I spoke with Joshua about
>TaskFlow Friday as a possibility for solving this problem, but TaskFlow
>presently does not implement a distributed task flow. Joshua indicated
>there was a celery review at https://review.openstack.org/#/c/47609/,
>but this would introduce a different server dependency which suffers
>from the same issues as Zookeeper, not to mention incomplete AMQP server
>support for various AMQP implementations.  Joshua indicated using a pure
>AMQP implementation would be possible for this job but is not implemented.
>
>I did get into a discussion with him about the subject of breaking the
>tasks in the flow into "jobs", which led to the suggestion that the
>parser should be part of the API server process (then the engine could
>be responsible for handling the various jobs Heat needs). Sounds like
>poor abstraction, not to mention serious rework required.
>
>My take from our IRC discussion was that TaskFlow is not a job
>distribution system (like Gearman) but an in-process workflow manager.
>These two things are different.  I was unclear if Taskflow could be made
>to do both, while also operating under already supported AMQP server
>infrastructure that all of OpenStack relies on currently.  If it could,
>that would be fantastic, as we would only have to introduce a library
>dependency vs a full on server dependency with documentation, HA and
>scalability concerns.
>
>Regards
>-steve
>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev