[openstack-dev] [Heat] Locking and ZooKeeper - a space oddysey

Joshua Harlow harlowja at yahoo-inc.com
Thu Oct 31 01:13:19 UTC 2013


Yup, Galera, thanks! :)

As for the:

"It also doesn't handle the case where u can automatically recover from the
current resource owner (nova-compute for example) dying."

So Heat is actively working on some resources, doing its thing, and then
its binary crashes (or a kill -9 occurs); what happens? You can ask the
same question of nova-compute. Hope that makes more sense now. To me you
need a system that can detect the liveness of processes and automatically
handle the case where one dies (maybe by starting up another Heat, or
nova-compute, or ...).
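
For example, with kazoo (the Python ZooKeeper client) a worker can
register itself as an ephemeral znode: if the process dies (kill -9
included), its session expires, the znode vanishes, and anything watching
it is notified right away. A minimal sketch (hosts and paths are made up):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='zk1.example.com:2181')
    zk.start()

    # The worker announces itself; ephemeral means the znode is tied to
    # this session and disappears automatically if the process dies.
    zk.create('/services/heat/engine-1', b'alive',
              ephemeral=True, makepath=True)

    # Elsewhere, a monitor (or a peer heat-engine) watches for its death:
    def on_change(event):
        # event.type == 'DELETED' => the owner's session expired; react
        # here (take over its work, start a replacement, ...).
        print('owner went away:', event)

    zk.exists('/services/heat/engine-1', watch=on_change)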

But yeah, your summary is right: distributed systems are wonky in general.
All I can say is that ZooKeeper is pretty battle-tested :)

On 10/30/13 6:04 PM, "Clint Byrum" <clint at fewbar.com> wrote:

>Excerpts from Joshua Harlow's message of 2013-10-30 17:46:44 -0700:
>> This works as long as you have 1 DB and don't fail over to a secondary
>> slave DB.
>> 
>> Now you can say we all must use Percona (or similar) for this, but then
>
>Did you mean Galera, which provides multiple synchronous masters?
>
>> that's a change in deployment as well (and IMHO a bigger one). This is
>> where the concept of a quorum in ZooKeeper comes into play: the
>> transaction log that ZooKeeper maintains will be consistent among all
>> members in that quorum. This is a typical ZooKeeper deployment strategy
>> (how many nodes you want in that quorum being an important question).
>>
>
>Galera uses more or less the exact same mechanism.
>
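
For reference, picking that quorum size is just a matter of listing the
members in zoo.cfg; a minimal three-node ensemble looks roughly like this
(hostnames made up):

    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=10
    syncLimit=5
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

With three members a write commits once a majority (two) have logged it,
so you can lose one node without losing the quorum.
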
>> It also doesn't handle the case where you can automatically recover from
>> the current resource owner (nova-compute, for example) dying.
>>
>
>I don't know what that means.
>
>> Your atomic "check-if-owner-is-empty-and-store-instance-as-owner" is now
>> user-initiated instead of being automatic (ZooKeeper provides these
>> kinds of notifications via its watch concept). That makes it hard for,
>> say, an automated system (Heat?) to react to these failures in any way
>> other than repeated polling (or repeated retries, or periodic tasks),
>> which means that Heat will not be able to react to failure in a 'live'
>> manner. So this to me is the liveness question that ZooKeeper is
>> designed to help out with. Of course you can simulate this with a DB and
>> repeated polling (as long as you don't try to do anything complicated
>> with MySQL, like replicas/slaves with transaction logs that may not be
>> caught up and that you might have to fail over to if problems happen,
>> since you are on your own if that happens).
>>
>
>Right, even if you have a Galera cluster you still have to poll it or
>use wonky things like triggers hooked up to memcache/gearman/AMQP UDFs
>to get around polling latency.
>
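
To make the DB version concrete, the takeover ends up being a conditional
UPDATE inside a poll loop, something like this sketch (schema and names
are invented; sqlite stands in for the real MySQL/Galera):

    import sqlite3
    import time
    import uuid

    conn = sqlite3.connect(':memory:')  # stand-in for MySQL/Galera
    conn.execute('CREATE TABLE resources (id TEXT PRIMARY KEY, owner TEXT)')
    conn.execute("INSERT INTO resources VALUES ('instance-1', NULL)")
    my_id = str(uuid.uuid4())

    def try_take_ownership(resource_id):
        # Atomic check-if-owner-is-empty-and-store-instance-as-owner:
        # the UPDATE matches only when nobody currently owns the row.
        cur = conn.execute(
            'UPDATE resources SET owner = ? WHERE id = ? AND owner IS NULL',
            (my_id, resource_id))
        conn.commit()
        return cur.rowcount == 1

    # With no watches, failure detection degenerates into polling:
    while not try_take_ownership('instance-1'):
        time.sleep(5)

And note that nothing in that loop ever clears a dead owner's row; that is
exactly the part the ephemeral-node approach gives you for free.
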
>I think your point is that a weird MySQL is just as disruptive to "the
>normal OpenStack deployment" as a weird service like ZooKeeper.
>


