[openstack-dev] [Cinder] A possible solution for HA Active-Active

Andrew Beekhof abeekhof at redhat.com
Fri Aug 7 05:55:14 UTC 2015


> On 5 Aug 2015, at 1:34 am, Joshua Harlow <harlowja at outlook.com> wrote:
> 
> Philipp Marek wrote:
>>> If we end up using a DLM then we have to detect when the connection to
>>> the DLM is lost on a node and stop all ongoing operations to prevent
>>> data corruption.
>>> 
>>> It may not be trivial to do, but we will have to do it in any solution
>>> we use, even on my last proposal that only uses the DB in Volume Manager
>>> we would still need to stop all operations if we lose connection to the
>>> DB.
>> 
>> Well, has it already been decided that Pacemaker will be chosen to provide HA in
>> OpenStack? There was a talk, "Pacemaker: the PID 1 of OpenStack", IIRC.
>> 
>> I know that Pacemaker was pushed aside in an earlier ML post, but IMO
>> *so much* has already been done for HA in Pacemaker that OpenStack
>> should just use it.
>> 
>> All HA nodes need to participate in a Pacemaker cluster - and if one node
>> loses its connection, all of its services get stopped automatically (by
>> Pacemaker) - or the node gets fenced.
>> 
>> 
>> No need to invent some sloppy scripts to do exactly the tasks (badly!) that
>> the Linux HA Stack has been providing for quite a few years.
>> 
>> 
>> Yes, Pacemaker needs learning - but not more than any other involved
>> project, and there are already quite a few here, which have to be known to
>> any operator or developer already.
>> 
>> 
>> (BTW, LINBIT sells training for the Linux HA Cluster Stack - and yes,
>>  I work for them ;)
> 
> So just a piece of information, but Yahoo (the company I work for, with VMs in the tens of thousands and bare metal in far greater numbers...) hasn't used Pacemaker, and in all honesty this (OpenStack) is the first project I have heard of that needs such a solution. I feel that we really should be building our services better so that they can be A-A, rather than depending on another piece of software to get around our 'sloppiness' (for lack of a better word).

HA is a deceptively hard problem.
There is really no need for every project to attempt to solve it on its own.
Having everyone consuming/calculating a different membership list is a very good way to go insane.

Aside from the usual bugs, the HA space lends itself to making simplifying assumptions early on, only to trap you with them down the road.
It’s even worse if you’re trying to bolt it on after the fact...

Perhaps try to think of Pacemaker as a distributed finite state machine instead of a cluster manager.
That is part of the value we bring to projects like Galera and RabbitMQ.

Sure they are A-A, and once they’re up they can survive many failures, but bringing them up can be non-trivial.
We also provide the additional context (e.g. quorum and fencing) that allows more kinds of failures to be safely recovered from.
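
As a rough illustration of why quorum and fencing matter for safe recovery, here is a toy sketch. It is not how Pacemaker implements either concept, and the function names and parameters are hypothetical, chosen only for this example. The idea: a partition should only take over a failed peer's work if it sees a strict majority of the cluster and has confirmation that the peer has been fenced.

```python
def has_quorum(visible_members, cluster_size):
    """Return True if this partition sees a strict majority of the cluster.

    visible_members: iterable of node names this node can currently reach
                     (including itself); hypothetical input for this sketch.
    cluster_size:    total number of configured nodes.
    """
    return len(set(visible_members)) > cluster_size // 2


def may_recover(visible_members, cluster_size, failed_node_fenced):
    """Decide whether work from a failed node may be restarted here.

    Both conditions are required: without quorum we might be the minority
    side of a split; without fencing the "failed" node might still be
    writing to shared storage.
    """
    return has_quorum(visible_members, cluster_size) and failed_node_fenced
```

With three nodes, a partition that sees two of them has quorum, but it should still hold off on recovery until the third node is known to be fenced. In a four-node cluster split two and two, neither side has quorum.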

Something to think about perhaps.

— Andrew

> 
> Nothing against pacemaker personally... IMHO it just doesn't feel like we are doing this right if we need such a product in the first place.
> 
>> 
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 