[openstack-dev] [Cinder] A possible solution for HA Active-Active

Clint Byrum clint at fewbar.com
Fri Jul 31 22:18:39 UTC 2015


Excerpts from Mike Perez's message of 2015-07-31 10:40:04 -0700:
> On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow <harlowja at outlook.com> wrote:
> > ...random thought here, skip as needed... in all honesty orchestration
> > solutions like mesos
> > (http://mesos.apache.org/assets/img/documentation/architecture3.jpg),
> > map-reduce solutions like hadoop, stream processing systems like apache
> > storm (...), are already using zookeeper and I'm not saying we should just
> use it because they are, but the likelihood that they all picked it for no
> reason is imho slim.
> 
> I'd really like to see cross-project focus. I don't want Ceilometer to
> depend on ZooKeeper, Cinder to depend on etcd, etc. This is not ideal
> for an operator to have to deploy, learn and maintain each of these
> solutions.
> 
> I think this is difficult when you consider everyone wants options of
> their preferred DLM. If we went this route, we should pick one.
> 
> Regardless, I want to know if we really need a DLM. Does Ceilometer
> really need a DLM? Does Cinder really need a DLM? Can we just use a
> hash ring solution where operators don't even have to know or care
> about deploying a DLM and running multiple instances of Cinder manager
> just works?
> 
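
For reference, the hash-ring approach Mike describes can be sketched like
this (a minimal sketch with hypothetical names, not Cinder's actual code):
each manager deterministically computes which volumes it owns, so as long
as all managers agree on the membership list, no DLM is needed at all.

```python
# Minimal consistent-hash-ring sketch (hypothetical class, not Cinder's
# implementation). Each manager places several virtual points on a ring;
# a volume is owned by the first manager point clockwise from its hash.
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, managers, replicas=64):
        # Multiple virtual points per manager smooth out the balance.
        self._ring = sorted(
            (_hash(f"{m}-{i}"), m)
            for m in managers
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    def owner(self, volume_id: str) -> str:
        # First ring point clockwise from the volume's hash owns it.
        idx = bisect.bisect(self._keys, _hash(volume_id)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["cinder-1", "cinder-2", "cinder-3"])
# Every manager computes the same owner for a given volume, so exactly
# one of them acts on it -- without any of them contacting a DLM.
```

The catch, of course, is agreeing on membership when a manager dies or
joins, which is exactly where coordination sneaks back in.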

So in the Ironic case, if two conductors decide they both own one IPMI
controller, _chaos_ can ensue. They may, at different times, read that
the power is up or down, and issue power control commands that take
many seconds to complete; on its next status poll, the other conductor
may react by reversing the change, and the two will just fight over
the node in a tug-o-war fashion.

Oh wait, except, that's not true. Instead, they use the database as a
locking mechanism, and AFAIK, no nodes have been torn limb from limb by
two conductors thus far.
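
The database-as-lock pattern boils down to an atomic conditional UPDATE
(this is a simplified sketch, not Ironic's actual schema): only one
concurrent writer can see the reservation column as NULL and flip it,
so the loser's UPDATE simply matches zero rows.

```python
# Sketch of claiming a node via the database (hypothetical schema).
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY, reservation TEXT)")
db.execute("INSERT INTO nodes VALUES ('node-1', NULL)")
db.commit()


def try_lock(conn, node_id, conductor):
    # The WHERE clause makes the claim atomic: only one concurrent
    # UPDATE can observe reservation IS NULL and set it.
    cur = conn.execute(
        "UPDATE nodes SET reservation = ? "
        "WHERE id = ? AND reservation IS NULL",
        (conductor, node_id),
    )
    conn.commit()
    return cur.rowcount == 1


assert try_lock(db, "node-1", "conductor-a") is True   # first claim wins
assert try_lock(db, "node-1", "conductor-b") is False  # second is refused
```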

But, a DLM would be more efficient, and would actually simplify failure
recovery for Ironic's operators. The database locks suffer from being a
little too conservative, and sometimes you just have to go into the DB
and delete a lock after something explodes (this was true 6 months ago;
it may be better automated by now, I don't know).
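
The recovery difference is that DLM locks are leases tied to a session:
when the holder crashes, the lock lapses on its own instead of waiting
for an operator to delete a row. A toy single-process sketch of that
lease semantic (hypothetical, nothing like a real DLM's API):

```python
# Lease-style lock: a dead holder's claim expires automatically.
import time


class LeaseLock:
    def __init__(self, ttl=0.1):
        self._ttl = ttl
        self._holder = None
        self._expires = 0.0

    def acquire(self, holder):
        now = time.monotonic()
        # Grant if unheld, or if the previous holder's lease has lapsed.
        if self._holder is None or now >= self._expires:
            self._holder, self._expires = holder, now + self._ttl
            return True
        return False


lock = LeaseLock(ttl=0.1)
assert lock.acquire("conductor-a")       # granted
assert not lock.acquire("conductor-b")   # refused while the lease is live
time.sleep(0.15)                         # conductor-a "crashes"; lease lapses
assert lock.acquire("conductor-b")       # recovered with no manual cleanup
```

A real DLM keeps the lease alive with heartbeats while the holder is
healthy, which is the part a plain database row doesn't give you for free.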

Anyway, I'm all for the simplest possible solution. But, don't make it
_too_ simple.



More information about the OpenStack-dev mailing list