[openstack-dev] [Cinder][Ironic] A possible solution for HA Active-Active

Jim Rollenhagen jim at jimrollenhagen.com
Fri Jul 31 21:58:04 UTC 2015


On Fri, Jul 31, 2015 at 12:47:34PM -0700, Joshua Harlow wrote:
> Joshua Harlow wrote:
> >Mike Perez wrote:
> >>On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow<harlowja at outlook.com>
> >>wrote:
> >>>...random thought here, skip as needed... in all honesty orchestration
> >>>solutions like mesos
> >>>(http://mesos.apache.org/assets/img/documentation/architecture3.jpg),
> >>>map-reduce solutions like hadoop, stream processing systems like apache
> >>>storm (...), are already using zookeeper, and I'm not saying we
> >>>should just use it because they are, but the likelihood that they
> >>>just picked it for no reason is imho slim.
> >>
> >>I'd really like to see a cross-project focus. I don't want Ceilometer
> >>to depend on ZooKeeper, Cinder to depend on etcd, etc. It is not ideal
> >>for an operator to have to deploy, learn, and maintain each of these
> >>solutions.
> >>
> >>I think this is difficult when you consider everyone wants options of
> >>their preferred DLM. If we went this route, we should pick one.
> >
> >+1
> >
> >>
> >>Regardless, I want to know if we really need a DLM. Does Ceilometer
> >>really need a DLM? Does Cinder really need a DLM? Can we just use a
> >>hash ring solution where operators don't even have to know or care
> >>about deploying a DLM and running multiple instances of Cinder manager
> >>just works?
> >
> >All very good questions, although IMHO a hash-ring is just a piece of
> >the puzzle, and is more equivalent to sharding resources, which yes is
> >one way to scale as long as each shard never touches anything from the
> >other shards. If those shards ever start to need to touch anything
> >shared then u get back into this same situation again for a DLM (and at
> >that point u really do need the 'distributed' part of DLM, because each
> >shard is distributed).
> >
> >And a few (maybe obvious) questions:
> >
> >- How would re-sharding work?
> >- If sharding (the hash-ring partitioning) is based on entities
> >(conductors/other) owning a 'bucket' of resources (i.e. entity 1 manages
> >resources A-F, entity 2 manages resources G-M...), what happens if an
> >entity dies? Does some other entity take over that bucket? What happens
> >if that entity really hasn't 'died' but is just disconnected from the
> >network (partition tolerance...)? (If the answer is that there is a lock
> >on the resource/s being used by each entity, then u get back into the LM
> >question).
> >
> >I'm unsure about how ironic handles these problems (although I believe
> >they have a hash-ring and still have a locking scheme as well, so maybe
> >that's their answer for the dual-entities manipulating the same bucket
> >problem).
> 
> Code for some of this, maybe ironic folks can chime in:
> 
> https://github.com/openstack/ironic/blob/2015.1.1/ironic/conductor/task_manager.py#L18
> (using DB as DLM)
> 
> Afaik, because ironic has had a built-in hash-ring and the above task
> manager since the start (or from a very early commit), they have been
> better able to accomplish the HA goal; retrofitting stuff on top of
> nova, cinder, others... is not going to be as easy...
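
For anyone following the hash-ring part of the thread, here is a minimal
sketch of the idea in Python. It is a toy, not Ironic's implementation:
the class name HashRing, the host names, and the replicas value are all
made up for illustration. It shows that when a host leaves the ring, only
the resources that host owned move to someone else.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring (hypothetical, not Ironic's code)."""

    def __init__(self, hosts, replicas=64):
        self.replicas = replicas  # virtual points per host (assumed value)
        self._ring = {}           # ring position -> host
        self._keys = []           # sorted ring positions
        for host in hosts:
            self.add(host)

    @staticmethod
    def _hash(value):
        # md5 is used only to spread keys around the ring, not for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add(self, host):
        for i in range(self.replicas):
            pos = self._hash("%s-%d" % (host, i))
            if pos not in self._ring:
                self._ring[pos] = host
                bisect.insort(self._keys, pos)

    def remove(self, host):
        # Drop every point that belonged to the departed host.
        self._keys = [k for k in self._keys if self._ring[k] != host]
        self._ring = {k: self._ring[k] for k in self._keys}

    def owner(self, resource_id):
        # The owner is the host at the first point clockwise from the key.
        pos = self._hash(resource_id)
        idx = bisect.bisect(self._keys, pos) % len(self._keys)
        return self._ring[self._keys[idx]]


resources = ["node-%d" % n for n in range(200)]
ring = HashRing(["conductor-1", "conductor-2", "conductor-3"])
before = {r: ring.owner(r) for r in resources}
ring.remove("conductor-2")
after = {r: ring.owner(r) for r in resources}
moved = [r for r in resources if before[r] != after[r]]
```

Note that the ring only answers "who should own this resource now". It
says nothing about a conductor that was dropped from the ring but is
really just partitioned and still believes it owns its bucket, which is
the case Josh's second question is about.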

I would still like to find time, one day, to use etcd or zookeeper as
our DLM in Ironic. Not having TTLs etc has been painful for us, though
we've mostly worked around it by now.
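
To make the TTL point concrete, here is a rough sketch of a lock with a
lease, using an in-memory dict as a stand-in. This is not the etcd or
ZooKeeper API and not how Ironic's task manager works; LeaseLock and its
method names are hypothetical. etcd offers leases with TTLs and ZooKeeper
offers ephemeral nodes tied to a session timeout, and this only imitates
the effect: a lock whose holder stops renewing it eventually frees itself.

```python
import time


class LeaseLock:
    """In-memory stand-in for a lock with a TTL (hypothetical names)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so expiry can be tested
        self._holders = {}    # resource -> (owner, expires_at)

    def acquire(self, resource, owner, ttl):
        """Take or renew the lock; fail if another owner's lease is live."""
        now = self._clock()
        held = self._holders.get(resource)
        if held is not None and held[0] != owner and held[1] > now:
            return False
        self._holders[resource] = (owner, now + ttl)
        return True

    def release(self, resource, owner):
        """Release the lock, but only if this owner actually holds it."""
        held = self._holders.get(resource)
        if held is not None and held[0] == owner:
            del self._holders[resource]


# A fake clock stands in for real time passing.
now = [0.0]
lock = LeaseLock(clock=lambda: now[0])
got_a = lock.acquire("node-1", "conductor-a", ttl=30)
got_b_early = lock.acquire("node-1", "conductor-b", ttl=30)
now[0] = 31.0  # conductor-a "died" and never renewed its lease
got_b_late = lock.acquire("node-1", "conductor-b", ttl=30)
```

Without an expiry, a lock left behind by a dead holder stays put until
something else clears it, which is the kind of thing we've had to work
around.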

// jim


