[openstack-dev] [Cinder][Ironic] A possible solution for HA Active-Active
Gorka Eguileor
geguileo at redhat.com
Mon Aug 3 07:58:48 UTC 2015
On Fri, Jul 31, 2015 at 12:47:34PM -0700, Joshua Harlow wrote:
> Joshua Harlow wrote:
> >Mike Perez wrote:
> >>On Fri, Jul 31, 2015 at 8:56 AM, Joshua Harlow<harlowja at outlook.com>
> >>wrote:
> >>>...random thought here, skip as needed... in all honesty orchestration
> >>>solutions like mesos
> >>>(http://mesos.apache.org/assets/img/documentation/architecture3.jpg),
> >>>map-reduce solutions like hadoop, stream processing systems like apache
> >>>storm (...), are already using zookeeper and I'm not saying we should
> >>>just use it cause they are, but the likelihood that they just picked it
> >>>for no reason is imho slim.
> >>
> >>I'd really like to see focus cross project. I don't want Ceilometer to
> >>depend on ZooKeeper, Cinder to depend on etcd, etc. This is not ideal
> >>for an operator to have to deploy, learn and maintain each of these
> >>solutions.
> >>
> >>I think this is difficult when you consider everyone wants options of
> >>their preferred DLM. If we went this route, we should pick one.
> >
> >+1
> >
> >>
> >>Regardless, I want to know if we really need a DLM. Does Ceilometer
> >>really need a DLM? Does Cinder really need a DLM? Can we just use a
> >>hash ring solution where operators don't even have to know or care
> >>about deploying a DLM and running multiple instances of Cinder manager
> >>just works?
> >
> >All very good questions, although IMHO a hash-ring is just a piece of
> >the puzzle, and is more equivalent to sharding resources, which yes is
> >one way to scale as long as each shard never touches anything from the
> >other shards. If those shards ever start to need to touch anything
> >shared then u get back into this same situation again for a DLM (and at
> >that point u really do need the 'distributed' part of DLM, because each
> >shard is distributed).
> >
> >And a few (maybe obvious) questions:
> >
> >- How would re-sharding work?
> >- If sharding (the hash-ring partitioning) is based on entities
> >(conductors/other) owning a 'bucket' of resources (ie entity 1 manages
> >resources A-F, entity 2 manages resources G-M...), what happens if an
> >entity dies, does some other entity take over that bucket, what happens
> >if that entity really hasn't 'died' but is just disconnected from the
> >network (partition tolerance...)? (If the answer is there is a lock on
> >the resource/s being used by each entity, then u get back into the LM
> >question).
> >
> >I'm unsure about how ironic handles these problems (although I believe
> >they have a hash-ring and still have a locking scheme as well, so maybe
> >that's their answer for the dual-entities manipulating the same bucket
> >problem).
>
> Code for some of this, maybe ironic folks can chime-in:
>
> https://github.com/openstack/ironic/blob/2015.1.1/ironic/conductor/task_manager.py#L18
> (using DB as DLM)
>
> Afaik, since ironic built in a hash-ring and the above task manager from
> the start (or from a very early commit), they have been better able to
> accomplish the HA goal; retrofitting stuff on top of nova, cinder, others...
> is not going to be as easy...

If you look at what that code is actually doing, you'll see it's
basically implementing a DLM on top of a database. We can do that as
well, but it will still be a DLM.
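
To make the "DLM on top of a database" point concrete, here is a minimal
sketch of the general pattern. It is not the Ironic code: the table layout,
the `reservation` column, and the function names are all assumptions made
for illustration, and sqlite3 is used only so the example is self-contained.
The lock is taken with a single conditional UPDATE, so only one holder can
win the row.

```python
import sqlite3


class NodeLocked(Exception):
    """Raised when the reservation could not be taken."""


def make_db():
    # Hypothetical schema: one row per resource, with a nullable
    # 'reservation' column naming the current lock holder.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (id INTEGER PRIMARY KEY, reservation TEXT)"
    )
    conn.execute("INSERT INTO nodes (id, reservation) VALUES (1, NULL)")
    conn.commit()
    return conn


def reserve(conn, node_id, holder):
    # Compare-and-swap in one statement: the row is updated only if
    # nobody currently holds the reservation.
    cur = conn.execute(
        "UPDATE nodes SET reservation = ? "
        "WHERE id = ? AND reservation IS NULL",
        (holder, node_id),
    )
    conn.commit()
    if cur.rowcount != 1:
        # Either someone else holds the lock or the row does not exist.
        raise NodeLocked(f"node {node_id} is not available to {holder}")


def release(conn, node_id, holder):
    # Only the current holder can clear the reservation; for anyone
    # else this statement matches no rows and does nothing.
    conn.execute(
        "UPDATE nodes SET reservation = NULL "
        "WHERE id = ? AND reservation = ?",
        (node_id, holder),
    )
    conn.commit()
```

Note that this sketch has no timeout or liveness check, so a holder that
dies keeps its reservation until something cleans it up. That is the same
"what happens if an entity dies" question raised above, and any DLM has to
answer it whether it is backed by a DB or not.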

So people, please don't get caught up in the implementation details when
making suggestions. If you think that we don't need a DLM but then you
suggest implementing a DLM on top of a DB, then you are thinking too
close to the code.

First you decide if you want/need a DLM, and then you choose whether you
want it implemented with a standard solution like Redis or ZooKeeper, or
whether you want to implement it manually on top of your DB.

Cheers,
Gorka.