[openstack-dev] [Cinder] A possible solution for HA Active-Active
Gorka Eguileor
geguileo at redhat.com
Mon Aug 3 08:18:06 UTC 2015
On Mon, Aug 03, 2015 at 12:28:27AM -0700, Clint Byrum wrote:
> Excerpts from Gorka Eguileor's message of 2015-08-02 15:49:46 -0700:
> > On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote:
> > > On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor <geguileo at redhat.com> wrote:
> > > > I know we've all been looking at the HA Active-Active problem in Cinder
> > > > and trying our best to figure out possible solutions to the different
> > > > issues, and since the current plan is going to take a while (because it
> > > > requires that we first finish fixing Cinder-Nova interactions), I've
> > > > been looking at alternatives that allow Active-Active configurations
> > > > without needing to wait for those changes to take effect.
> > > >
> > > > And I think I have found a possible solution, but since the HA A-A
> > > > problem has a lot of moving parts, I ended up upgrading my initial
> > > > Etherpad notes to a post [1].
> > > >
> > > > Even if we decide that this is not the way to go, which we probably
> > > > will, I still think that the post brings a little clarity to all the
> > > > moving parts of the problem, even some that are not reflected in our
> > > > Etherpad [2], and it can help us not miss anything when deciding on a
> > > > different solution.
> > >
> > > Based on IRC conversations in the Cinder room and hearing people's
> > > opinions in the spec reviews, I'm not convinced that the complexity a
> > > distributed lock manager adds to Cinder is worth it, both for the
> > > developers and for the operators who will ultimately have to learn to
> > > maintain things like ZooKeeper as a result.
> > >
> > > **Key point**: We're not scaling Cinder itself; it's about scaling to
> > > avoid a build-up of operations on the storage backend solutions
> > > themselves.
> > >
> > > Whatever "scaling level" people think ZooKeeper is going to
> > > accomplish is beside the point. We don't need it, because Cinder isn't
> > > as complex as people are making it out to be.
> > >
> > > I'd like to think the Cinder team is great at recognizing potential
> > > cross-project initiatives. Look at what Thang Pham has done with
> > > Nova's versioned object solution. He turned a Nova-specific solution
> > > into an Oslo solution for everyone, and Cinder is using it. That was
> > > awesome, and people really appreciated that the focus was on making
> > > other projects better, not just Cinder.
> > >
> > > Have people considered Ironic's hash ring solution? The project Akanda
> > > is now adopting it [1], and I think it might have potential. I'd
> > > appreciate it if interested parties could evaluate it before the
> > > Cinder midcycle sprint next week, so it's ready for discussion.
> > >
> > > [1] - https://review.openstack.org/#/c/195366/
> > >
> > > -- Mike Perez
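(For illustration, here is a minimal, hypothetical sketch of the
consistent-hashing idea behind Ironic's hash ring. Ironic's real
implementation adds hash partitions and replica handling; the core idea
is that every node can compute the same resource-to-host mapping
locally, with no lock manager involved.)

    import bisect
    import hashlib


    class HashRing(object):
        """Consistently map resources to hosts (simplified sketch)."""

        def __init__(self, hosts, points_per_host=32):
            self._ring = {}        # position on the ring -> host
            self._positions = []   # sorted ring positions
            for host in hosts:
                for i in range(points_per_host):
                    pos = self._hash('%s-%d' % (host, i))
                    self._ring[pos] = host
                    bisect.insort(self._positions, pos)

        @staticmethod
        def _hash(key):
            return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

        def get_host(self, resource_id):
            """Return the host owning the resource (same on all nodes)."""
            idx = bisect.bisect(self._positions, self._hash(resource_id))
            return self._ring[self._positions[idx % len(self._positions)]]


    ring = HashRing(['volume-node-1', 'volume-node-2', 'volume-node-3'])
    owner = ring.get_host('volume-uuid-1234')  # deterministic everywhere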
> >
> > Hi all,
> >
> > Since my original proposal was more complex than it needed to be, I
> > have a new, simpler proposal, and I describe how we can do it with or
> > without a DLM, since we don't seem to be reaching an agreement on that.
> >
> > The write-up was more rushed than the previous one, so I may have
> > missed some things.
> >
> > http://gorka.eguileor.com/simpler-road-to-cinder-active-active/
> >
>
> I like the idea of keeping it simpler, Gorka. :)
>
> Note that this is punting back to "use the database for coordination",
> which is what most projects have done thus far; that approach has a
> number of advantages and disadvantages.
>
> Note that the stale-lock problem was solved in an interesting way in Heat:
> each engine process gets an "instance-of-engine" UUID that it adds to the
> topic queues it listens on. If it holds a lock, it records this UUID in
> the owner field. When somebody wants to steal the lock (due to a timeout),
> they send a message to this queue, and if there's no response, the lock
> is stolen.
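To make the mechanism concrete, here is a rough sketch of that
Heat-style steal check (the db_api helpers are hypothetical; Heat's real
code lives in its stack lock and engine listener modules):

    import uuid

    import oslo_messaging as messaging

    ENGINE_ID = str(uuid.uuid4())  # the "instance-of-engine" UUID


    def try_steal_lock(db_api, rpc_client, resource_id):
        """Steal a lock only if its owner engine no longer responds."""
        owner = db_api.get_lock_owner(resource_id)
        try:
            # Ping the owner on its per-engine topic; raises on timeout.
            rpc_client.prepare(topic='engine.%s' % owner,
                               timeout=2).call({}, 'listening')
        except messaging.MessagingTimeout:
            # No reply, so the owner is presumed dead: steal the lock.
            db_api.steal_lock(resource_id, old_owner=owner,
                              new_owner=ENGINE_ID)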
I don't think that's a good idea for Cinder: if the node that is holding
the lock is doing a long-running, CPU-bound operation (like a backup),
it may not be fast enough to reply to that message, and that would end
up with multiple nodes accessing the same data.
Using the service heartbeats and the startup of the volume nodes, we can
do automatic cleanup of failed nodes (a sketch follows below). And let's
be realistic: failed nodes will not be the norm, so we should prioritize
normal operation over failure cleanup.
And having inter-node operations like that will not only increase our
message broker workload, it will also place stricter constraints on our
volume nodes' responsiveness. That could put us in a pinch in some
operations, and it would require a careful and thorough empirical study
to confirm that we don't get false positives on lock steals.
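The cleanup I'm referring to could be as simple as scanning the service
heartbeats Cinder already records in the services table (sketch only;
the db_api helpers are hypothetical and the threshold would come from
configuration):

    from datetime import datetime, timedelta

    SERVICE_DOWN_TIME = timedelta(seconds=60)  # assumed threshold


    def cleanup_failed_nodes(db_api):
        """Release resources claimed by nodes with stale heartbeats."""
        now = datetime.utcnow()
        for service in db_api.get_volume_services():
            if now - service.updated_at > SERVICE_DOWN_TIME:
                # The node missed its heartbeats; free whatever it
                # still claims so the remaining nodes can take over.
                db_api.release_claims(owner=service.host)

A node starting up would likewise release its own leftover claims
before accepting work, which covers crash-and-restart without any
inter-node messaging.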
>
> Anyway, I think what might make more sense than copying that directly
> is implementing "use the database and oslo.messaging to build a DLM"
> as a tooz backend. That way, as the negative aspects of that approach
> impact an operator, they can pick a tooz driver that satisfies their
> needs, or even write one for their specific backend.
>
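For reference, the consuming code wouldn't care which backend ends up
behind Tooz; it would look something like this (the database-backed
driver Clint describes is hypothetical, it doesn't exist today):

    from tooz import coordination

    # The driver is picked by URL, so an operator could swap in
    # zookeeper://, a database-backed driver, or anything else
    # without touching Cinder's code.
    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'cinder-volume-host-1')
    coordinator.start()

    lock = coordinator.get_lock(b'volume-uuid-1234')
    with lock:
        # Critical section: only one node operates on this volume.
        pass
    coordinator.stop()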
I have no problem implementing a locking variant in Tooz using the DB
(not DB locks). As far as I've seen, the Tooz community moves really
fast with reviews, and we could probably have that quite quickly. But I
don't think that's the best way to apply our efforts; in that case I'd
prefer going directly to mutual exclusion at the API using a new
*reading* status.
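To be clear, the *reading* status approach replaces locks with a
compare-and-swap on the volume's status at the API level, roughly like
this (simplified sketch, assuming the usual cinder.db.sqlalchemy models
and cinder.exception imports; Cinder's real conditional-update code
would be more involved):

    def mark_volume_reading(session, volume_id):
        """Atomically move an 'available' volume to 'reading'."""
        count = session.query(models.Volume).\
            filter_by(id=volume_id, status='available').\
            update({'status': 'reading'})
        if not count:
            # Another request changed the status first; the API
            # simply rejects this one instead of blocking on a lock.
            raise exception.InvalidVolume(reason='volume is busy')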
Cheers,
Gorka.