[openstack-dev] [Cinder] A possible solution for HA Active-Active

Fox, Kevin M Kevin.Fox at pnnl.gov
Mon Aug 3 15:34:17 UTC 2015


+1.
________________________________________
From: Flavio Percoco [flavio at redhat.com]
Sent: Monday, August 03, 2015 12:30 AM
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Cinder] A possible solution for HA Active-Active

On 03/08/15 00:49 +0200, Gorka Eguileor wrote:
>On Fri, Jul 31, 2015 at 01:47:22AM -0700, Mike Perez wrote:
>> On Mon, Jul 27, 2015 at 12:35 PM, Gorka Eguileor <geguileo at redhat.com> wrote:
>> > I know we've all been looking at the HA Active-Active problem in Cinder
>> > and trying our best to figure out possible solutions to the different
>> > issues, and since the current plan is going to take a while (because it
>> > requires that we first finish fixing the Cinder-Nova interactions), I've
>> > been looking at alternatives that allow Active-Active configurations
>> > without needing to wait for those changes to take effect.
>> >
>> > And I think I have found a possible solution, but since the HA A-A
>> > problem has a lot of moving parts I ended up upgrading my initial
>> > Etherpad notes to a post [1].
>> >
>> > Even if we decide that this is not the way to go, which we'll probably
>> > do, I still think that the post brings a little clarity to all the
>> > moving parts of the problem, even some that are not reflected in our
>> > Etherpad [2], and it can help us not miss anything when deciding on a
>> > different solution.
>>
>> Based on IRC conversations in the Cinder room and hearing people's
>> opinions in the spec reviews, I'm not convinced that the complexity a
>> distributed lock manager adds to Cinder is worth it, both for developers
>> and for the operators who will ultimately have to learn to maintain
>> things like ZooKeeper as a result.
>>
>> **Key point**: We're not scaling Cinder itself; it's about scaling to
>> avoid a build-up of operations from the storage backend solutions
>> themselves.
>>
>> Whatever "scaling level" people think ZooKeeper is going to accomplish
>> is not even the question. We don't need it, because Cinder isn't as
>> complex as people are making it out to be.
>>
>> I'd like to think the Cinder team is great at recognizing potential
>> cross-project initiatives. Look at what Thang Pham has done with
>> Nova's versioned object solution. He turned a generic solution into an
>> Oslo solution for all, and Cinder is using it. That was awesome, and
>> people really appreciated that the focus was on helping other projects
>> get better, not just Cinder.
>>
>> Have people considered Ironic's hash ring solution? The project Akanda
>> is now adopting it [1], and I think it might have potential. I'd
>> appreciate it if interested parties could have this evaluated before
>> the Cinder midcycle sprint next week, so it's ready for discussion.
>>
>> [1] - https://review.openstack.org/#/c/195366/
>>
>> -- Mike Perez
>
>Hi all,
>
>Since my original proposal was more complex than it needed to be, I have
>a new proposal for a simpler solution, and I describe how we can do it
>with or without a DLM, since we don't seem to be reaching an agreement on
>that.
>
>The solution description was more rushed than the previous one, so I may
>have missed some things.
>
>http://gorka.eguileor.com/simpler-road-to-cinder-active-active/
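For readers who haven't looked at it yet, the hash ring Mike suggests evaluating is a way to split work among services without a lock: each service hashes a resource ID onto a ring and picks the first host point after it. The Python below is a minimal, generic consistent-hashing sketch, not Ironic's actual implementation; the `HashRing` class, the `vnodes` parameter and the host and volume names are hypothetical, chosen only for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Hypothetical consistent hash ring (illustrative, not Ironic's code)."""

    def __init__(self, hosts, vnodes=64):
        # vnodes: ring points per host; more points give a smoother spread.
        self._vnodes = vnodes
        self._positions = []  # sorted ring positions
        self._owners = {}     # ring position -> host
        for host in hosts:
            self.add_host(host)

    @staticmethod
    def _hash(key):
        # SHA-256 is used only to spread keys evenly around the ring.
        return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

    def _points(self, host):
        return [self._hash(f"{host}#{i}") for i in range(self._vnodes)]

    def add_host(self, host):
        for pos in self._points(host):
            self._owners[pos] = host
            bisect.insort(self._positions, pos)

    def remove_host(self, host):
        for pos in self._points(host):
            del self._owners[pos]
            self._positions.remove(pos)

    def get_host(self, resource_id):
        """Return the owner of resource_id: the first ring point after its hash."""
        if not self._positions:
            raise LookupError("no hosts in the ring")
        idx = bisect.bisect(self._positions, self._hash(resource_id))
        if idx == len(self._positions):
            idx = 0  # wrap around to the start of the ring
        return self._owners[self._positions[idx]]


ring = HashRing(["cinder-vol-1", "cinder-vol-2", "cinder-vol-3"])
print(ring.get_host("volume-0001"))  # one of the three hosts
```

Every service that builds the ring from the same host list computes the same owner for a given volume, and removing a host only reassigns the volumes that host owned. The approach does assume all services agree on that host list.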

First and foremost, thanks for collecting the feedback and working on
a different proposal that integrates what's been discussed so far, or
at least proposes a way forward and gives us enough time to make the
right call.

Now, let's please stop for two seconds and say no to adding a DLM for
now.

This thread has already branched out into several discussions on whether
we should use a DLM or not, and whether it should be a specific one or
tooz.

I'll take the chance and reply here directly to collect what's been
said so far.

I'm always down for avoiding adding new services to the stack because
that makes it harder to deploy, maintain and reason about. However, in
the case of DLMs, they are an essential part of distributed systems.
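To make concrete what a DLM gives services like Cinder: mutual exclusion on a shared resource (say, a volume) across processes and hosts, usually with a lease or session timeout so a crashed holder doesn't block everyone forever (ZooKeeper ties ephemeral nodes to client sessions; etcd has leases). The Python below is only a toy, single-process model of that lease-based behavior; `InMemoryLockManager`, its `ttl` and `clock` parameters and the resource names are hypothetical and do not come from ZooKeeper, etcd or tooz.

```python
import threading
import time


class InMemoryLockManager:
    """Toy, single-process stand-in for a DLM (hypothetical, for illustration).

    Locks are leases: each expires `ttl` seconds after it was taken, so a
    holder that crashes without releasing does not block others forever.
    """

    def __init__(self, ttl=30.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock             # injectable so tests can fake time
        self._mutex = threading.Lock()  # protects _leases within this process
        self._leases = {}               # resource name -> (owner, expiry)

    def acquire(self, name, owner):
        """Take or renew the lease on `name`; False if another owner holds it."""
        with self._mutex:
            now = self._clock()
            current = self._leases.get(name)
            if current is None or current[1] <= now or current[0] == owner:
                self._leases[name] = (owner, now + self._ttl)
                return True
            return False

    def release(self, name, owner):
        """Drop the lease only if `owner` holds it; return whether it did."""
        with self._mutex:
            current = self._leases.get(name)
            if current is not None and current[0] == owner:
                del self._leases[name]
                return True
            return False


locks = InMemoryLockManager(ttl=30.0)
if locks.acquire("volume-0001", "cinder-vol-1"):
    try:
        pass  # e.g. extend or delete the volume here
    finally:
        locks.release("volume-0001", "cinder-vol-1")
```

In a real DLM this lease bookkeeping lives in a separate, replicated service rather than inside each process, which is exactly the extra component operators would have to deploy and maintain.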

We've been able to avoid them long enough but we're getting to the
point where we might not be able to do that anymore. Therefore, I
believe we should start discussing, carefully, what/how/when to do it.
This is definitely not a decision that should be rushed.

Let's start by mentioning some of the services that use or could use a
DLM (not an exhaustive list):

  - Nova
  - Cinder
  - Ceilometer
  - Keystone
  - Zaqar
  - ....

Each one of these has a specific use case for a DLM, and some of them even
share one (Cinder, Nova). Therefore, I believe this deserves a
cross-project spec where we'd be able to mention the *different* use cases
that would lead us to pick the right technology (or the one that seems
saner ;).

As of now, whether it's ZooKeeper, etcd, Consul or
put_here_the_new_cool_thing, I don't really care. What I care about is
that we pick a single technology that works well for all services. I'm
starting to grow worried about the excessive lack of opinion we have
in cases, like this one, where we should simply be opinionated. A
strong opinion here would help us be consistent, make it simpler to
understand issues and share knowledge, and make OPs' lives simpler
(as in there's just one thing they can deploy), etc.

IMHO, OpenStack is already confusing enough without us adding abstractions
over abstractions. The topic we're discussing here will impact all
deployments out there, and we'd better try to do one thing and do it
right.

So, to summarize, I love the effort behind this. But, as others have
mentioned, I'd like us to take a step back, run this across teams and
come up with an opinionated solution that would work for everyone.

Starting this discussion now would allow us to prepare enough material
to reach an agreement in Tokyo and work on a single solution for
Mitaka. This sounds like a good topic for a cross-project session.

Cheers,
Flavio

--
@flaper87
Flavio Percoco


