[openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit

Kevin Carter kevin.carter at RACKSPACE.COM
Tue Nov 10 08:55:09 UTC 2015


Clint,

> While I'm sure it works a lot of the time, when it breaks it will break
> in very mysterious, and possibly undetectable, ways.

> For some things, this would be no big deal. But in others it may result
> in total disaster, like two conductors both trying to own a single
> Ironic node and one accidentally erasing what the other just wrote there.

This is fair: it may break in unpredictable ways due to master-slave replication, the asynchronous nature of the cluster implementation, and how Redis promotes a slave when a master goes down (all of which could result in catastrophic failures due to the noted race conditions). While a test case would be interesting, I acknowledge that it may be impossible to reproduce such a situation in a controlled environment.
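
To make that failure mode concrete, here is a minimal sketch (using redis-py; the host name and key are illustrative) of how a single-instance Redis lock can be silently lost across a slave promotion:

    import uuid

    import redis

    # Hypothetical master; in a real deployment this would come from
    # Sentinel or the cluster configuration.
    master = redis.StrictRedis(host='redis-master.example.com', port=6379)

    token = str(uuid.uuid4())
    # Atomic "set if not exists" with a 30-second TTL: the standard
    # single-instance Redis lock.
    acquired = master.set('dlm:ironic-node-42', token, nx=True, px=30000)

    if acquired:
        # The key replicates to the slave asynchronously. If the master
        # dies before the slave receives it and the slave is promoted,
        # the new master has no record of this lock, so a second
        # conductor running the same code also gets acquired == True:
        # two owners, one Ironic node.
        pass  # critical section would go here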

So that may be a mark against Redis being the preferred back-end for DLM; however, a quick look into the issue tracker for ZooKeeper reveals a similar set of race conditions that are currently open and could result in the same kinds of situations [0]. While not ideal, it may really be a case of weighing the technology choices (like you've said) and picking the best fit for now.

[0] - http://bit.ly/1NGQrAd  # The search string for the ZooKeeper Jira was too long, so I shortened it.
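
For what it's worth, since all of this sits behind Tooz, the back-end remains a deployment detail and swapping it later is a one-line change. A minimal sketch (endpoints and names are illustrative):

    from tooz import coordination

    # Back-end choice is just a URL; the locking code does not change.
    url = 'redis://localhost:6379'
    # url = 'zookeeper://localhost:2181'  # swap in ZooKeeper later

    coordinator = coordination.get_coordinator(url, b'conductor-1')
    coordinator.start()

    # Tooz locks are context managers; release happens on exit.
    with coordinator.get_lock(b'ironic-node-42'):
        pass  # critical section would go here

    coordinator.stop()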

--

Kevin Carter
IRC: cloudnull


________________________________________
From: Clint Byrum <clint at fewbar.com>
Sent: Tuesday, November 10, 2015 2:21 AM
To: openstack-dev
Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager  discussion @ the summit

Excerpts from Kevin Carter's message of 2015-11-09 22:24:16 -0800:
> Hello all,
>
> The rationale behind using a solution like ZooKeeper makes sense; however, in reviewing the thread I found myself asking whether there was a better way to address the problem without adding a Java-based solution as the default. While it has been covered that the current implementation would be a reference, and that "other" driver support in Tooz would allow for any back-end a deployer may want, the work being proposed within devstack [0] would become the default development case, thus making it the de facto standard, and I think we could do better in terms of supporting developers and delivering capability.
>
> My thoughts on using Redis+Redlock instead of Java+ZooKeeper as the default option:
> * Tooz already supports Redlock.
> * Redis has an established cluster system known for general ease of use and reliability in distributed systems.
> * Several OpenStack projects already support Redis as a back-end option or have extended capabilities using Redis.
> * Redis can be implemented on RHEL, SUSE, and DEB based systems with ease.
> * Redis is open source software licensed under the three-clause BSD license and would not have any of the same questionable license implications found when dealing with anything Java.
> * Redis would work on a single node, allowing developers to continue working in VMs running on laptops with 4 GB of RAM, but would also scale to support the multi-controller use case with ease. This would also give developers the ability to work on a system that actually resembles production.
> * Redlock brings with it no additional developer-facing language dependencies (Redis is written in ANSI C and works ... without external dependencies [1]) while also providing a plethora of language bindings [2].
>
>
> I apologize for questioning the proposed solution so late into the development of this thread and for not making the summit conversations to talk more with everyone who worked on the proposal. While the ship may have sailed on this point for now, I figured I'd ask why we might go down the path of ZooKeeper+Java when a solution with likely little to no development effort already exists, can support just about any production/development environment, has lots of bindings, and (IMHO) would integrate with the larger community more easily; many OpenStack developers and deployers already know Redis. The inclusion of ZK+Java in DevStack, and the act of making it the default, essentially creates new hard dependencies, one of which is Java, and I'd like to avoid that if at all possible; basically, I think we can do better.
>

Kevin, thanks so much for your thoughts on this. I really do appreciate
that we've had a high diversity of opinions and facts brought to bear on
this subject.

The Aphyr/Jepsen tests that were linked before [1] show, IMO, that Redis
satisfies availability and partition tolerance in the CAP theorem [2].
Consistency is entirely compromised by a partition, and having multiple
Redis nodes means using a form of replication with no consistency
guarantees. I find it somewhat confusing that Redis actually claims _ALL
THREE_ things in the description of RedLock [3].
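
For reference, the algorithm behind that claim is simple enough to sketch: a lock is a quorum of independent single-instance locks taken within a time budget (heavily condensed from [3]; the hosts are illustrative, and real Redlock releases with a token-checking Lua script rather than a bare DELETE):

    import time
    import uuid

    import redis

    masters = [redis.StrictRedis(host=h) for h in
               ('redis-1', 'redis-2', 'redis-3', 'redis-4', 'redis-5')]

    def acquire(resource, ttl_ms=30000):
        token = str(uuid.uuid4())
        start = time.time()
        votes = 0
        for m in masters:
            try:
                if m.set(resource, token, nx=True, px=ttl_ms):
                    votes += 1
            except redis.RedisError:
                pass  # an unreachable master simply doesn't vote
        elapsed_ms = (time.time() - start) * 1000
        # Locked only if a majority agreed, and quickly enough that the
        # TTL has not meaningfully expired in the meantime.
        if votes > len(masters) // 2 and elapsed_ms < ttl_ms:
            return token
        for m in masters:  # roll back a partial acquisition
            try:
                m.delete(resource)
            except redis.RedisError:
                pass
        return None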

While I'm sure it works a lot of the time, when it breaks it will break
in very mysterious, and possibly undetectable, ways.

For some things, this would be no big deal. But in others it may result
in total disaster, like two conductors both trying to own a single
Ironic node and one accidentally erasing what the other just wrote there.

So, I think we need to think hard about how Redis's weaknesses would
affect the desired goals before we adopt Redis for DLM.

[1] https://aphyr.com/posts/307-call-me-maybe-redis-redux
[2] https://en.wikipedia.org/wiki/CAP_theorem
[3] http://redis.io/topics/distlock

>
> [0] - https://review.openstack.org/#/c/241040/
> [1] - http://redis.io/topics/introduction
> [2] - http://redis.io/topics/distlock
>
> --
>
> Kevin Carter
> IRC: cloudnull
>
>
> ________________________________________
> From: Fox, Kevin M <Kevin.Fox at pnnl.gov>
> Sent: Monday, November 9, 2015 1:54 PM
> To: maishsk+openstack at maishsk.com; OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit
>
> Dedicating 3 controller nodes in a small cloud is not always the best allocation of resources. You're thinking of medium to large clouds; small production clouds are a thing too, and at that scale a little downtime, if you actually hit the rare case of a node failure on the controller, may be acceptable. It's up to an op to decide.
>
> We've also experienced that HA software sometimes causes more, or longer, downtime than it prevents, due to its complexity, the knowledge required, proper testing, etc. Again, in some ways the risk gets higher the smaller the cloud is.
>
> Being able to keep it simple and small for that case, then scale by switching out pieces as needed, does have some tangible benefits.
>
> Thanks,
> Kevin
> ________________________________________
> From: Maish Saidel-Keesing [maishsk at maishsk.com]
> Sent: Monday, November 09, 2015 11:35 AM
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit
>
> On 11/05/15 23:18, Fox, Kevin M wrote:
> > You're assuming there are only two choices,
> >   zk or db+rabbit. I'm claiming both are suboptimal at present; a third might be needed. Though even with its flaws, the db+rabbit choice has a few benefits too.
> >
> > You also seem to assert that to support large clouds, the default must be something that can scale that large. While that would be nice, I don't think it's a requirement if it's overly burdensome on deployers of non-huge clouds.
> >
> > I don't have metrics, but I would be surprised if most deployments today (production + other) used 3 controllers with a full HA setup. I would guess that the majority are single-controller setups. With those, the
> I think it would be safe to assume that any kind of production cloud,
> or any operator who considers their OpenStack environment close to
> production ready, would not be daft enough to deploy their whole
> environment based on a single controller, which is a whopper of a
> single point of failure.
>
> Most Fuel (Mirantis) deployments are multiple controllers.
> RHOS also recommends multiple controllers.
>
> I don't think that we as a community can afford to assume that one
> controller will suffice.
> That is not to say that maintaining zk will be any easier, though.
> > overhead of maintaining a whole DLM like zk seems like overkill. If db+rabbit would work for that one case, that would be one less thing for an op to set up; they already have to set up db+rabbit. Even a DLM plugin of some sort that won't scale, but is very easy to deploy and can be changed out later when needed, would be very useful.
> >
> > etcd is starting to show up in a lot of other projects, and so it may be at sites already. Being able to support it may be less of a burden to operators than zk in some cases.
> >
> > If your cloud grows to the point where the DLM choice really matters for scalability/correctness, then you probably have enough staff members to deal with adding in zk, and that's probably the right choice.
> >
> > You can have multiple suggested things in addition to one default. Default to the thing that makes the most sense in the most common deployments, and make specific recommendations for certain scenarios, like "if greater than 100 nodes, we strongly recommend using zk" or something to that effect.
> >
> > Thanks,
> > Kevin
> >
> >
> > ________________________________________
> > From: Clint Byrum [clint at fewbar.com]
> > Sent: Thursday, November 05, 2015 11:44 AM
> > To: openstack-dev
> > Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager  discussion @ the summit
> >
> > Excerpts from Fox, Kevin M's message of 2015-11-04 14:32:42 -0800:
> >> To clarify that statement a little more,
> >>
> >> Speaking only for myself as an op, I don't want to support yet one more snowflake in a sea of snowflakes that works differently than all the rest, without a very good reason.
> >>
> >> Java has its own set of issues associated with the JVM: care-and-feeding sorts of things. If we are to invest time/money/people in learning how to properly maintain it, it's easier to justify if it's not just a one-off for DLM.
> >>
> >> So I wouldn't go so far as to say we're vehemently opposed to Java, just that DLM is probably not a strong enough feature on its own to justify requiring pulling in Java. It's been only a very recent thing that you could convince folks that DLM was needed at all. So either make Java optional, or find some other use case that needs Java badly enough that you can make Java a required component. I suspect some day Searchlight might be compelling enough for that, but not today.
> >>
> >> As for the default, the default should be a good reference. If most sites would run with etcd or something else, since Java isn't needed, then don't default ZooKeeper on.
> >>
> > There are a number of reasons, but the most important are:
> >
> > * Resilience in the face of failures - The current database+MQ based
> >    solutions are all custom made and have unknown characteristics when
> >    there are network partitions and node failures.
> > * Scalability - The current database+MQ solutions rely on polling the
> >    database and/or sending lots of heartbeat messages or even using the
> >    database to store heartbeat transactions. This scales fine for tiny
> >    clusters, but when every new node adds more churn to the MQ and
> >    database, this will (and has been observed to) be intractable.
> > * Tech debt - OpenStack is inventing lock solutions and then maintaining
> >    them. And service discovery solutions, and then maintaining them.
> >    Wouldn't you rather have better upgrade stories, more stability, more
> >    scale, and more features?
> >
> > If those aren't compelling enough reasons to deploy a mature java service
> > like Zookeeper, I don't know what would be. But I do think using the
> > abstraction layer of tooz will at least allow us to move forward without
> > having to convince everybody everywhere that this is actually just the
> > path of least resistance.
> >
> >
>
> --
> Best Regards,
> Maish Saidel-Keesing
>

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

