[openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit

Robert Collins robertc at robertcollins.net
Thu Nov 5 19:10:06 UTC 2015


On 5 November 2015 at 11:32, Fox, Kevin M <Kevin.Fox at pnnl.gov> wrote:
> To clarify that statement a little more,
>
> Speaking only for myself as an op, I don't want to support yet one more snowflake in a sea of snowflakes, one that works differently than all the rest, without a very good reason.
>
> Java has its own set of issues associated with the JVM: care-and-feeding sorts of things. If we are to invest time/money/people in learning how to properly maintain it, it's easier to justify if it's not just a one-off for DLM.
>
> So I wouldn't go so far as to say we're vehemently opposed to Java, just that DLM is probably not a strong enough feature on its own to justify pulling in Java. It's only very recently that you could convince folks that DLM was needed at all. So either make Java optional, or find some other use cases that need Java badly enough that you can make Java a required component. I suspect some day searchlight might be compelling enough for that, but not today.
>
> As for the default, the default should be a good reference. If most sites would run with etcd or something else since Java isn't needed, then don't default zookeeper on.

So let's be clear about the discussion at the summit.

There were three distinct, non-conflicting concerns raised about Java.

The first is 'it's a new platform for us operators to understand
operations around' - which is fair, and indeed, Java has different
(not better, different) behaviours to the CPython VM.

Secondly, 'us operators do not want to be a special snowflake, we
*want* to run the majority configuration' - which makes sense, and is
one reason to aim for a convergent stack where possible.

Thirdly, 'many of our customers *will not* run Oracle's JVM and the
stability and performance of Zookeeper on OpenJDK is an unknown'. The
argument was that we can't pick zk because the herd runs it on Oracle's
JVM, not OpenJDK. There are some unquantified bits here, but it is
known that OpenJDK has had sufficient differences from Oracle's JVM to
cause subtle bugs, so if most large zk shops are running Oracle's JVM
then this does become a special-snowflake risk.

I don't recall *anyone* saying they thought zk was bad, or that they
would refuse to run it if we had chosen zk rather than tooz. We got
stuck on that third issue - there was no way to answer it in the
session, and it's obviously a terrifying risk to take.

And because for every option some operators were going to be unhappy,
we fell back to the choice of not making a choice.

There are a bunch of parameters around DLM usage that we haven't
quantified yet. We can talk capabilities sensibly, but we don't yet
know how much load we will put on the DLM, nor how it will scale
relative to cloud size. My naive expectation is that we'll need a
-very- large cloud to stress the cluster size of any decent DLM, but
that request rate / latency could become an issue as clouds scale
(i.e. the DLM will need care and feeding).

-Rob


-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud
