[openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit

Clint Byrum clint at fewbar.com
Thu Nov 5 23:19:16 UTC 2015


Excerpts from Fox, Kevin M's message of 2015-11-05 13:18:13 -0800:
> You're assuming there are only 2 choices, zk or db+rabbit. I'm claiming both are suboptimal at present; a 3rd might be needed. Though even with its flaws, the db+rabbit choice has a few benefits too.
> 

Well, I'm assuming it is zk/etcd/consul, because while the Java argument
is rather religious, the reality is that all three are significantly
different from databases and message queues and thus will be
"snowflakes". But yes, I _am_ assuming that Zookeeper is a natural,
logical, simple choice, and the fact that it runs in a JVM is a poor
reason to avoid it.

> You also seem to assert that to support large clouds, the default must be something that can scale that large. While that would be nice, I don't think it's a requirement if it's overly burdensome on deployers of non-huge clouds.
> 

I think the current solution scales poorly even for medium-sized
clouds. Only the tiniest clouds with the fewest nodes can really
sustain all of that polling without incurring overhead that would be
better spent on servicing users.

> I don't have metrics, but I would be surprised if most deployments today (production + other) used 3 controllers with a full HA setup. I would guess that the majority are single-controller setups. With those, the overhead of maintaining a whole dlm like zk seems like overkill. If db+rabbit would work for that one case, that would be one less thing for an op to set up; they already have to set up db+rabbit. Or even a dlm plugin of some sort that won't scale, but would be very easy to deploy and change out later when needed, would be very useful.
> 

We do have metrics:

http://www.openstack.org/assets/survey/Public-User-Survey-Report.pdf

Page 35, "How many physical compute nodes do OpenStack clouds have?"


10-99:    42%
1-9:      36%
100-999:  15%
1000-9999: 7%

So for respondents to that survey, yes, "most" are running fewer than
100 nodes. However, by compute node count, we can extrapolate a bit.

There were 154 respondents, so multiplying each bucket's share of the
respondents by the low and high ends of its node range (a quick Python
sketch reproducing this arithmetic follows the list):

10-99 * 42% =      647 - 6403 nodes
1-9 * 36% =         55 - 499 nodes
100-999 * 15% =   2310 - 23077 nodes
1000-9999 * 7% = 10780 - 107789 nodes
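
The same arithmetic, for the curious, as a quick Python sketch:

    # Extrapolating total compute nodes per bucket from the survey's
    # 154 respondents (percentages from page 35 of the report).
    respondents = 154
    buckets = [
        ("10-99",     0.42, 10,   99),
        ("1-9",       0.36, 1,    9),
        ("100-999",   0.15, 100,  999),
        ("1000-9999", 0.07, 1000, 9999),
    ]
    for name, share, low, high in buckets:
        clouds = respondents * share  # estimated clouds in this bucket
        print("%-9s: %6.0f - %6.0f nodes"
              % (name, clouds * low, clouds * high))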

So in terms of the number of actual computers running OpenStack
compute, among the survey respondents there are more compute nodes in
*one* of the clouds with more than 1000 nodes than in *all* of the
clouds with fewer than 10 nodes combined, and certainly more in all of
the clouds over 1000 nodes than in all of the clouds with fewer than
100 nodes.

What this means, to me, is that the investment in OpenStack should focus
on those with > 1000, since those orgs are definitely investing a lot
more today. We shouldn't make it _hard_ to do a tiny cloud, but I think
it's ok to make the tiny cloud less efficient if it means we can grow
it into a monster cloud at any point and we continue to garner support
from orgs who need to build large-scale clouds.

(I realize I'm biased because I want to build a cloud with more than
1000 nodes ;)

> etcd is starting to show up in a lot of other projects, and so it may be at sites already. Being able to support it may be less of a burden to operators than zk in some cases.
> 

Sure, just like some shops already have postgres, and in theory you can
still run OpenStack on postgres. But the testing level for postgres
support is so abysmal that I'd be surprised if anybody was actually
_choosing_ to do this. I can see this going the same way, where we give
everyone a choice, but end up with almost nobody using the alternative
choices because the community has rallied around the one dominant
choice.

> If your cloud grows to the point where the dlm choice really matters for scalability/correctness, then you probably have enough staff members to deal with adding in zk, and that's probably the right choice.
> 

If your cloud is 40 compute nodes at three nines (which, let's face it,
is the availability profile of a cloud with one controller), we can
just throw Zookeeper up untuned and satisfy the needs. Why would we
want to put up a custom homegrown db+mq solution and then force a
change later on if the cloud grows? A single code path seems a lot
better than multiple code paths, some of which are not really well
tested.
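
To make that concrete, here's a minimal sketch of what the single code
path looks like through the tooz coordination API; the zookeeper URL
and do_the_resize() below are just illustrative placeholders, and
swapping the backend is only a change to that URL:

    # Minimal sketch: one locking code path regardless of backend.
    # Only the URL selects the DLM; do_the_resize() is hypothetical.
    from tooz import coordination

    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'compute-host-1')
    coordinator.start()

    lock = coordinator.get_lock(b'resize-instance-42')
    with lock:  # tooz locks are context managers
        do_the_resize()  # critical section; one member at a time

    coordinator.stop()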

> You can have multiple suggested things in addition to one default. Default to the thing that makes the most sense in the most common deployments, and make specific recommendations for certain scenarios, like "if greater than 100 nodes, we strongly recommend using zk" or something to that effect.
> 

Choices are not free either. Just edit that statement there: "We
strongly recommend using zk." Nothing about ZK, etcd, or consul
invalidates running on a small cloud. In many ways it makes things
simpler, since the user doesn't have to decide on a DLM, but instead
just installs the thing we recommend.


