[openstack-dev] [all] Outcome of distributed lock manager discussion @ the summit

Fox, Kevin M Kevin.Fox at pnnl.gov
Fri Nov 6 19:27:55 UTC 2015



> -----Original Message-----
> From: Clint Byrum [mailto:clint at fewbar.com]
> Sent: Thursday, November 05, 2015 3:19 PM
> To: openstack-dev
> Subject: Re: [openstack-dev] [all] Outcome of distributed lock manager
> discussion @ the summit
> 
> Excerpts from Fox, Kevin M's message of 2015-11-05 13:18:13 -0800:
> > You're assuming there are only 2 choices, zk or db+rabbit. I'm claiming
> > both are suboptimal at present. A 3rd might be needed. Though even
> > with its flaws, the db+rabbit choice has a few benefits too.
> >
> 
> Well, I'm assuming it is zk/etcd/consul, because while the java argument is
> rather religious, the reality is all three are significantly different from
> databases and message queues and thus will be "snowflakes". But yes, I
> _am_ assuming that Zookeeper is a natural, logical, simple choice, and the
> fact that it runs in a jvm is a poor reason to avoid it.

Yes. Having a snowflake there is probably unavoidable; the question is how much of one.

I've had to tune jvm settings like the java thread stack size (-Xss) when things spontaneously broke, and the answer is always: oh, yeah, when that happens, go tweak such and such in the jvm... Unix sysadmins usually know the common failure modes for c apps without much effort, and tend to know to look in advance. In my somewhat limited experience with go, the runtime behaves much more like a regular unix program than a jvm does.

The term 'java' is often conflated to mean both the java language and the jvm runtime. When people object to java, they are often really objecting to the jvm, and I think this is one of those cases. It's easier for unix admins not trained specifically in jvm behaviors/tunables to debug c/go.

> 
> > You also seem to assert that to support large clouds, the default must be
> something that can scale that large. While that would be nice, I don't think
> it's a requirement if it's overly burdensome on deployers of non-huge clouds.
> >
> 
> I think the current solution even scales poorly for medium sized clouds.
> Only the tiniest of clouds with the fewest nodes can really sustain all of that
> polling without incurring cost for that overhead that would be better spent
> on servicing users.

I've run clouds with around 100 nodes on a single controller. If it's doable today, it should be doable with the new system. It's not ideal, but a zero-effort deploy that is easy to debug has something going for it.

> 
> > I don't have metrics, but I would be surprised if most deployments today
> (production + other) used 3 controllers with a full ha setup. I would guess
> that the majority are single controller setups. With those, the overhead of
> maintaining a whole dlm like zk seems like overkill. If db+rabbit would work
> for that one case, that would be one less thing for an op to set up.
> They already have to set up db+rabbit. Even a dlm plugin of some sort
> that won't scale, but is very easy to deploy and can be changed out later
> when needed, would be very useful.
> >
> 
> We do have metrics:
> 
> http://www.openstack.org/assets/survey/Public-User-Survey-Report.pdf
> 
> Page 35, "How many physical compute nodes do OpenStack clouds have?"
> 

That's not what I was asking. I was asking how many controllers, not how many compute nodes. Like I said above, 1 controller can handle quite a few compute nodes.

> 
> 10-99:    42%
> 1-9:      36%
> 100-999:  15%
> 1000-9999: 7%
> 
> So for respondents to that survey, yes, "most" are running less than 100
> nodes. However, by compute node count, if we extrapolate a bit:
> 
> There were 154 respondents so:
> 
> 10-99 * 42% =    640 - 6403 nodes
> 1-9 * 36% =      55 - 498 nodes
> 100-999 * 15% =  2300 - 23076 nodes
> 1000-9999 * 7% = 10000 - 107789 nodes
>

This is good, but I believe this is biased towards the top end.

Respondents are much more likely to respond if they have a larger cloud to brag about. Folks doing it for development, testing, and other reasons may not respond because it's not worth the effort.
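
That said, the arithmetic itself is easy to reproduce. A quick python
sketch (my own, assuming the 154 respondents split exactly by those
percentages and taking each bucket's min/max as the bounds):

    # back-of-the-envelope node-count extrapolation from the survey
    respondents = 154
    buckets = [((1, 9), 0.36), ((10, 99), 0.42),
               ((100, 999), 0.15), ((1000, 9999), 0.07)]
    for (low, high), share in buckets:
        n = respondents * share
        print("%4d-%-4d: %6d - %6d nodes"
              % (low, high, int(n) * low, int(n * high)))

That reproduces your numbers, so it's the sample I'm questioning, not
the math.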

> So in terms of the number of actual computers running OpenStack compute,
> as an example, from the survey respondents, there are more computes
> running in *one* of the clouds with more than 1000 nodes than there are in
> *all* of the clouds with less than 10 nodes, and certainly more in all of the
> clouds over 1000 nodes, than in all of the clouds with less than 100 nodes.

For the reason listed above, I don't think we have enough evidence to draw too strong a conclusion from this.

> 
> What this means, to me, is that the investment in OpenStack should focus
> on those with > 1000, since those orgs are definitely investing a lot more
> today. We shouldn't make it _hard_ to do a tiny cloud, but I think it's ok to
> make the tiny cloud less efficient if it means we can grow it into a monster
> cloud at any point and we continue to garner support from orgs who need to
> build large scale clouds.

Yeah, I'd say we for sure need a solution for 1000+.

We also need a really easy solution for 100- (100 nodes and under).

I believe ZK is probably the right solution for the 1000+ case for sure.
For 100-, I think ZK may be overkill.

For the middle, I'm not sure. I'd prefer not to have 3 recommended solutions though. 2 is pushing it, but may be reasonable if the huge-scale case is burdensome on ops.
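
Fwiw, if tooz stays the abstraction layer (that was my read of the
summit session, so treat it as an assumption), the backend is mostly a
connection string, and changing it out later stays cheap. Roughly (an
untested sketch):

    from tooz import coordination

    # The backend is just a URL; swap zookeeper:// for etcd:// or
    # another driver later without touching the code that takes locks.
    coordinator = coordination.get_coordinator(
        'zookeeper://127.0.0.1:2181', b'controller-1')
    coordinator.start()

    lock = coordinator.get_lock(b'resize-instance-42')
    with lock:   # blocks until acquired, releases on exit
        pass     # ...critical-section work goes here...

    coordinator.stop()

If that holds, 'default vs recommended' becomes a packaging and docs
question rather than separate code paths.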

> 
> (I realize I'm biased because I want to build a cloud with more than
> 1000 nodes ;)

I'd love to be in that boat too. :)

> 
> > etcd is starting to show up in a lot of other projects, and so it may be at
> sites already. Being able to support it may be less of a burden to operators
> than zk in some cases.
> >
> 
> Sure, just like some shops already have postgres and in theory you can still
> run OpenStack on postgres. But the testing level for postgres support is so
> abysmal that I'd be surprised if anybody was actually _choosing_ to do this.
> I can see this going the same way, where we give everyone a choice, but
> then end up with almost nobody using any alternative choices because the
> community has only rallied around the one dominant choice.

This is a fair argument. But this came about due to having choice and letting the community decide over time. Frankly, I prefer postgres but these days deploy with mysql since the community prefers it so much more.

If the whole community settles on ZK as being the thing, then it will be the right thing to default to. I'm not sure we're there yet though.

> 
> > If your cloud grows to the point where the dlm choice really matters for
> scalability/correctness, then you probably have enough staff members to
> deal with adding in zk, and that's probably the right choice.
> >
> 
> If your cloud is 40 compute nodes at three nines (which, let's face it, is
> the availability profile of a cloud with one controller), we can just throw
> Zookeeper up untuned and satisfy the needs. Why would we want to put up
> a custom homegrown db+mq solution and then force a change later on if the
> cloud grows? A single code path seems a lot better than multiple code
> paths, some of which are not really well tested.

Are you sure you're not going to run into issues where the jvm and mysql, or something else on the node, fight for memory or other resources? Maybe not. Maybe I'm being overly cautious.

If it's as simple as 'yum install zk; systemctl start zk' and it just works without having to touch it, I'm ok with that. If it's go to sun's website, select your os, find the right jvm that matches what zk wants, accept a eula, paste in the link to wget it, install the rpm, blablabla, it's a no go. That has been a HUGE pain before, and maybe it's better now, but it can't be a burden and still be acceptable.

> 
> > You can have multiple suggested things in addition to one default. Default
> to the thing that makes the most sense in the common most deployments,
> and make specific recommendations for certain scenarios. like, "if greater
> then 100 nodes, we strongly recommend using zk" or something to that
> effect.
> >
> 
> Choices are not free either. Just edit that statement there: "We strongly
> recommend using zk." Nothing about ZK, etcd, or consul, invalidates running
> on a small cloud. In many ways it makes things simpler, since the user
> doesn't have to decide on a DLM, but instead just installs the thing we
> recommend.

I'm all for it if it's without burden to the operators of small cloud instances. If it's burdensome, some smaller instances won't get deployed, and that will lead to fewer bigger instances (1000+) getting approved. OpenStack is already harder to install than it really should be. We don't want to make it harder still.

Thanks,
Kevin