[openstack-dev] [openstack][magnum] Quota for Magnum Resources

Adrian Otto adrian.otto at rackspace.com
Thu Dec 17 00:56:39 UTC 2015


Clint,

I think you are categorically dismissing a very real ops challenge: how to set correct system limits, and how to adjust them in a running system. I have been stung by this challenge repeatedly over the years. As developers we *guess* at a sensible default for a limit, but we are sometimes wrong. When we are, that guess has a very real and very negative impact on users of production systems. The idea of using one limit for all users is idealistic, and I’m convinced from experience that it is not the best approach in practice. What we usually want to do is bump up a limit for a single user, or dynamically drop a limit for all users. The problem is that very few systems implement limits in a way that lets them be adjusted while the system is running, and fewer still on a per-tenant basis. So yes, I will assert that having a quota implementation, and the related complexity, is justified by the ability to adapt limit levels while the system is running.
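
To make that concrete, here is a minimal sketch (hypothetical names, not Magnum’s actual implementation) of the kind of quota resolution being argued for. The essential property is that limits are read from mutable storage at request time, so an operator can raise one tenant’s ceiling or drop the default for everyone without restarting anything:

    # Hypothetical sketch only. Limits live in mutable storage (dicts here,
    # standing in for database tables) rather than a config file, so they
    # can be changed while the service is running.

    class OverQuota(Exception):
        pass

    HARDCODED_DEFAULTS = {'clusters': 20}  # compiled-in guess of last resort

    site_defaults = {}     # {resource: limit}, adjustable for all tenants
    tenant_overrides = {}  # {(tenant_id, resource): limit}, per-tenant bumps
    usage = {}             # {(tenant_id, resource): count}, current consumption

    def effective_limit(tenant_id, resource):
        # Per-tenant override wins, then the site-wide default, then
        # the compiled-in guess.
        key = (tenant_id, resource)
        if key in tenant_overrides:
            return tenant_overrides[key]
        return site_defaults.get(resource, HARDCODED_DEFAULTS[resource])

    def check_quota(tenant_id, resource, requested=1):
        in_use = usage.get((tenant_id, resource), 0)
        if in_use + requested > effective_limit(tenant_id, resource):
            raise OverQuota('%s limit reached for tenant %s'
                            % (resource, tenant_id))

    # Bump a single tenant that legitimately needs more, live:
    tenant_overrides[('big-team', 'clusters')] = 200
    # Or drop the default for all tenants at once, also live:
    site_defaults['clusters'] = 10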

Think for a moment about the pain an ops team goes through when they have to take down a service that affects thousands or tens of thousands of users. We have to send zillions of emails to customers, and we need to hold emergency change management meetings. We have to answer questions like “why didn’t you test for this?” when we did test for it, and it worked fine under simulation but not in a real production environment under this particular stimulus. “Why can’t you take the system down in sections to keep the service up?” The answer to all of this is “because the developers never put themselves in the shoes of the ops team when they designed it.”

Those who know me will attest that I care deeply about applying the KISS principle: keep a design as simple as possible unless it’s essential to make it more complex. In this case, the complexity is justified.

Now, if production ops teams running large-scale systems argue that dynamic limits and per-user overrides are pointless, then I’ll certainly reconsider my position.

Adrian

> On Dec 16, 2015, at 4:21 PM, Clint Byrum <clint at fewbar.com> wrote:
> 
> Excerpts from Fox, Kevin M's message of 2015-12-16 16:05:29 -0800:
>> Yeah, as an op, I've run into a few things that need quotas but have basically hardcoded values. Heat stacks, for example: it's a single global in /etc/heat/heat.conf, max_stacks_per_tenant=100. Instead of being able to tweak it for just our one project that legitimately has to create over 200 stacks, I had to set it cloud-wide, and I had to bounce services to do it. Please don't do that.
>> 
>> Ideally, it would be nice if the quota machinery could be pulled out into its own shared lib (oslo?) and shared amongst projects, so that they don't have to spend much effort implementing quotas. Maybe then the things that need quotas but don't currently have them could get them more easily.
>> 
> 
> You had to change a config value, once, and that's worse than the added
> code complexity and server load that would come from tracking quotas for
> a distributed service?
> 
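
For context, the Heat global Kevin describes is a flat config option, roughly as follows (the option name and default come from his message; its placement under [DEFAULT] is assumed):

    [DEFAULT]
    # One value for every tenant in the cloud. Changing it means editing
    # this file and bouncing the heat services.
    max_stacks_per_tenant = 100

A shared quota library of the sort he suggests would invert that: limits resolved per tenant from mutable storage, behind a small common interface. A hypothetical sketch, not an existing oslo API:

    # Hypothetical interface sketch of the shared library being proposed;
    # no such oslo library exists at the time of this thread.
    class QuotaDriver(object):
        def get_limit(self, tenant_id, resource):
            """Return the effective limit, honoring per-tenant overrides."""
            raise NotImplementedError

        def set_limit(self, tenant_id, resource, limit):
            """Operator knob: adjust a limit at runtime, no restart."""
            raise NotImplementedError

        def check(self, tenant_id, resource, requested=1):
            """Raise OverQuota if the request would exceed the limit."""
            raise NotImplementedError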


