[openstack-dev] [magnum][heat] 2 million requests / sec, 100s of nodes

Zane Bitter zbitter at redhat.com
Tue Aug 9 21:34:06 UTC 2016


On 07/08/16 19:52, Clint Byrum wrote:
> Excerpts from Steve Baker's message of 2016-08-08 10:11:29 +1200:
>> On 05/08/16 21:48, Ricardo Rocha wrote:
>>> Hi.
>>>
>>> Quick update is 1000 nodes and 7 million reqs/sec :) - and the number
>>> of requests should be higher, but we had some internal issues. We have
>>> a submission for Barcelona to provide a lot more details.
>>>
>>> But a couple questions came during the exercise:
>>>
>>> 1. Do we really need a volume in the VMs? On large clusters this is a
>>> burden - would local storage alone be enough?
>>>
>>> 2. We observe a significant delay (~10 min, which is half the total
>>> time to deploy the cluster) in Heat when it seems to be crunching the
>>> kube_minions nested stacks. Once it's done, it still adds new stacks
>>> gradually, so it doesn't look like it precomputed all the info in advance.
>>>
>>> Anyone tried to scale Heat to stacks this size? We end up with a stack
>>> with:
>>> * 1000 nested stacks (depth 2)
>>> * 22000 resources
>>> * 47008 events
>>>
>>> And we already changed most of the timeout/retry values for RPC to get
>>> this working.
>>>
>>> This delay is already visible in clusters of 512 nodes, but 40% of the
>>> deployment time at 1000 nodes seems like something we could improve. Any
>>> hints on Heat configuration optimizations for large stacks are very welcome.
>>>
>> Yes, we recommend you set the following in /etc/heat/heat.conf [DEFAULT]:
>> max_resources_per_stack = -1
>>
>> Enforcing this for large stacks has a very high overhead; we make the
>> same change in the TripleO undercloud too.
>>
>
> Wouldn't this necessitate having a private Heat just for Magnum? Not
> having a resource limit per stack would leave your Heat engines
> vulnerable to being DoS'd by malicious users, since one can create many,
> many thousands of resources, and thus Python objects, in just a couple
> of cleverly crafted templates (which is why I added the setting).

Although when you added it, all of the resources in a tree of nested 
stacks got handled by a single engine, so sending a really big tree of 
nested stacks was an easy way to DoS Heat. That's no longer the case 
since Kilo; we farm the child stacks out over RPC, so the difficulty of 
carrying out a DoS increases in proportion to the number of cores you 
have running Heat, whereas before it was constant. (This is also the 
cause of the performance problem, since counting all the resources in 
the tree was easy when the entire thing was already loaded in memory.)
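
As a rough illustration (just a toy sketch, not Heat's actual code - 
every name in it is made up for the example), the difference between 
the two models looks something like this:

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List

    @dataclass
    class Stack:
        resources: int
        child_ids: List[str] = field(default_factory=list)

    def count_in_memory(stacks: Dict[str, Stack], root_id: str) -> int:
        # Old model: the whole tree is already loaded in one engine, so
        # counting resources for the limit check is a trivial recursive walk.
        stack = stacks[root_id]
        return stack.resources + sum(
            count_in_memory(stacks, child) for child in stack.child_ids)

    def count_over_rpc(root_id: str, load_stack: Callable[[str], Stack]) -> int:
        # Post-Kilo model: each nested stack is handled elsewhere, so
        # load_stack stands in for an RPC round-trip, and the cost of the
        # same check grows with the number of nested stacks in the tree.
        total, pending = 0, [root_id]
        while pending:
            stack = load_stack(pending.pop())
            total += stack.resources
            pending.extend(stack.child_ids)
        return total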

Convergence splits it up even further, farming out each _resource_ as 
well as each stack over RPC.

I had the thought that having a per-tenant resource limit might be both 
more effective at protecting the limited resource and more 
efficient to calculate, since we could have the DB simply count the 
Resource rows for stacks in a given tenant instead of recursively 
loading all of the stacks in a tree and counting the resources in 
heat-engine. However, the tenant isn't stored directly in the Stack 
table, and people who know databases tell me the resulting joins would 
be fearsome.
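
Roughly what I have in mind, purely as a sketch - the table and column 
names here (resource, stack, stack_id, tenant) are illustrative, not 
necessarily what Heat's schema actually looks like:

    import sqlalchemy as sa

    def tenant_resource_count(session, tenant_id):
        # If the tenant were stored (or denormalized) on every stack row,
        # the limit check could be a single aggregate query in the DB
        # instead of loading a whole tree of stacks into heat-engine.
        query = sa.text(
            "SELECT COUNT(resource.id) "
            "FROM resource JOIN stack ON resource.stack_id = stack.id "
            "WHERE stack.tenant = :tenant"
        )
        return session.execute(query, {"tenant": tenant_id}).scalar()

Without a tenant column on the stack rows, the same count needs extra 
joins to recover which tenant each stack belongs to, which is where the 
fearsome part comes in.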

I'm still not convinced it'd be worse than what we have now, even after 
Steve did a lot of work to make it much, much better than it was at one 
point ;)

> This makes perfect sense in the undercloud of TripleO, which is a
> private, single-tenant OpenStack. But for Magnum... now you're talking
> about the Heat that users have access to.

Indeed, and now that we're seeing other users of very large stacks 
(Sahara is another), I think we need to come up with a solution that is 
both efficient enough to use on a large/deep tree of nested stacks and 
tunable enough to protect against DoS at whatever scale Heat is 
deployed.

cheers,
Zane.
