[openstack-dev] [magnum][heat] 2 million requests / sec, 100s of nodes

Zane Bitter zbitter at redhat.com
Mon Aug 8 16:17:04 UTC 2016

On 05/08/16 12:01, Hongbin Lu wrote:
> Add [heat] to the title to get more feedback.
> Best regards,
> Hongbin
> *From:*Ricardo Rocha [mailto:rocha.porto at gmail.com]
> *Sent:* August-05-16 5:48 AM
> *To:* OpenStack Development Mailing List (not for usage questions)
> *Subject:* Re: [openstack-dev] [magnum] 2 million requests / sec, 100s
> of nodes
> Hi.
> Quick update is 1000 nodes and 7 million reqs/sec :) - and the number of
> requests should be higher but we had some internal issues. We have a
> submission for Barcelona to provide a lot more details.
> But a couple of questions came up during the exercise:
> 1. Do we really need a volume in the VMs? On large clusters this is a
> burden, and local storage only should be enough?
> 2. We observe a significant delay (~10min, which is half the total time
> to deploy the cluster) on heat when it seems to be crunching the
> kube_minions nested stacks. Once it's done, it still adds new stacks
> gradually, so it doesn't look like it precomputed all the info in advance.
> Anyone tried to scale Heat to stacks this size? We end up with a stack with:
> * 1000 nested stacks (depth 2)
> * 22000 resources
> * 47008 events

Wow, that's a big stack :) TripleO has certainly been pushing the 
boundaries of how big a stack Heat can handle, but this sounds like 
another step up even from there.

> And we already changed most of the timeout/retry values for RPC to get
> this working.
> This delay is already visible in clusters of 512 nodes, but 40% of the
> time in 1000 nodes seems like something we could improve. Any hints on
> Heat configuration optimizations for large stacks very welcome.

Y'all were right to set max_resources_per_stack to -1, because actually 
checking the number of resources in a tree of stacks is sloooooow. (Not 
as slow as it used to be when it was O(n^2), but still pretty slow.)

We're actively working on trying to make Heat more horizontally scalable 
(even at the cost of some performance penalty) so that if you need to 
handle this kind of scale then you'll be able to reach it by adding more 
heat-engines. Another big step forward on this front is coming with 
Newton, as (barring major bugs) the convergence_engine architecture will 
be enabled by default.

RPC timeouts are caused by the synchronous work that Heat does before 
returning a result to the caller. Most of this is validation of the data 
provided by the user. We've talked about trying to reduce the amount of 
validation done synchronously to a minimum (just enough to guarantee 
that we can store and retrieve the data from the DB) and push the rest 
into the asynchronous part of the stack operation alongside the actual 
create/update. (FWIW, TripleO typically uses a 600s RPC timeout.)
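
For reference, a rough sketch of how the settings mentioned so far might
look in heat.conf. The -1 and 600 values are the ones discussed above;
placing everything under [DEFAULT] and using rpc_response_timeout as the
name of the RPC timeout option are my assumptions, so check them against
the configuration reference for your release before copying anything.

```ini
[DEFAULT]
# Skip the per-stack resource count check (-1 means no limit).
max_resources_per_stack = -1
# Seconds to wait for an RPC reply; 600 is what TripleO typically uses.
# (Option name assumed to be the oslo.messaging one.)
rpc_response_timeout = 600
# Shown for illustration only: expected to be on by default in Newton.
convergence_engine = true
```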

The "QueuePool limit of size ... overflow ... reached" sounds like we're 
pulling messages off the queue even when we don't have threads available 
in the pool to pass them to. If you have a fix for this it would be much 
appreciated. However, I don't think there's any guarantee that just 
leaving messages on the queue can't lead to deadlocks. The problem with 
very large trees of nested stacks is not so much that it's a lot of 
stacks (Heat doesn't have _too_ much trouble with that) but that they 
all have to be processed simultaneously. e.g. to validate the top level 
stack you also need to validate all of the lower level stacks before 
returning the result. If higher-level stacks consume all of the threads 
in the pools then you'll get a deadlock, as you'll be unable to validate 
any lower-level stacks. At this point you'd have maxed out the capacity 
of your Heat engines to process stacks simultaneously and you'd need to 
scale out to more Heat engines. The solution is probably to try to limit 
the number of nested stack validations we send out concurrently.
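
To make the failure mode concrete, here is a toy model of it in Python.
This is not Heat code: Stack, validate_tree, pool_size, fanout and limit
are all names invented for the sketch, and the "pool" is just a counter
of free workers fed from a FIFO queue. It assumes fanout >= 1 and
limit >= 1, and that a stack holds its worker until every one of its
children has validated.

```python
from collections import deque


class Stack:
    """One stack in the tree (hypothetical model, not a Heat class)."""

    def __init__(self, levels_below, parent=None):
        self.levels_below = levels_below  # 0 means no nested stacks
        self.parent = parent
        self.unsent = 0    # children not yet sent for validation
        self.pending = 0   # children sent but not yet validated


def validate_tree(pool_size, fanout, depth, limit=None):
    """Return True if the whole tree validates, False if it gets stuck.

    `limit` caps how many children each stack has in flight at once;
    None means every child is sent out immediately.
    """
    free = pool_size
    queue = deque([Stack(depth)])

    def dispatch(stack):
        # Queue more child validations while we are under the cap.
        while stack.unsent and (limit is None or stack.pending < limit):
            stack.unsent -= 1
            stack.pending += 1
            queue.append(Stack(stack.levels_below - 1, stack))

    def finish(stack):
        """Release the worker; return True once the root has finished."""
        nonlocal free
        free += 1
        parent = stack.parent
        if parent is None:
            return True
        parent.pending -= 1
        dispatch(parent)
        if parent.unsent == 0 and parent.pending == 0:
            return finish(parent)
        return False

    while queue and free:
        stack = queue.popleft()
        free -= 1
        if stack.levels_below == 0:
            # A leaf validates immediately and gives its worker back.
            if finish(stack):
                return True
        else:
            # A parent keeps its worker while its children validate.
            stack.unsent = fanout
            dispatch(stack)
    # Work is still queued but every worker is held by a parent that is
    # waiting on its children, so nothing can make progress.
    return False


if __name__ == "__main__":
    print(validate_tree(pool_size=4, fanout=10, depth=2))           # False
    print(validate_tree(pool_size=4, fanout=10, depth=2, limit=2))  # True
```

With 4 workers and everything sent at once, the root and the first
three children take every worker and the grandchildren never run. With
at most 2 children in flight per stack, one worker always stays free for
the lowest level and the tree completes. In this model the cap has to
leave room below it: limit=3 with 4 workers deadlocks just the same.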

Improving performance at scale is a priority area of focus for the Heat 
team at the moment. That's been mostly driven by TripleO and Sahara, but 
we'd be very keen to hear about the kind of loads that Magnum is putting 
on Heat and working with folks across the community to figure out how to 
improve things for those use cases.
