[openstack-dev] [magnum][heat] 2 million requests / sec, 100s of nodes

Ricardo Rocha rocha.porto at gmail.com
Mon Aug 8 21:09:02 UTC 2016


Hi.

On Mon, Aug 8, 2016 at 6:17 PM, Zane Bitter <zbitter at redhat.com> wrote:
> On 05/08/16 12:01, Hongbin Lu wrote:
>>
>> Add [heat] to the title to get more feedback.
>>
>>
>>
>> Best regards,
>>
>> Hongbin
>>
>>
>>
>> *From:*Ricardo Rocha [mailto:rocha.porto at gmail.com]
>> *Sent:* August-05-16 5:48 AM
>> *To:* OpenStack Development Mailing List (not for usage questions)
>> *Subject:* Re: [openstack-dev] [magnum] 2 million requests / sec, 100s
>> of nodes
>>
>>
>>
>> Hi.
>>
>>
>>
>> Quick update: we're at 1000 nodes and 7 million reqs/sec :) - and the number
>> of requests should have been higher, but we hit some internal issues. We have
>> a submission for Barcelona that will provide a lot more details.
>>
>>
>>
>> But a couple questions came during the exercise:
>>
>>
>>
>> 1. Do we really need a volume attached to each VM? On large clusters this is
>> a burden, and local storage alone should be enough?
>>
>>
>>
>> 2. We observe a significant delay (~10min, which is half the total time
>> to deploy the cluster) in Heat while it seems to be crunching the
>> kube_minions nested stacks. Once it's done, it still adds new stacks
>> gradually, so it doesn't look like it precomputed all the info in advance.
>>
>>
>>
>> Has anyone tried to scale Heat to stacks this size? We end up with a stack
>> containing:
>>
>> * 1000 nested stacks (depth 2)
>>
>> * 22000 resources
>>
>> * 47008 events
>
>
> Wow, that's a big stack :) TripleO has certainly been pushing the boundaries
> of how big a stack Heat can handle, but this sounds like another step up
> even from there.
>
>> We already had to change most of the timeout/retry values for RPC to get
>> this working.
>>
>>
>>
>> This delay is already visible in clusters of 512 nodes, but spending 40% of
>> the deployment time on it at 1000 nodes seems like something we could improve.
>> Any hints on Heat configuration optimizations for large stacks are very welcome.
>
>
> Y'all were right to set max_resources_per_stack to -1, because actually
> checking the number of resources in a tree of stacks is sloooooow. (Not as
> slow as it used to be when it was O(n^2), but still pretty slow.)
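>
> For illustration, a minimal heat.conf sketch of that setting (the value -1
> disables the per-tree resource count check entirely):
>
>     [DEFAULT]
>     # Counting resources across a large tree of nested stacks is expensive,
>     # so skip the limit check altogether.
>     max_resources_per_stack = -1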
>
> We're actively working on trying to make Heat more horizontally scalable
> (even at the cost of some performance penalty) so that if you need to handle
> this kind of scale then you'll be able to reach it by adding more
> heat-engines. Another big step forward on this front is coming with Newton,
> as (barring major bugs) the convergence_engine architecture will be enabled
> by default.
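>
> If you want to try it ahead of Newton, the switch is a single heat.conf
> option (a sketch only - check the release notes for your version first):
>
>     [DEFAULT]
>     # Use the convergence architecture instead of the legacy in-memory engine.
>     convergence_engine = true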
>
> RPC timeouts are caused by the synchronous work that Heat does before
> returning a result to the caller. Most of this is validation of the data
> provided by the user. We've talked about trying to reduce the amount of
> validation done synchronously to a minimum (just enough to guarantee that we
> can store and retrieve the data from the DB) and push the rest into the
> asynchronous part of the stack operation alongside the actual create/update.
> (FWIW, TripleO typically uses a 600s RPC timeout.)
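>
> As a sketch, that timeout is the standard oslo.messaging option in heat.conf
> (600 is just the value TripleO has been using, not a tuned recommendation):
>
>     [DEFAULT]
>     # Allow long-running synchronous calls (e.g. validation of large
>     # nested-stack trees) to complete before the RPC client gives up.
>     rpc_response_timeout = 600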
>
> The "QueuePool limit of size ... overflow ... reached" sounds like we're
> pulling messages off the queue even when we don't have threads available in
> the pool to pass them to. If you have a fix for this it would be much
> appreciated. However, I don't think there's any guarantee that just leaving
> messages on the queue can't lead to deadlocks. The problem with very large
> trees of nested stacks is not so much that it's a lot of stacks (Heat
> doesn't have _too_ much trouble with that) but that they all have to be
> processed simultaneously. e.g. to validate the top level stack you also need
> to validate all of the lower level stacks before returning the result. If
> higher-level stacks consume all of the thread pools then you'll get a
> deadlock as you'll be unable to validate any lower-level stacks. At this
> point you'd have maxed out the capacity of your Heat engines to process
> stacks simultaneously and you'd need to scale out to more Heat engines. The
> solution is probably to try to limit the number of nested stack validations
> we send out concurrently.
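>
> For reference, the knobs involved are the oslo.db connection pool and the
> oslo.messaging executor thread pool in heat.conf (values below are purely
> illustrative, not recommendations):
>
>     [DEFAULT]
>     # Number of threads available to process incoming RPC messages.
>     executor_thread_pool_size = 64
>
>     [database]
>     # SQLAlchemy QueuePool sizing; the "QueuePool limit ... reached" error
>     # appears once max_pool_size + max_overflow connections are all in use.
>     max_pool_size = 20
>     max_overflow = 40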
>
> Improving performance at scale is a priority area of focus for the Heat team
> at the moment. That's been mostly driven by TripleO and Sahara, but we'd be
> very keen to hear about the kind of loads that Magnum is putting on Heat and
> working with folks across the community to figure out how to improve things
> for those use cases.

Thanks for the detailed reply, especially regarding how the engines handle
the nested stacks - it's much clearer now.

It seems there are a couple of things we can try already (see the config
sketch below):
* scaling the Heat engines (we're currently running 3 nodes with 5
engines each; we can check whether adding more helps, though with >1000
nested stacks it seems hard to avoid starvation)
* trying the convergence_engine: as far as I could see it is already
there, just not enabled by default. We can give it a try and let you
know how it goes if there's no obvious drawback. Would it just work
with the current schema? We're running Heat Mitaka.
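
A minimal sketch of what we'd tweak in heat.conf for those two points
(the numbers reflect what we run today, not recommendations):

    [DEFAULT]
    # Worker processes per heat-engine node; raise this and/or add nodes.
    num_engine_workers = 5
    # Switch to the convergence architecture (off by default in Mitaka).
    convergence_engine = true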

Discussing further these large stack use cases in Magnum sounds like a
great idea.

Thanks!

Ricardo

> cheers,
> Zane.
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


