[openstack-dev] [magnum] 2 million requests / sec, 100s of nodes

Ricardo Rocha rocha.porto at gmail.com
Mon Aug 8 10:04:54 UTC 2016


On Mon, Aug 8, 2016 at 11:51 AM, Ricardo Rocha <rocha.porto at gmail.com> wrote:
> Hi.
>
> On Mon, Aug 8, 2016 at 1:52 AM, Clint Byrum <clint at fewbar.com> wrote:
>> Excerpts from Steve Baker's message of 2016-08-08 10:11:29 +1200:
>>> On 05/08/16 21:48, Ricardo Rocha wrote:
>>> > Hi.
>>> >
>>> > Quick update: we're at 1000 nodes and 7 million reqs/sec :) - the
>>> > number of requests should be higher, but we hit some internal issues.
>>> > We have a submission for Barcelona to provide a lot more details.
>>> >
>>> > But a couple questions came during the exercise:
>>> >
>>> > 1. Do we really need a volume in the VMs? On large clusters this is a
>>> > burden - wouldn't local storage alone be enough?
>>> >
>>> > 2. We observe a significant delay (~10min, which is half the total
>>> > time to deploy the cluster) in Heat while it seems to be crunching the
>>> > kube_minions nested stacks. Once it's done, it still adds new stacks
>>> > gradually, so it doesn't look like it precomputed all the info in
>>> > advance.
>>> >
>>> > Anyone tried to scale Heat to stacks this size? We end up with a stack
>>> > with:
>>> > * 1000 nested stacks (depth 2)
>>> > * 22000 resources
>>> > * 47008 events
>>> >
>>> > And we already changed most of the timeout/retry values for RPC to
>>> > get this working.
>>> >
>>> > This delay is already visible in clusters of 512 nodes, but 40% of
>>> > the deployment time at 1000 nodes seems like something we could
>>> > improve. Any hints on Heat configuration optimizations for large
>>> > stacks are very welcome.
>>> >
>>> Yes, we recommend you set the following in /etc/heat/heat.conf [DEFAULT]:
>>> max_resources_per_stack = -1
>>>
>>> Enforcing this limit on large stacks has a very high overhead; we make
>>> this change in the TripleO undercloud too.
>>>
>>
>> Wouldn't this necessitate having a private Heat just for Magnum? Not
>> having a resource limit per stack would leave your Heat engines
>> vulnerable to being DoS'd by malicious users, since one can create many
>> many thousands of resources, and thus python objects, in just a couple
>> of cleverly crafted templates (which is why I added the setting).
>>
>> This makes perfect sense in the undercloud of TripleO, which is a
>> private, single tenant OpenStack. But, for Magnum.. now you're talking
>> about the Heat that users have access to.
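
To picture the "cleverly crafted templates" scenario: a minimal sketch of
two hypothetical HOT templates whose nested ResourceGroup counts multiply
into a million resources (names and counts are made up for illustration):

  # outer.yaml - 1000 copies of the inner group
  heat_template_version: 2016-04-08
  resources:
    outer_group:
      type: OS::Heat::ResourceGroup
      properties:
        count: 1000
        resource_def:
          # nested provider template, passed alongside outer.yaml
          type: inner.yaml

  # inner.yaml - 1000 trivial resources per copy
  heat_template_version: 2016-04-08
  resources:
    inner_group:
      type: OS::Heat::ResourceGroup
      properties:
        count: 1000
        resource_def:
          type: OS::Heat::RandomString

With max_resources_per_stack = -1 nothing stops a single tenant from
materialising all of those objects in the engines, which is the risk
Clint describes.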
>
> We have it already at -1 for these tests. As you say, a malicious user
> could DoS; right now this is manageable in our environment. But maybe
> we could move it to a per-tenant value, or some special policy? The
> stacks are created under a separate domain for Magnum (for trustees),
> so we could also use that for separation.

For reference, we also changed max_stacks_per_tenant, described in
heat.conf as:
# Maximum number of stacks any one tenant may have active at one time.
# (integer value)

For the 1000 node bay test we had to increase it.
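
Putting the pieces from this thread together, a rough sketch of the
/etc/heat/heat.conf overrides involved - the numbers below are only
illustrative, not the exact values we validated:

  [DEFAULT]
  # disable the per-stack resource limit (Steve's suggestion above)
  max_resources_per_stack = -1
  # raised for the 1000 node bay test; the upstream default is 100
  max_stacks_per_tenant = 5000
  # example of the rpc timeout knobs mentioned earlier; the actual set
  # of options and values will depend on your environment
  rpc_response_timeout = 600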

>
> A separate Heat instance sounds like overkill.
>
> Cheers,
> Ricardo