[openstack-dev] [magnum] 2 million requests / sec, 100s of nodes

Ricardo Rocha rocha.porto at gmail.com
Tue Aug 9 20:45:53 UTC 2016


On Tue, Aug 9, 2016 at 10:00 PM, Clint Byrum <clint at fewbar.com> wrote:
> Excerpts from Ricardo Rocha's message of 2016-08-08 11:51:00 +0200:
>> Hi.
>>
>> On Mon, Aug 8, 2016 at 1:52 AM, Clint Byrum <clint at fewbar.com> wrote:
>> > Excerpts from Steve Baker's message of 2016-08-08 10:11:29 +1200:
>> >> On 05/08/16 21:48, Ricardo Rocha wrote:
>> >> > Hi.
>> >> >
>> >> > Quick update: we're at 1000 nodes and 7 million reqs/sec :) - and the
>> >> > number of requests should be higher, but we had some internal issues.
>> >> > We have a submission for Barcelona that will provide a lot more details.
>> >> >
>> >> > But a couple questions came during the exercise:
>> >> >
>> >> > 1. Do we really need a volume in the VMs? On large clusters this is a
>> >> > burden; wouldn't local storage alone be enough?
>> >> >
>> >> > 2. We observe a significant delay (~10 min, which is half the total
>> >> > time to deploy the cluster) in Heat when it seems to be crunching the
>> >> > kube_minions nested stacks. Once that's done it still adds new stacks
>> >> > gradually, so it doesn't look like it precomputed all the info in advance.
>> >> >
>> >> > Has anyone tried to scale Heat to stacks this size? We end up with a
>> >> > stack with:
>> >> > * 1000 nested stacks (depth 2)
>> >> > * 22000 resources
>> >> > * 47008 events
>> >> >
>> >> > We already changed most of the timeout/retry values for RPC to get
>> >> > this working.
>> >> >
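(For illustration only: the kind of heat.conf [DEFAULT] overrides meant
here. The option names are standard Heat / oslo.messaging settings, but
the values below are placeholders rather than our exact production numbers.)

    [DEFAULT]
    # Give RPC calls from the APIs to the heat engines more time to
    # complete on very large stacks (the oslo.messaging default is 60s).
    rpc_response_timeout = 600
    # RPC timeout for the engine liveness check used for stack locking;
    # the default of 2 seconds can be too tight under heavy load.
    engine_life_check_timeout = 10
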
>> >> > This delay is already visible in clusters of 512 nodes, but spending 40%
>> >> > of the deployment time on it at 1000 nodes seems like something we could
>> >> > improve. Any hints on Heat configuration optimizations for large stacks
>> >> > are very welcome.
>> >> >
>> >> Yes, we recommend you set the following in /etc/heat/heat.conf [DEFAULT]:
>> >> max_resources_per_stack = -1
>> >>
>> >> Enforcing this limit for large stacks has a very high overhead; we make
>> >> this change in the TripleO undercloud too.
>> >>
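(For reference, a minimal heat.conf fragment with the setting Steve
describes; -1 disables the per-stack resource limit entirely, with the
DoS caveat discussed below.)

    [DEFAULT]
    # Skip the per-stack resource count enforcement; walking the nested
    # stack tree to enforce it is expensive for very large stacks.
    max_resources_per_stack = -1
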
>> >
>> > Wouldn't this necessitate having a private Heat just for Magnum? Not
>> > having a resource limit per stack would leave your Heat engines
>> > vulnerable to being DoS'd by malicious users, since one can create many,
>> > many thousands of resources, and thus Python objects, in just a couple
>> > of cleverly crafted templates (which is why I added the setting).
>> >
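(As an illustration of the concern: a hypothetical two-level
OS::Heat::ResourceGroup template like the sketch below expands roughly
fifteen lines of YAML into a million resources, which is exactly what
max_resources_per_stack is meant to cap.)

    heat_template_version: 2016-04-08
    # Hypothetical "resource bomb": each outer group member is itself a
    # group of 1000 trivial resources, i.e. 1000 x 1000 = 1,000,000.
    resources:
      outer:
        type: OS::Heat::ResourceGroup
        properties:
          count: 1000
          resource_def:
            type: OS::Heat::ResourceGroup
            properties:
              count: 1000
              resource_def:
                type: OS::Heat::RandomString
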
>> > This makes perfect sense in the TripleO undercloud, which is a private,
>> > single-tenant OpenStack. But for Magnum, now you're talking about the
>> > Heat that users have access to.
>>
>> We already have it at -1 for these tests. As you say, a malicious user
>> could DoS us; right now this is manageable in our environment. But maybe
>> make it a per-tenant value, or some special policy? The stacks are
>> created under a separate domain for Magnum (for the trustees); we could
>> also use that for separation.
>>
>> A separate Heat instance sounds like overkill.
>>
>
> It does, but there's really no way around it. If Magnum users are going
> to create massive stacks, then all of the Heat engines will need to be
> able to handle massive stacks anyway, and a quota system would just mean
> that only Magnum gets to fully utilize those engines, which doesn't
> really make much sense at all, does it?

The best option might be to see whether improvements are possible either
in the Heat engine (a lot of what Zane mentioned seems helpful, and we're
willing to try it) or in the way Magnum creates the stacks.

In any case, things work right now, just not perfectly yet. It's still OK
to get 1000-node clusters deployed in < 25 min; people can handle that :)

Thanks!

Ricardo



