[openstack-dev] [heat][tripleo] Heat memory usage in the TripleO gate during Ocata

Emilien Macchi emilien at redhat.com
Fri Jan 6 21:58:54 UTC 2017


On Fri, Jan 6, 2017 at 4:35 PM, Thomas Herve <therve at redhat.com> wrote:
> On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter <zbitter at redhat.com> wrote:
>> tl;dr everything looks great, and memory usage has dropped by about 64%
>> since the initial Newton release of Heat.
>>
>> I re-ran my analysis of Heat memory usage in the tripleo-heat-templates
>> gate. (This is based on the gate-tripleo-ci-centos-7-ovb-nonha job.) Here's
>> a pretty picture:
>>
>> https://fedorapeople.org/~zaneb/tripleo-memory/20170105/heat_memused.png
>>
>> There is one major caveat here: for the period marked in grey where it says
>> "Only 2 engine workers", the job was configured to use only 2 heat-enginer
>> worker processes instead of 4, so this is not an apples-to-apples
>> comparison. The initial drop at the beginning and the subsequent bounce at
>> the end are artifacts of this change. Note that the stable/newton branch is
>> _still_ using only 2 engine workers.
>>
>> The rapidly increasing usage on the left is due to increases in the
>> complexity of the templates during the Newton cycle. It's clear that if
>> there has been any similar complexity growth during Ocata, it has had a tiny
>> effect on memory consumption in comparison.
>
> Thanks a lot for the analysis. It's great that things haven't gotten off track.
>
>> I tracked down most of the step changes to identifiable patches:
>>
>> 2016-10-07: 2.44GiB -> 1.64GiB
>>  - https://review.openstack.org/382068/ merged, making ResourceInfo classes
>> more memory-efficient. Judging by the stable branch (where this and the
>> following patch were merged at different times), this was responsible for
>> dropping the memory usage from 2.44GiB -> 1.83GiB. (Which seems like a
>> disproportionately large change?)
>
> Without wanting to take credit, I believe
> https://review.openstack.org/377061/ is more likely the reason here.
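
(For context on how a patch like that can save so much: if the
ResourceInfo change works the way I'd guess, the classes got __slots__,
so each instance stores its attributes in fixed slots instead of a
per-instance __dict__. A toy sketch of the effect; these are made-up
classes, not the actual Heat code:

    import sys

    class WithDict(object):
        def __init__(self, name, value):
            self.name = name
            self.value = value

    class WithSlots(object):
        __slots__ = ('name', 'value')

        def __init__(self, name, value):
            self.name = name
            self.value = value

    d = WithDict('x', 1)
    s = WithSlots('x', 1)
    # The per-instance dict alone costs more than the whole slotted
    # instance.
    print(sys.getsizeof(d) + sys.getsizeof(d.__dict__))
    print(sys.getsizeof(s))

With one such object per resource type, across thousands of nested
stacks, that per-instance overhead multiplies quickly.)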
>
>>  - https://review.openstack.org/#/c/382377/ merged, so we no longer create
>> multiple yaql contexts. (This was responsible for the drop from 1.83GiB ->
>> 1.64GiB.)
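
(The fix there, as I understand it, is to build the yaql engine and
context once at module load and share them, rather than creating a new
context for every expression evaluated. Roughly, assuming I have the
yaql API right:

    import yaql

    # Created once per process and shared by all evaluations.
    _ENGINE = yaql.YaqlFactory().create()
    _CONTEXT = yaql.create_context()

    def evaluate(expression_str, data):
        expr = _ENGINE(expression_str)
        # A child context keeps per-call data out of the shared context.
        return expr.evaluate(data=data,
                             context=_CONTEXT.create_child_context())

Multiplied across every yaql expression in every nested stack, that
adds up.)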
>>
>> 2016-10-17: 1.62GiB -> 0.93GiB
>>  - https://review.openstack.org/#/c/386696/ merged, reducing the number of
>> engine workers on the undercloud to 2.
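
(That's the num_engine_workers option in heat.conf:

    [DEFAULT]
    num_engine_workers = 2

Each worker is a separate process with its own copy of whatever it
loads, so halving the worker count roughly halves the total RSS.)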
>>
>> 2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this)
>>  - https://review.openstack.org/#/c/386247/ merged (on 2016-10-16), avoiding
>> loading all nested stacks in a single process simultaneously much of the
>> time.
>>  - https://review.openstack.org/#/c/383839/ merged (on 2016-10-16),
>> switching output calculations to RPC to avoid almost all simultaneous
>> loading of all nested stacks.
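
(To illustrate the shape of those two changes, a made-up sketch, not
the actual Heat code:

    # Before: resolving outputs walked the whole tree in one process,
    # so peak memory scaled with the size of the entire stack tree.
    def resolve_outputs_in_process(stack):
        outputs = dict(stack.outputs)
        for child in stack.nested_stacks():
            outputs.update(resolve_outputs_in_process(child))
        return outputs

    # After: each nested stack's outputs come back over RPC, so the
    # stack is loaded (and freed) by whichever engine worker answers,
    # one stack at a time instead of all of them at once.
    def resolve_outputs_over_rpc(rpc_client, stack_identity):
        return rpc_client.call('list_outputs',
                               stack_identity=stack_identity)

The stack and rpc_client objects here are hypothetical stand-ins for
the real engine-side objects.)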
>>
>> 2016-11-08: 0.76GiB -> 0.70GiB
>>  - This one is a bit of a mystery???
>
> Possibly https://review.openstack.org/390064/ ? Reducing the
> environment size could have an effect.
>
>> 2016-11-22: 0.69GiB -> 0.50GiB
>>  - https://review.openstack.org/#/c/398476/ merged, improving the efficiency
>> of resource listing?
>>
>> 2016-12-01: 0.49GiB -> 0.88GiB
>>  - https://review.openstack.org/#/c/399619/ merged, returning the number of
>> engine workers on the undercloud to 4.
>>
>> It's not an exact science because IIUC there's a delay between a patch
>> merging in Heat and it being used in subsequent t-h-t gate jobs. e.g. the
>> change to getting outputs over RPC landed the day before the
>> instack-undercloud patch that cut the number of engine workers, but the
>> effects don't show up until 2 days after. I'd love to figure out what
>> happened on the 8th of November, but I can't correlate it to anything
>> obvious. The attribution of the change on the 22nd also seems dubious, but
>> the timing adds up (including on stable/newton).
>>
>> It's fair to say that none of the other patches we merged in an attempt to
>> reduce memory usage had any discernible effect :D
>>
>> It's worth reiterating that TripleO still disables convergence in the
>> undercloud, so these are all tests of the legacy code path. It would be
>> great if we could set up a non-voting job on t-h-t with convergence enabled
>> and start tracking memory use over time there too. As a first step, maybe we
>> could at least add an experimental job on Heat to give us a baseline?
>
> +1. We haven't made any huge changes in that direction, but having
> some info would be great.

+1 too. I volunteer to do it.

Quick question: to enable it, is it just a matter of setting
convergence_engine to true in heat.conf (on the undercloud)?
If not, what else is needed?
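
For reference, I'm assuming that means something like this in
/etc/heat/heat.conf on the undercloud (untested):

    [DEFAULT]
    convergence_engine = true

followed by a restart of the heat-engine processes.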

> --
> Thomas
>



-- 
Emilien Macchi


