<div dir="auto">Why would TripleO not move to convergence at the earliest possible point?</div><div class="gmail_extra"><br><div class="gmail_quote">On Jan 6, 2017 10:37 PM, "Thomas Herve" <<a href="mailto:therve@redhat.com">therve@redhat.com</a>> wrote:<br type="attribution"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Fri, Jan 6, 2017 at 6:12 PM, Zane Bitter <<a href="mailto:zbitter@redhat.com">zbitter@redhat.com</a>> wrote:<br>

> tl;dr everything looks great, and memory usage has dropped by about 64%<br>

> since the initial Newton release of Heat.<br>

><br>

> I re-ran my analysis of Heat memory usage in the tripleo-heat-templates<br>

> gate. (This is based on the gate-tripleo-ci-centos-7-ovb-<wbr>nonha job.) Here's<br>

> a pretty picture:<br>

><br>

> <a href="https://fedorapeople.org/~zaneb/tripleo-memory/20170105/heat_memused.png" rel="noreferrer" target="_blank">https://fedorapeople.org/~<wbr>zaneb/tripleo-memory/20170105/<wbr>heat_memused.png</a><br>

><br>

> There is one major caveat here: for the period marked in grey where it says<br>

> "Only 2 engine workers", the job was configured to use only 2 heat-engine<br>

> worker processes instead of 4, so this is not an apples-to-apples<br>

> comparison. The initial drop at the beginning and the subsequent bounce at<br>

> the end are artifacts of this change. Note that the stable/newton branch is<br>

> _still_ using only 2 engine workers.<br>

><br>

> The rapidly increasing usage on the left is due to increases in the<br>

> complexity of the templates during the Newton cycle. It's clear that if<br>

> there has been any similar complexity growth during Ocata, it has had a tiny<br>

> effect on memory consumption in comparison.<br>

<br>

Thanks a lot for the analysis. It's great that things haven't gotten off track.<br>

<br>

> I tracked down most of the step changes to identifiable patches:<br>

><br>

> 2016-10-07: 2.44GiB -> 1.64GiB<br>

>  - <a href="https://review.openstack.org/382068/" rel="noreferrer" target="_blank">https://review.openstack.org/<wbr>382068/</a> merged, making ResourceInfo classes<br>

> more memory-efficient. Judging by the stable branch (where this and the<br>

> following patch were merged at different times), this was responsible for<br>

> dropping the memory usage from 2.44GiB -> 1.83GiB. (Which seems like a<br>

> disproportionately large change?)<br>

<br>

Without wanting to take the credit, I believe<br>

<a href="https://review.openstack.org/377061/" rel="noreferrer" target="_blank">https://review.openstack.org/<wbr>377061/</a> is more likely the reason here.<br>

<br>

>  - <a href="https://review.openstack.org/#/c/382377/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/382377/</a> merged, so we no longer create<br>

> multiple yaql contexts. (This was responsible for the drop from 1.83GiB -><br>

> 1.64GiB.)<br>

><br>

> 2016-10-17: 1.62GiB -> 0.93GiB<br>

>  - <a href="https://review.openstack.org/#/c/386696/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/386696/</a> merged, reducing the number of<br>

> engine workers on the undercloud to 2.<br>

><br>

> 2016-10-19: 0.93GiB -> 0.73GiB (variance also seemed to drop after this)<br>

>  - <a href="https://review.openstack.org/#/c/386247/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/386247/</a> merged (on 2016-10-16), avoiding<br>

> loading all nested stacks in a single process simultaneously much of the<br>

> time.<br>

>  - <a href="https://review.openstack.org/#/c/383839/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/383839/</a> merged (on 2016-10-16),<br>

> switching output calculations to RPC to avoid almost all simultaneous<br>

> loading of all nested stacks.<br>

><br>

> 2016-11-08: 0.76GiB -> 0.70GiB<br>

>  - This one is a bit of a mystery???<br>

<br>

Possibly <a href="https://review.openstack.org/390064/" rel="noreferrer" target="_blank">https://review.openstack.org/<wbr>390064/</a> ? Reducing the<br>

environment size could have an effect.<br>

<br>

> 2016-11-22: 0.69GiB -> 0.50GiB<br>

>  - <a href="https://review.openstack.org/#/c/398476/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/398476/</a> merged, improving the efficiency<br>

> of resource listing?<br>

><br>

> 2016-12-01: 0.49GiB -> 0.88GiB<br>

>  - <a href="https://review.openstack.org/#/c/399619/" rel="noreferrer" target="_blank">https://review.openstack.org/#<wbr>/c/399619/</a> merged, returning the number of<br>

> engine workers on the undercloud to 4.<br>
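<br>
As a quick sanity check on the "about 64%" figure, here is a minimal sketch that recomputes the relative change for each step listed above. The GiB values are copied from the list; comparing the first figure (2.44GiB) with the last one (0.88GiB) is an assumption about how the headline number was derived, chosen because both readings were taken with 4 engine workers. The helper name percent_change is hypothetical.<br>

```python
# Step changes in Heat memory usage (GiB), copied from the list above.
# Each tuple is (date, usage before, usage after).
steps = [
    ("2016-10-07", 2.44, 1.64),
    ("2016-10-17", 1.62, 0.93),  # engine workers cut to 2
    ("2016-10-19", 0.93, 0.73),
    ("2016-11-08", 0.76, 0.70),
    ("2016-11-22", 0.69, 0.50),
    ("2016-12-01", 0.49, 0.88),  # engine workers back to 4
]


def percent_change(before, after):
    """Relative change from before to after, in percent (hypothetical helper)."""
    return (after - before) / before * 100.0


for date, before, after in steps:
    change = percent_change(before, after)
    print(f"{date}: {before:.2f} to {after:.2f} GiB ({change:+.1f}%)")

# Assumption: the headline figure compares the first and last readings,
# both of which were taken with 4 engine workers.
overall = percent_change(steps[0][1], steps[-1][2])
print(f"overall: {overall:+.1f}%")
```

Under that assumption the overall change rounds to a 64% drop, which matches the tl;dr.<br>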

><br>

> It's not an exact science because IIUC there's a delay between a patch<br>

> merging in Heat and it being used in subsequent t-h-t gate jobs. e.g. the<br>

> change to getting outputs over RPC landed the day before the<br>

> instack-undercloud patch that cut the number of engine workers, but the<br>

> effects don't show up until 2 days after. I'd love to figure out what<br>

> happened on the 8th of November, but I can't correlate it to anything<br>

> obvious. The attribution of the change on the 22nd also seems dubious, but<br>

> the timing adds up (including on stable/newton).<br>

><br>

> It's fair to say that none of the other patches we merged in an attempt to<br>

> reduce memory usage had any discernible effect :D<br>

><br>

> It's worth reiterating that TripleO still disables convergence in the<br>

> undercloud, so these are all tests of the legacy code path. It would be<br>

> great if we could set up a non-voting job on t-h-t with convergence enabled<br>

> and start tracking memory use over time there too. As a first step, maybe we<br>

> could at least add an experimental job on Heat to give us a baseline?<br>

<br>

+1. We haven't made any huge changes in that direction, but having<br>

some info would be great.<br>

<br>

--<br>

Thomas<br>

<br>


</blockquote></div></div>