[placement][ptg] Gate health management

Matthew Treinish mtreinish at kortar.org
Mon Apr 8 18:28:53 UTC 2019

On Mon, Apr 08, 2019 at 11:57:51AM -0500, Matt Riedemann wrote:
> On 4/8/2019 11:42 AM, Chris Dent wrote:
> >      * Figure out where the slowest tests are which aren't marked
> >        slow: http://status.openstack.org/elastic-recheck/#1783405
> I tried something related to this last week to run the tempest-full*
> scenario tests with 2 workers concurrently rather than serially:
> https://review.openstack.org/#/c/650300/
> But looking at the stackviz output on that it doesn't seem to have worked at
> all, the scenario tests running at the end appear to still be running in
> serial. I don't know if that is a bug in *testr* or what - maybe mtreinish
> knows.

This was a conscious decision made to reduce the load during the scenario tests.
The scenario tests are run serially after all the other tests are run in
parallel. [1] The volume tests in particular were stressing the test
environments a lot ~2yrs ago so this was done to mitigate that. There are
more details in the commit message making the change:


(FWIW I mildly disagreed with this direction, but not enough to block it)

As for how best to determine this, we already aggregate all the data
in the subunit2sql db. openstack-health provides a slowest job list
aggregated over time per job using this data:


You just change the sort column to "Mean Runtime". I think there is a bug
in the rolling average function there because those numbers look wrong, but
they should still be usable for relative comparison.

I also had this old script on my laptop [2] which I used to get a list of
tests ordered by average speed (over the last 300 runs) filtered for those
which took > 10 seconds. I ran this just now and generated this list:


The script is easily modifiable to change job or number of runs.
(I also think I've shared a version of it on ML before)
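
For illustration, the kind of aggregation described above (mean run time
per test, keeping only tests slower than 10 seconds, slowest first) can be
sketched as follows. This is not the script from [2] and it does not query
subunit2sql; the function name slowest_tests and the (test_id, seconds)
record shape are assumptions made for the example, and the records would
have to be pulled from the last N runs of a job by some other means.

```python
from collections import defaultdict
from statistics import mean


def slowest_tests(records, threshold_secs=10.0):
    """Return (test_id, mean_runtime) pairs slower than threshold_secs.

    records is an iterable of (test_id, runtime_secs) tuples, one per
    test execution. This input shape is hypothetical; it is not the
    subunit2sql schema.
    """
    # Group every observed runtime under its test id.
    runtimes = defaultdict(list)
    for test_id, secs in records:
        runtimes[test_id].append(secs)

    # Average each test and keep only those above the threshold.
    slow = [
        (test_id, mean(values))
        for test_id, values in runtimes.items()
        if mean(values) > threshold_secs
    ]

    # Slowest tests first.
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up sample data, for demonstration only.
    sample = [
        ("test_a", 12.0),
        ("test_a", 14.0),
        ("test_b", 3.0),
        ("test_c", 30.0),
    ]
    for test_id, avg in slowest_tests(sample):
        print(f"{avg:8.2f}  {test_id}")
```

Changing the threshold or the set of runs fed in corresponds to the
"job or number of runs" knobs mentioned above.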

-Matt Treinish

[1] https://github.com/openstack/tempest/blob/master/tox.ini#L107-L109
[2] http://paste.openstack.org/show/749015/
