[placement][ptg] Gate health management

Matthew Treinish mtreinish at kortar.org
Mon Apr 8 18:28:53 UTC 2019


On Mon, Apr 08, 2019 at 11:57:51AM -0500, Matt Riedemann wrote:
> On 4/8/2019 11:42 AM, Chris Dent wrote:
> >      * Figure out where the slowest tests are which aren't marked
> >        slow: http://status.openstack.org/elastic-recheck/#1783405
> 
> I tried something related to this last week to run the tempest-full*
> scenario tests with 2 workers concurrently rather than serially:
> 
> https://review.openstack.org/#/c/650300/
> 
> But looking at the stackviz output on that it doesn't seem to have worked at
> all, the scenario tests running at the end appear to still be running in
> serial. I don't know if that is a bug in *testr* or what - maybe mtreinish
> knows.
> 

This was a conscious decision made to reduce the load during the scenario tests.
The scenario tests are run serially after all the other tests are run in
parallel. [1] The volume tests in particular were stressing the test
environments a lot ~2yrs ago so this was done to mitigate that. There are
more details in the commit message making the change:

https://review.openstack.org/#/c/439698/

(FWIW I mildly disagreed with this direction, but not enough to block it)


As for how best to determine this, we already aggregate all the data in the
subunit2sql db. openstack-health provides a list of the slowest tests per job,
aggregated over time, using this data:

http://status.openstack.org/openstack-health/#/job/tempest-full-py3

You just change the sort column to "Mean Runtime". I think there is a bug
in the rolling average function there because those numbers look wrong, but
they should still be usable as relative numbers for comparing tests.

I also had this old script on my laptop [2] which I used to get a list of
tests ordered by average speed (over the last 300 runs) filtered for those
which took > 10 seconds. I ran this just now and generated this list:

http://paste.openstack.org/show/749016/

The script is easily modifiable to change job or number of runs.
(I also think I've shared a version of it on ML before)
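
For illustration only, here is a minimal Python sketch of the aggregation
described above: mean run time per test over the last N runs, keeping only
tests slower than a threshold. It is not the script in [2], and it does not
talk to the subunit2sql db; the function name, the in-memory data layout, and
the sample numbers are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def slowest_tests(runs, last_n=300, min_seconds=10.0):
    """Return (test_id, mean_seconds) pairs, slowest first.

    `runs` is a list ordered oldest to newest; each run is a dict mapping
    a test id to its duration in seconds for that run (assumed layout).
    """
    durations = defaultdict(list)
    # Only the most recent `last_n` runs contribute to the average.
    for run in runs[-last_n:]:
        for test_id, seconds in run.items():
            durations[test_id].append(seconds)

    averages = {test_id: mean(vals) for test_id, vals in durations.items()}
    # Keep only tests whose mean duration exceeds the threshold.
    slow = [(t, avg) for t, avg in averages.items() if avg > min_seconds]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Made-up sample data, two runs of three tests.
sample_runs = [
    {"test_a": 12.0, "test_b": 3.0, "test_c": 40.0},
    {"test_a": 14.0, "test_b": 5.0, "test_c": 44.0},
]

for test_id, avg in slowest_tests(sample_runs):
    print(f"{test_id}: {avg:.1f}s")
```

Changing `last_n` or `min_seconds` corresponds to changing the number of runs
or the 10 second cutoff mentioned above.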

-Matt Treinish

[1] https://github.com/openstack/tempest/blob/master/tox.ini#L107-L109
[2] http://paste.openstack.org/show/749015/