Joe Gordon joe.gordon0 at gmail.com
Thu Sep 26 00:56:15 UTC 2013

Hi All,

TL;DR: We will be automatically identifying the bugs behind your flaky tempest
runs, so you just have to confirm that you hit bug X, not hunt down which bug
you hit.

http://status.openstack.org/rechecks/ is a great tool for identifying which
bugs are causing our gating to be flaky, allowing for better prioritization
of bug fixing. But as many of you have noticed, hunting down which bug to
use for your recheck can be tedious, and using 'recheck no bug' just kicks
the problem down the road for someone else to deal with.

To address this issue, Matthew Treinish, Clark Boylan, myself, and others
have started elastic-recheck
[https://github.com/openstack-infra/elastic-recheck] to classify
tempest-devstack failures using ElasticSearch [http://logstash.openstack.org].
When we hit a new bug, we use http://logstash.openstack.org to manually
find an ElasticSearch fingerprint for it. Then, every time we see a new
tempest-devstack failure, we try to classify it and report back to
review.openstack.org, so the patch author can confirm that was the bug they
saw and run a recheck.
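To make the classification step concrete, here is a rough Python sketch of the
idea: each known bug has a fingerprint, a new failure's log is checked against
every fingerprint, and the matches are turned into a suggestion for the patch
author. This is NOT elastic-recheck's actual code. The real tool runs
ElasticSearch queries against logstash.openstack.org, whereas this stand-in
matches regular expressions against console log lines locally. The bug
numbers, the patterns, the comment wording, and the classify_failure /
recheck_comment helpers are all hypothetical.

```python
import re
from typing import Dict, Iterable, List

# Hypothetical fingerprints: bug number -> regex over console log lines.
# Real elastic-recheck fingerprints are ElasticSearch queries; these regexes
# only stand in for them, and the bug numbers are made up.
FINGERPRINTS: Dict[int, str] = {
    1111111: r"Timed out waiting for thing .* to become ACTIVE",
    2222222: r"Connection to neutron failed",
}


def classify_failure(log_lines: Iterable[str],
                     fingerprints: Dict[int, str]) -> List[int]:
    """Return the sorted bug numbers whose fingerprint matches any log line."""
    compiled = {bug: re.compile(pattern) for bug, pattern in fingerprints.items()}
    matches = set()
    for line in log_lines:
        for bug, regex in compiled.items():
            if regex.search(line):
                matches.add(bug)
    return sorted(matches)


def recheck_comment(bugs: List[int]) -> str:
    """Build the suggestion reported back to the review (wording is hypothetical)."""
    if not bugs:
        return "Could not classify this failure; please find or file a bug."
    return "Possible matches: " + ", ".join(
        f"bug {b} (if confirmed, leave 'recheck bug {b}')" for b in bugs
    )


if __name__ == "__main__":
    log = ["2013-09-25 12:00:01 ERROR Timed out waiting for thing abc to become ACTIVE"]
    bugs = classify_failure(log, FINGERPRINTS)
    print(bugs)  # [1111111]
    print(recheck_comment(bugs))
```

The author still makes the final call: the sketch only suggests candidate bugs,
which mirrors the "confirm, don't hunt" workflow described above.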

We are in the middle of rolling this out, and you can expect to see
elastic-recheck commenting on your failed tempest jobs in the next few days.


* Identifying which bugs are frequent is only the first step; we still need
to fix them. Otherwise tempest will stay flaky. As of the most recent
numbers, we have about a 25% failure rate in the gate pipeline.
* ElasticSearch is currently slow, and although we are fixing that, it may
take a few hours before elastic-recheck can classify your failures.
* We have more work to do on this, so help is welcome!
