[openstack-dev] Unwedging the gate

Clint Byrum clint at fewbar.com
Mon Nov 25 09:23:44 UTC 2013

Excerpts from Joe Gordon's message of 2013-11-24 21:00:58 -0800:
> Hi All,
> TL;DR Last week the gate got wedged on nondeterministic failures. Unwedging
> the gate required drastic actions to fix bugs.

(great write-up, thank you for the details, and thank you for fixing it)

> Now that we have the gate back into working order, we are working on the
> next steps to prevent this from happening again.  The two most immediate
> changes are:
>    - Doing a better job of triaging gate bugs  (
>    http://lists.openstack.org/pipermail/openstack-dev/2013-November/020048.html
>     ).
>    - In the next few days we will remove 'reverify no bug' (although you
>    will still be able to run 'reverify bug x').

I am curious, why not also disable 'recheck no bug'?

I see this as a failure of bug triage. A bug that has more than 1
recheck/reverify attached to it is worth a developer's time. The data
gathered through so many test runs is invaluable when chasing races like
the ones that cause these intermittent failures. If every core dev of
every project spent 10 working minutes every day looking at the rechecks
page to see if there is an untriaged recheck there, or just triaging bugs
in general, I suspect we'd fix these a lot quicker.

I do wonder if we would be able to commit enough resources to just run
two copies of the gate in parallel each time and require both to pass.
Doubling the odds* that we will catch an intermittent failure seems like
something that might be worth doubling the compute resources used by
the gate.

*I suck at math. Probably isn't doubling the odds. Sounds
good though. ;)
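
On the footnote's doubt, here is a rough sketch of the arithmetic. It
assumes each gate run trips over a given intermittent bug independently
with the same probability p, which is an assumption: real races may well
be correlated between runs on similar hardware. The function name and the
sample values of p are made up for illustration. Under that assumption,
requiring two passes catches the bug with probability 1 - (1 - p)^2, so
the ratio to a single run is 2 - p: very nearly double for rare failures,
and less than double for common ones.

```python
def catch_probability(p: float, runs: int = 2) -> float:
    """Chance that at least one of `runs` independent gate runs fails,
    given that each run hits the intermittent bug with probability p.

    Independence between runs is an assumption, not a measured fact.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if runs < 1:
        raise ValueError("runs must be at least 1")
    # The bug slips through only if every run happens to pass.
    return 1.0 - (1.0 - p) ** runs


if __name__ == "__main__":
    # Hypothetical per-run failure rates, chosen only to show the trend.
    for p in (0.01, 0.05, 0.25, 0.5):
        single = catch_probability(p, runs=1)
        double = catch_probability(p, runs=2)
        print(f"p={p:.2f}  one run: {single:.4f}  "
              f"two runs: {double:.4f}  ratio: {double / single:.2f}")
```

So "doubling the odds" is a fair approximation exactly in the case that
matters here, where the failure is rare on any single run.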
