[openstack-dev] [gate] The gate: a failure analysis

Samuel Merritt sam at swiftstack.com
Mon Jul 21 17:58:53 UTC 2014


On 7/21/14, 3:38 AM, Matthew Booth wrote:
> [snip]
>
> I would like to make the radical proposal that we stop gating on CI
> failures. We will continue to run them on every change, but only after
> the change has been successfully merged.
>
> Benefits:
> * Without rechecks, the gate will use 8 times fewer resources.
> * Log analysis is still available to indicate the emergence of races.
> * Fixes can be merged quicker.
> * Vastly less developer time spent monitoring gate failures.
>
> Costs:
> * A rare class of merge bug will make it into master.
>
> Note that the benefits above will also offset the cost of resolving this
> rare class of merge bug.

I think this is definitely a move in the right direction, but I'd like 
to propose a slight modification: let's cease blocking changes on 
*known* CI failures.

More precisely, if Elastic Recheck knows about all the failures that 
happened on a test run, treat that test run as successful.
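
As a rough sketch of that rule (this is not how Elastic Recheck is 
implemented; the function name, the plain substring matching, and the 
sample strings are all hypothetical, made up for illustration):

```python
def run_counts_as_success(failures, known_signatures):
    """Return True if every failure in a test run matches a known signature.

    `failures` is a list of failure messages from one test run.
    `known_signatures` is a list of strings standing in for the failure
    fingerprints that a tool like Elastic Recheck tracks. Substring matching
    is an assumption made here to keep the sketch short.
    """
    return all(
        any(signature in failure for signature in known_signatures)
        for failure in failures
    )


# Hypothetical data, for illustration only.
known = ["hypothetical race A", "hypothetical race B"]

only_known = ["job-1: hypothetical race A hit during setup"]
with_unknown = only_known + ["job-2: brand new breakage"]

print(run_counts_as_success(only_known, known))    # every failure is known
print(run_counts_as_success(with_unknown, known))  # one failure is unknown
```

A run with even one unrecognized failure still blocks the change, which 
is what keeps most breaking changes out.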

I think this will capture virtually all the benefits you name while still 
retaining most of the gate's ability to keep breaking changes out.

As a bonus, it'll encourage people to make Elastic Recheck better. 
Currently, the easy path is to just type "recheck no bug" and click 
"submit"; it takes a lot less time than scrutinizing log files to guess 
at what went wrong. If failures identified by E-R don't block 
developers' changes, then the easy path is to improve E-R's checks, 
which benefits everyone.


