[openstack-dev] Thoughts on the patch test failure rate and moving forward

Daniel P. Berrange berrange at redhat.com
Mon Jul 28 12:35:45 UTC 2014


On Mon, Jul 28, 2014 at 02:28:56PM +0200, Thierry Carrez wrote:
> James E. Blair wrote:
> > [...]
> > Most of these bugs are not failures of the test system; they are real
> > bugs.  Many of them have even been in OpenStack for a long time, but are
> > only becoming visible now due to improvements in our tests.  That's not
> > much help to developers whose patches are being hit with negative test
> > results from unrelated failures.  We need to find a way to address the
> > non-deterministic bugs that are lurking in OpenStack without making it
> > easier for new bugs to creep in.
> 
> I think that's a critical point. As a community, we need to move away
> from a perspective where we see the gate as a process step and describe
> failure there as "the gate is broken".
> 
> Although in some cases the failures are indeed coming from a gate bug,
> in most cases the failures are coming from a pileup of race conditions
> and other rare errors in OpenStack itself. In other words, the gate is
> not broken, *OpenStack* is broken. If you can't get the tests to pass on
> a proposed change due to test failures, that means OpenStack itself has
> reached a level where it just doesn't work. The gate is just a thermometer.
> 
> Those types of problems need to be solved, even if changes can be
> introduced in the CI/gate system to mitigate some of their most painful
> side-effects. However, currently, only a handful of developers actually
> work on fixing such issues -- and today those developers are completely
> overwhelmed and burnt out.
> 
> We need to have more people working on those bugs. We need to
> communicate this key type of strategic contribution to our corporate
> sponsors. We need to make it practical to work on those bugs, by
> providing all the tools we can to help in the debugging. We need to make
> it rewarding to work on those bugs: some of those bugs will be the most
> complex bugs you can find in OpenStack -- they should be viewed as an
> intellectual challenge for our best minds, rather than as cleaning up a
> sewer that other people continuously contribute to fill.

I recall this was suggested elsewhere recently, but I think we should
consider having much more regular bug squashing days. For example, we
could have "bug squash Wednesdays" every 2 weeks or so, where we explicitly
encourage people to focus their attention exclusively on bug fixes and
ignore all feature-related work. Core reviewers could set the tone by
not reviewing any patches that are not tagged with a bug on those days,
and by encouraging discussion of the bugs on IRC. The bug triage and
gate teams could help prime it by providing a couple of lists of bugs,
each list targeted at a particular skill level, to make it easy for people
to pick off bugs to attack on those days.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


