[openstack-dev] Gate breakage process - Let's fix! (related but not specific to neutron)

Joe Gordon joe.gordon0 at gmail.com
Sat Aug 17 12:16:56 UTC 2013


On Aug 17, 2013 7:52 AM, "Salvatore Orlando" <sorlando at nicira.com> wrote:
>
> I tend to agree that when the gate for a project is broken, nothing
> should be merged for that project until the gate jobs are green again.
> In the case of Neutron, making the job non-voting only caused more bugs
> to slip through, and that meant more work for the developers themselves,
> and more headaches for developers of other projects relying on it.
>
> When dealing with intermittent failures, like the bug which probably
> started the issues we've been witnessing in the past 3 weeks, I think it
> might be a sensible idea to make the job non-voting only for projects which
> surely can't be the cause of the gate failure; or perhaps skip the
> offending test only.
>
> This means, however, asymmetrical gating, and from Monty's post it seems
> there's something quite wrong with it. However, due to my lack of expertise
> on the subject, I am unable to see the issue with it.

Although not as simple, this can also be done by asking neutron-core to
merge only fixes that move things closer to the gating tests staying green.

We all use a similar process for feature freeze already.

>
> Salvatore
>
>
>
>
> On 17 August 2013 01:42, Maru Newby <marun at redhat.com> wrote:
>>
>>
>> On Aug 16, 2013, at 11:44 AM, Clint Byrum <clint at fewbar.com> wrote:
>>
>> > Excerpts from Maru Newby's message of 2013-08-16 11:25:07 -0700:
>> >> Neutron has been in and out of the gate for the better part of the
>> >> past month, and it didn't slow the pace of development one bit.  Most
>> >> Neutron developers kept on working as if nothing was wrong, blithely
>> >> merging changes with no guarantees that they weren't introducing new
>> >> breakage.  New bugs were indeed merged, greatly increasing the time and
>> >> effort required to get Neutron back in the gate.  I don't think this is
>> >> sustainable, and I'd like to make a suggestion for how to minimize the
>> >> impact of gate breakage.
>> >>
>> >> For the record, I don't think consistent gate breakage in one project
>> >> should be allowed to hold up the development of other projects.  The
>> >> current approach of skipping tests or otherwise making a given job
>> >> non-voting for innocent projects should continue.  It is arguably worth
>> >> taking the risk of relaxing gating for those innocent projects rather than
>> >> halting development unnecessarily.
>> >>
>> >> However, I don't think it is a good idea to relax a broken gate for
>> >> the offending project.  So if a broken job/test is clearly Neutron related,
>> >> it should continue to gate Neutron, effectively preventing merges until the
>> >> problem is fixed.  This would both raise the visibility of breakage beyond
>> >> the person responsible for fixing it, and prevent additional breakage from
>> >> slipping past were the gating to be relaxed.
>> >>
>> >> Thoughts?
>> >>
>> >
>> > I think this is a cultural problem related to the code review discussion
>> > from earlier in the week.
>> >
>> > We are not looking at finding a defect and reverting as a good thing where
>> > high fives should be shared all around. Instead, "you broke the gate"
>> > seems to mean "you are a bad developer". I have been a bad actor here too,
>> > getting frustrated with the gate-breaker and saying the wrong thing.
>> >
>> > The problem really is "you _broke_ the gate". It should be "the gate has
>> > found a defect, hooray!". It doesn't matter what causes the gate to stop,
>> > it is _always_ a defect. Now, it is possible the defect is in tempest,
>> > or jenkins, or HP/Rackspace's clouds where the tests run. But it is
>> > always a defect that what worked before, does not work now.
>> >
>> > Defects are to be expected. None of us can write perfect code. We should
>> > be happy to revert commits and go forward with an enabled gate while
>> > the team responsible for the commit gathers information and works to
>> > correct the issue.
>>
>> You're preaching to the choir, and I suspect that anyone with an
>> interest in software quality is likely to prefer problem solving to finger
>> pointing.  However, my intent with this thread was not to promote more
>> constructive thinking about defect detection.  Rather, I was hoping to
>> communicate a flaw in the existing process and seek consensus on how that
>> process could best be modified to minimize the cost of resolving gate
>> breakage.
>>
>>
>> > _______________________________________________
>> > OpenStack-dev mailing list
>> > OpenStack-dev at lists.openstack.org
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev