[openstack-dev] Gate breakage process - Let's fix! (related but not specific to neutron)
Maru Newby
marun at redhat.com
Fri Aug 16 23:42:23 UTC 2013
On Aug 16, 2013, at 11:44 AM, Clint Byrum <clint at fewbar.com> wrote:
> Excerpts from Maru Newby's message of 2013-08-16 11:25:07 -0700:
>> Neutron has been in and out of the gate for the better part of the past month, and it didn't slow the pace of development one bit. Most Neutron developers kept on working as if nothing was wrong, blithely merging changes with no guarantees that they weren't introducing new breakage. New bugs were indeed merged, greatly increasing the time and effort required to get Neutron back in the gate. I don't think this is sustainable, and I'd like to make a suggestion for how to minimize the impact of gate breakage.
>>
>> For the record, I don't think consistent gate breakage in one project should be allowed to hold up the development of other projects. The current approach of skipping tests or otherwise making a given job non-voting for innocent projects should continue. It is arguably worth taking the risk of relaxing gating for those innocent projects rather than halting development unnecessarily.
>>
>> However, I don't think it is a good idea to relax a broken gate for the offending project. So if a broken job/test is clearly Neutron-related, it should continue to gate Neutron, effectively preventing merges until the problem is fixed. This would both raise the visibility of the breakage beyond the person responsible for fixing it and prevent the additional breakage that would slip in if the gating were relaxed.
>>
>> Thoughts?
>>
>
> I think this is a cultural problem related to the code review discussion
> from earlier in the week.
>
> We are not looking at finding a defect and reverting as a good thing where
> high fives should be shared all around. Instead, "you broke the gate"
> seems to mean "you are a bad developer". I have been a bad actor here too,
> getting frustrated with the gate-breaker and saying the wrong thing.
>
> The problem really is "you _broke_ the gate". It should be "the gate has
> found a defect, hooray!". It doesn't matter what causes the gate to stop,
> it is _always_ a defect. Now, it is possible the defect is in tempest,
> or jenkins, or HP/Rackspace's clouds where the tests run. But it is
> always a defect when what worked before does not work now.
>
> Defects are to be expected. None of us can write perfect code. We should
> be happy to revert commits and go forward with an enabled gate while
> the team responsible for the commit gathers information and works to
> correct the issue.
You're preaching to the choir, and I suspect that anyone with an interest in software quality is likely to prefer problem solving to finger pointing. However, my intent with this thread was not to promote more constructive thinking about defect detection. Rather, I was hoping to communicate a flaw in the existing process and seek consensus on how that process could best be modified to minimize the cost of resolving gate breakage.