[openstack-dev] [cinder] 3rd Party CI failures ignored, caused driver to break

Erlon Cruz sombrafam at gmail.com
Tue Mar 10 14:27:35 UTC 2015


Agreed, CI systems are not reliable. Most failures are related to
misconfiguration or devstack problems, not the driver itself. What happens
then is that people just don't care if there's a red FAILURE in the CI
results. An option 4) would be to rate CIs according to their
trustworthiness (perhaps a composite score of uptime and false-negative
rate); developers would then pay more attention if they broke a reliable CI.
This rating could also be used as a parameter to decide whether a CI should
vote or not.
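
As a rough illustration (the weights, threshold, and field names below are
made up, just to show the idea of combining uptime and false-negative rate
into a single number):

    # Hypothetical trust score for a 3rd party CI system.
    # uptime: fraction of the time the CI was up and reporting (0.0 - 1.0)
    # false_negatives: reported failures that turned out to be CI problems,
    #                  not real driver problems
    # reports: total number of results the CI posted
    def ci_trust_score(uptime, false_negatives, reports, uptime_weight=0.5):
        """Combine uptime and false-negative rate into a 0-1 trust score."""
        if reports == 0:
            return 0.0
        accuracy = 1.0 - (false_negatives / float(reports))
        return uptime_weight * uptime + (1.0 - uptime_weight) * accuracy

    # A CI below some agreed threshold would not be allowed to vote.
    VOTING_THRESHOLD = 0.9
    score = ci_trust_score(uptime=0.95, false_negatives=3, reports=200)
    can_vote = score >= VOTING_THRESHOLD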

On Thu, Feb 26, 2015 at 5:24 PM, John Griffith <john.griffith8 at gmail.com>
wrote:

>
>
> On Thu, Feb 26, 2015 at 1:10 PM, Walter A. Boring IV <walter.boring at hp.com> wrote:
>
>> Hey folks,
>>    Today we found out that a patch[1] made against our LeftHand driver
>> caused the driver to fail.  The 3rd party CI that we have set up to test
>> our driver (hp_lefthand_rest_proxy.py) caught the failure and reported it
>> correctly[2].  The patch that broke the driver was reviewed and approved
>> without a mention of the 3rd party CI failure.
>>
>> This is a perfect example of 3rd party CI working as intended: it caught
>> a failure and was completely ignored by everyone involved in the review
>> process for the patch.
>>
>> I know that 3rd party CI isn't perfect and has been rife with false
>> failures, which is one of the reasons why they aren't voting today.
>> That being said, if patch submitters aren't even looking at CI failures
>> when they touch drivers they don't maintain, and reviewers aren't looking
>> at them either, then why are we even doing 3rd party CI?
>>
>> Our team is partially responsible for not seeing the failure as well.  We
>> should be watching the CI failures closely, but we are doing the best we
>> can.  There are enough Cinder patches in flight at any one time that we
>> simply can't watch every single one of them for failures.  We did
>> eventually see that every single patchset in Gerrit was now failing
>> against our driver, and that is how we caught it.  Yes, it was a bit after
>> the fact, but we did notice it and now have a new patch up that fixes it.
>> So, in that regard, 3rd party CI did eventually surface a problem, and our
>> team caught it.
>>
>> How can we prevent this in the future?
>> 1) Make 3rd party CI voting.  I don't believe this is ready yet.
>>
>
> Agreed, look at the history on that driver (and many others) and you'll
> see we are in no way ready for that.
>
>
>> 2) Authors and reviewers need to look at 3rd party CI failures when a patch
>> touches a driver.  If a failure is seen, contact the CI maintainer and work
>> with them to see whether the failure is related to the patch, if it's not
>> obvious.  In this case, the failure was obvious: the import changed, and
>> now the package can't find the module.
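>>
>> To illustrate the failure mode (the module names here are made up, not
>> the actual ones from the patch):
>>
>>     # The driver kept importing the old location of a shared module:
>>     try:
>>         from cinder.some_package import old_module  # hypothetical path
>>     except ImportError as exc:
>>         # After the move/rename the import fails, the driver can't be
>>         # loaded, and this is the error the 3rd party CI caught in the
>>         # c-vol log.
>>         print('driver failed to load: %s' % exc)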
>>
>
> If things were more "stable", yeah, I might, but the reality, as I've
> pointed out, is that we have a serious signal-to-noise problem IMO
>>
>
>> 3) CI maintainers watch every single patchset and report -1's on
>> reviews?  (ouch)
>> 4) ?
>>
>
> Option 4, in my opinion, is exactly what I've been doing since this
> started.  I receive a notification for any change that my CI setup fails
> on, and then it's up to me to go and verify whether something is truly
> broken or whether it's my system that's messed up.  It's not perfect, and
> it's not really a true CI, but it is a continuous test system, which for
> me is what's most important here.  I completely understand that's not the
> case for other things.
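>
> Roughly, something like the sketch below would be enough to pull the open
> changes a CI has failed (the account name and the 'FAILURE' marker in the
> comment are assumptions, adjust for your own system):
>
>     # Find open Cinder changes where our CI account posted a failure.
>     import json
>     import requests
>
>     GERRIT = 'https://review.openstack.org'
>     CI_ACCOUNT = 'HP LeftHand CI'  # hypothetical account name
>
>     resp = requests.get(
>         GERRIT + '/changes/',
>         params={'q': 'project:openstack/cinder status:open',
>                 'o': 'MESSAGES', 'n': 100})
>     # Gerrit prepends ")]}'" to its JSON responses, strip it first.
>     changes = json.loads(resp.text[4:])
>
>     for change in changes:
>         for msg in change.get('messages', []):
>             author = msg.get('author', {}).get('name', '')
>             if author == CI_ACCOUNT and 'FAILURE' in msg['message']:
>                 print('%s %s' % (change['_number'], change['subject']))
>                 break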
>
> This is where the churn of typo fixes, hacking changes, etc. can bite us.
> Sure, while reviewing, that probably could've/should've been caught.  The
> problem, for me at least, is that if we're going to do these sorts of
> changes that touch so many files, I'm likely to be half asleep before I
> get to the "cinder/volume" section.  Doesn't make it right, but that's
> kinda how it is.
>
> Anyway, yeah it sucks, but I'd argue this worked out GREAT.  A change was
> made that broke things in the LeftHand driver, but historically nobody
> would've known until a customer tried to use it.  In this case, the
> problem was found and fixed in less than a day.  That's a pretty huge win
> in my opinion!
>
> Thanks,
> John
>
>
>>
>>
>>
>> Here is the patch that broke the LeftHand driver [1].
>> Here is the failure reported in the c-vol log for that patch by our 3rd
>> party CI system [2].
>> Here is my new patch that fixes the LeftHand driver again [3].
>>
>> [1] https://review.openstack.org/#/c/145780/
>> [2] http://15.126.198.151/80/145780/15/check/lefthand-iscsi-driver-master-client-pip-dsvm/3927e3d/logs/screen-c-vol.txt.gz?level=ERROR
>> [3] https://review.openstack.org/#/c/159586/
>>
>> $0.02
>> Walt
>>