[openstack-dev] [infra] Intermittent network problems allowed to sneak past the gate?

Derek Higgins derekh at redhat.com
Wed May 7 08:41:27 UTC 2014


On 06/05/14 23:55, Jeremy Stanley wrote:
> On 2014-05-06 15:52:04 +0100 (+0100), Derek Higgins wrote:
> [...]
>> The job simply got restarted and this kept happening until the job passed.
>>
>> A legitimately failed job:
>>   https://jenkins05.openstack.org/job/check-nova-docker-dsvm-f20/2/
>>
>> http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/d5c1ebf/console.html
> [...]
> 
> If the job fails in such a way that it impacts communication between
> the slave and the Jenkins master, or tanks the slave so badly that
> it ceases responding entirely, Jenkins often does not report a build
> completion status. Because this happens rather unfortunately often
> due to the nature of connectivity in service providers and due to
> bugs in Jenkins, Zuul assumes it should automatically reattempt any
> job which ceases running without explanation.
> 
> Perhaps one option would be to keep a retry counter and not
> reattempt a job which fails in this manner more than once or
> twice...?

It won't catch all cases, but it sounds like a good idea to me. If
somebody familiar with the Zuul code can do it quickly, great;
otherwise I can try to familiarize myself with it.
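Here is a rough sketch of the retry-counter idea, written in Python because Zuul is written in Python. The class and method names are hypothetical and do not reflect Zuul's actual internals. It assumes the scheduler can tell a build that ended with no reported completion status (represented here as `None`) apart from one that reported a result, and it caps automatic reattempts per change and job:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class RetryTracker:
    """Hypothetical helper (not Zuul's API): caps automatic reattempts
    of builds that end without Jenkins reporting a completion status."""

    max_retries: int = 2  # "once or twice", as suggested in the thread
    counts: Dict[Tuple[str, str], int] = field(default_factory=dict)

    def on_build_ended(self, change: str, job: str, result: Optional[str]) -> str:
        # A reported status (SUCCESS, FAILURE, ...) is passed through as-is.
        if result is not None:
            return result
        # No status: the slave may have lost contact with the master.
        key = (change, job)
        self.counts[key] = self.counts.get(key, 0) + 1
        if self.counts[key] <= self.max_retries:
            return "RETRY"
        # Retry budget exhausted: report a failure instead of retrying forever.
        return "FAILURE"
```

With `max_retries=2`, a job that keeps dying without a result runs three times in total before it is reported as failed. A job like check-nova-docker-dsvm-f20 could then no longer be restarted until it happened to pass.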

Derek.


