On 06/05/14 16:17, Sean Dague wrote:
> On 05/06/2014 10:52 AM, Derek Higgins wrote:
>> Hi,
>>
>> I've been working on a check job that uses devstack-gate jobs to run
>> nova with the docker driver. While doing this I noticed that
>> sometimes, during the nova boot for an instance, the node loses network
>> connectivity (obviously a problem that needs to be worked on).
>> What's interesting is zuul's behavior when this occurs in the check queue.
>> The job simply got restarted, and this kept happening until the job passed.
>>
>> A legitimately failed job:
>> https://jenkins05.openstack.org/job/check-nova-docker-dsvm-f20/2/
>>
>> http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/d5c1ebf/console.html
>>
>> Retry (also failed):
>> https://jenkins07.openstack.org/job/check-nova-docker-dsvm-f20/3/
>>
>> http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/d5f26ed/console.html
>>
>> Retried again (passed):
>> https://jenkins01.openstack.org/job/check-nova-docker-dsvm-f20/3/
>>
>> http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/2ebfa88/console.html
>>
>> And success gets reported back to gerrit:
>> https://review.openstack.org/#/c/91514/
>> Patch Set 5: Verified+1
>> check-nova-docker-dsvm-f20 SUCCESS in 17m 27s (non-voting)
>>
>>
>> Wouldn't this behavior allow commits that cause intermittent network
>> problems to more easily sneak past the gating infrastructure?
>>
>>
>> I'm guessing that the retry is being triggered in
>> zuul/launcher/gearman.py : onBuildCompleted()
>>
>> because onDisconnect calls onBuildCompleted with no result param.
>>
>> Any thoughts?
>
> There is some automatic retry facility in zuul right now to deal with a
> set of issues which are considered recoverable and typically the fault
> of the infrastructure provider.
>
> There might be a way to slip something through, however, all failures in
> the gate do tend to get eyes on them, and I've yet to see this kind of
> issue slip through.
> So something to keep an eye out for. Would be

Hasn't this problem already slipped through (although it's in the check
queue, not the gate)? I mean, it can now be merged, and it was only noticed
because I was watching the zuul status page while the jobs were running.

> curious to see if we can mine out these issues in elastic recheck. The
> failed results are still reported to logstash from what I can see, so we
> can track them.

I'll see if I can find any similar occurrences in other jobs and report back.

>
> -Sean
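
To make the concern concrete, here is a minimal Python sketch of the behavior I think I'm seeing. It is a toy model and not zuul's code: `Build`, `run_until_result` and the result strings are hypothetical names made up for illustration. The only thing it takes from the discussion above is my guess that a build which completes with no result (for example after a disconnect) simply gets relaunched.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Build:
    """Hypothetical stand-in for a launched job (not zuul's Build class)."""
    name: str
    attempts: List[Optional[str]] = field(default_factory=list)
    result: Optional[str] = None


def run_until_result(name: str, run_once: Callable[[], Optional[str]]) -> Build:
    """Relaunch the job until one attempt comes back with a result.

    Assumption: a result of None models a worker that disconnected
    before reporting anything, which is treated as "retry".
    """
    build = Build(name)
    while build.result is None:
        result = run_once()
        build.attempts.append(result)  # keep every attempt for later inspection
        build.result = result          # only a non-None result ends the loop
    return build


# Two attempts lose the node mid-run, the third one happens to pass.
outcomes = iter([None, None, "SUCCESS"])
build = run_until_result("check-nova-docker-dsvm-f20", lambda: next(outcomes))

print(build.result)         # the only thing that would be reported back
print(len(build.attempts))  # how many launches it actually took
```

In this model only the final result is reported, so the two lost attempts are invisible unless someone looks at the per-attempt history, which is roughly why I only noticed this from the status page.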