Hi,
I've been working on a check job that uses devstack-gate jobs to run
nova with the docker driver. While doing this I noticed that
sometimes, during the nova boot for an instance, the node loses network
connectivity (obviously a problem that needs to be worked on).
What's interesting is Zuul's behavior when this occurs in the check queue:
the job simply got restarted, and this kept happening until the job passed.
A legitimately failed job :
https://jenkins05.openstack.org/job/check-nova-docker-dsvm-f20/2/
http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/d5c1ebf/console.html
Retry (also failed) :
https://jenkins07.openstack.org/job/check-nova-docker-dsvm-f20/3/
http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/d5f26ed/console.html
Retried again (passed) :
https://jenkins01.openstack.org/job/check-nova-docker-dsvm-f20/3/
http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/2ebfa88/console.html
And success gets reported back to Gerrit:
https://review.openstack.org/#/c/91514/
Patch Set 5: Verified+1
check-nova-docker-dsvm-f20 SUCCESS in 17m 27s (non-voting)
Wouldn't this behavior allow commits that cause intermittent network
problems to more easily sneak past the gating infrastructure?
I'm guessing that the retry is being triggered in
zuul/launcher/gearman.py : onBuildCompleted(),
because onDisconnect() calls onBuildCompleted() with no results param.
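
To make the concern concrete, here is a minimal sketch of the behavior I
think I'm seeing. It is not Zuul's actual code: run_until_result,
make_flaky_job and max_attempts are hypothetical names, and I'm assuming
that a lost node shows up as a build completing with no result (None),
which causes a relaunch.

```python
# Hypothetical, simplified model of "relaunch when a build ends with no
# result". None of these names come from Zuul itself.
from typing import Callable, List, Optional


def run_until_result(launch: Callable[[], Optional[str]],
                     max_attempts: int = 10) -> List[Optional[str]]:
    """Relaunch the job while an attempt ends with no result (None).

    max_attempts is an assumption added so the sketch always terminates.
    """
    history: List[Optional[str]] = []
    for _ in range(max_attempts):
        result = launch()
        history.append(result)
        if result is not None:
            # A real result (SUCCESS or FAILURE) ends the retry loop.
            break
    return history


def make_flaky_job(outcomes: List[Optional[str]]) -> Callable[[], Optional[str]]:
    """Return a fake job that yields the given outcomes one per launch."""
    remaining = iter(outcomes)
    return lambda: next(remaining)


# Two attempts lose the node (no result), the third one passes.
history = run_until_result(make_flaky_job([None, None, "SUCCESS"]))
reported = history[-1]
print(history, reported)
```

In this model only the last attempt is reported, so the two attempts
that lost connectivity never show up in the result posted on the review.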
Any thoughts?
Thanks,
Derek.