[openstack-dev] [infra] Intermittent network problems allowed to sneak past the gate?

Derek Higgins derekh at redhat.com
Tue May 6 14:52:04 UTC 2014


Hi,

    I've been working on a check job that uses devstack-gate to run
nova with the docker driver. While doing this I noticed that sometimes,
during the nova boot of an instance, the node loses network
connectivity (obviously a problem that needs to be worked on).
What's interesting is Zuul's behavior when this occurs in the check queue:
the job simply got restarted, and this kept happening until the job passed.

A legitimately failed job :
  https://jenkins05.openstack.org/job/check-nova-docker-dsvm-f20/2/

http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/d5c1ebf/console.html

Retry (also failed)      :
  https://jenkins07.openstack.org/job/check-nova-docker-dsvm-f20/3/

http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/d5f26ed/console.html

Retried again (passed)   :
  https://jenkins01.openstack.org/job/check-nova-docker-dsvm-f20/3/

http://logs.openstack.org/14/91514/5/check/check-nova-docker-dsvm-f20/2ebfa88/console.html

And success gets reported back to Gerrit:
https://review.openstack.org/#/c/91514/
Patch Set 5: Verified+1
    check-nova-docker-dsvm-f20 SUCCESS in 17m 27s (non-voting)


Wouldn't this behavior allow commits that cause intermittent network
problems to sneak past the gating infrastructure more easily?


I'm guessing that the retry is being triggered in
zuul/launcher/gearman.py : onBuildCompleted()

because onDisconnect calls onBuildCompleted with no results param.
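
To illustrate the guess above, here is a minimal sketch of the behavior I
suspect. It is not Zuul's actual code: the Build and Launcher classes, the
snake_case method names, and the requeue logic are all hypothetical and only
model the idea that a completion with no result is treated as "run it again"
rather than as a failure.

```python
# Hypothetical sketch; names are illustrative and not taken from Zuul's source.
class Build:
    def __init__(self, name):
        self.name = name
        self.result = None
        self.attempts = 0


class Launcher:
    def __init__(self):
        self.reported = []   # (job name, result) pairs sent back to review
        self.requeued = []   # builds scheduled to run again

    def on_build_completed(self, build, result=None):
        build.attempts += 1
        if result is None:
            # No result: assume the worker vanished, rerun the build and
            # report nothing for this attempt.
            self.requeued.append(build)
            return
        build.result = result
        self.reported.append((build.name, result))

    def on_disconnect(self, build):
        # A lost worker is signalled by completing the build with no result.
        self.on_build_completed(build)


launcher = Launcher()
build = Build("check-nova-docker-dsvm-f20")
launcher.on_disconnect(build)                   # first run: node lost network
launcher.on_disconnect(build)                   # retry: node lost network again
launcher.on_build_completed(build, "SUCCESS")   # third run passes
print(launcher.reported)
```

In this model only the final SUCCESS is ever reported; the two attempts that
ended in a disconnect leave no trace in the review, which matches what I saw
on the change above.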

Any thoughts?

thanks,
Derek.


