[OpenStack-Infra] Jenkins jobs may run on the same node in a CI environment.

Guo Qing GH Hu hguoqing at cn.ibm.com
Tue Feb 11 08:47:26 UTC 2014


Dear all,

I found that in the following scenario two Jenkins jobs may run on the
same single-use node in a CI environment.

Nodepool tries to delete a node on Jenkins after job 1 finishes on it,
but Jenkins has already assigned the node to queued job 2, so nodepool
fails to delete the node and job 2 runs on the already-used node.
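
The window comes from how nodepool reclaims single-use nodes: it reacts
to the Jenkins onFinalized event and deletes the node afterwards, while
the Jenkins queue keeps dispatching builds in the meantime. Roughly (a
simplified sketch of the NodeCompleteThread flow shown in the traceback
below; the database session handling is elided and the DELETE_DELAY
value here is an assumption):

    import time
    import threading

    DELETE_DELAY = 60  # assumed value; nodepool.py defines the real one

    class NodeCompleteThread(threading.Thread):
        # Simplified: reacts to a finished build by deleting the node.
        def __init__(self, nodepool, node):
            threading.Thread.__init__(self)
            self.nodepool = nodepool
            self.node = node

        def run(self):
            # Race window: while we sleep here, the node is still online
            # in Jenkins, so the queue may dispatch the next queued build
            # onto it before deleteNode() runs.
            time.sleep(DELETE_DELAY)
            self.nodepool.deleteNode(self.node)

The log below shows exactly this race happening: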

2014-01-21 03:13:16,520 DEBUG nodepool.NodeUpdateListener: Received: onFinalized {"name":"gate-ci-devstack-test","url":"job/gate-ci-devstack-test/","build":{"full_url":"https://172.16.2.115/job/gate-ci-devstack-test/3815/","number":3815,"phase":"FINISHED","status":"FAILURE","url":"job/gate-ci-devstack-test/3815/","parameters":{"BASE_LOG_PATH":"70/61470/3/check","LOG_PATH":"70/61470/3/check/gate-ci-devstack-test/739f893","ZUUL_BRANCH":"master","ZUUL_CHANGE":"61470","ZUUL_CHANGE_IDS":"61470,3","ZUUL_CHANGES":"openstack/nova:master:refs/changes/70/61470/3","ZUUL_COMMIT":"956132e8df4377e66d5b78b5b9864c7da37c6bde","ZUUL_PATCHSET":"3","ZUUL_PIPELINE":"check","ZUUL_PROJECT":"openstack/nova","ZUUL_REF":"refs/zuul/master/Z1473cd61831a445792d06152612ce7f9","ZUUL_URL":"http://172.16.2.118/p","ZUUL_UUID":"739f893379b84a64a22ea4db1721f7e7"},"node_name":"devstack-precise-check-v1-gemini-cdl-7323"}}
2014-01-21 03:13:16,544 DEBUG nodepool.NodeUpdateListener: Received: onStarted {"name":"gate-ci-devstack-test","url":"job/gate-ci-devstack-test/","build":{"full_url":"https://172.16.2.115/job/gate-ci-devstack-test/3823/","number":3823,"phase":"STARTED","url":"job/gate-ci-devstack-test/3823/","parameters":{"BASE_LOG_PATH":"61/43061/7/check","LOG_PATH":"61/43061/7/check/gate-ci-devstack-test/cf9f514","ZUUL_BRANCH":"master","ZUUL_CHANGE":"43061","ZUUL_CHANGE_IDS":"43061,7","ZUUL_CHANGES":"openstack/nova:master:refs/changes/61/43061/7","ZUUL_COMMIT":"a6cd36551b778be3903eb552c22338e16708ed6e","ZUUL_PATCHSET":"7","ZUUL_PIPELINE":"check","ZUUL_PROJECT":"openstack/nova","ZUUL_REF":"refs/zuul/master/Z8721bb3c40d743d882b4c18ff896a079","ZUUL_URL":"http://172.16.2.118/p","ZUUL_UUID":"cf9f514c02a04da8a83ba2222dc4bebd"},"node_name":"devstack-precise-check-v1-gemini-cdl-7323"}}
2014-01-21 03:13:16,551 INFO nodepool.NodeUpdateListener: Setting node id: 7323 to USED
2014-01-21 03:13:16,557 DEBUG nodepool.JenkinsManager: Manager jenkins01 running task <nodepool.jenkins_manager.NodeExistsTask object at 0x7faa14315090>
2014-01-21 03:13:17,576 DEBUG nodepool.JenkinsManager: Manager jenkins01 running task <nodepool.jenkins_manager.DeleteNodeTask object at 0x7faa10095e90>
2014-01-21 03:13:17,883 ERROR nodepool.NodeCompleteThread: Exception handling event for devstack-precise-check-v1-gemini-cdl-7323:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/nodepool/nodepool.py", line 65, in run
    self.handleEvent(session)
  File "/usr/local/lib/python2.7/dist-packages/nodepool/nodepool.py", line 101, in handleEvent
    self.nodepool.deleteNode(session, node)
  File "/usr/local/lib/python2.7/dist-packages/nodepool/nodepool.py", line 1032, in deleteNode
    jenkins.deleteNode(jenkins_name)
  File "/usr/local/lib/python2.7/dist-packages/nodepool/jenkins_manager.py", line 118, in deleteNode
    return self.submitTask(DeleteNodeTask(name=name))
  File "/usr/local/lib/python2.7/dist-packages/nodepool/task_manager.py", line 90, in submitTask
    return task.wait()
  File "/usr/local/lib/python2.7/dist-packages/nodepool/task_manager.py", line 51, in run
    self.done(self.main(client))
  File "/usr/local/lib/python2.7/dist-packages/nodepool/jenkins_manager.py", line 64, in main
    return jenkins.delete_node(self.args['name'])
  File "/usr/local/lib/python2.7/dist-packages/python_jenkins-0.2.1-py2.7.egg/jenkins/__init__.py", line 508, in delete_node
    raise JenkinsException('delete[%s] failed' % (name))
JenkinsException: delete[devstack-precise-check-v1-gemini-cdl-7323] failed
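
For reference, python-jenkins raises this exception only when the node
still exists after the delete request was sent, which is consistent with
Jenkins holding on to the node because build 3823 had just started on
it. Roughly (paraphrased from python-jenkins 0.2.1; the exact body may
differ slightly):

    def delete_node(self, name):
        # Delete a Jenkins node permanently.
        self.get_node_info(name)  # raises if the node is unknown
        self.jenkins_open(urllib2.Request(
            self.server + DELETE_NODE % locals(), ''))
        # The HTTP request went through, but the node is still
        # registered on the master -- report the failure.
        if self.node_exists(name):
            raise JenkinsException('delete[%s] failed' % (name))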

This error often occurs when all nodes are busy and Zuul still has jobs
queued. Changing the following code in nodepool.py to remove the delay
can reduce the probability:

        time.sleep(DELETE_DELAY)
        self.nodepool.deleteNode(session, node)

But as the log above shows, the race window still exists. How can it be
resolved thoroughly? Your opinions are welcome!
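
One way to narrow the window further might be to take the node offline
in Jenkins as soon as its build finishes, before the delayed delete:
Jenkins will not dispatch queued builds to an offline node, so the
delete can no longer race with a new build starting. A minimal sketch
using python-jenkins (the server URL, credentials, and offline message
are assumptions, and this assumes the python-jenkins version in use
provides disable_node):

    import jenkins

    # Assumed connection details for the Jenkins master in the log above.
    server = jenkins.Jenkins('https://172.16.2.115',
                             username='nodepool', password='secret')

    name = 'devstack-precise-check-v1-gemini-cdl-7323'

    # Mark the node offline first, then delete it. An offline node
    # cannot receive queued builds, which closes the dispatch race.
    server.disable_node(name, msg='nodepool: single-use node, deleting')
    server.delete_node(name)

Doing the equivalent at the point where nodepool marks the node USED
(the 03:13:16,551 line in the log above) might close the race closer to
its source.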

Thanks & Best Regards,

Godwin Hu(胡国清)

Software Engineer
IBM China System and Technology Lab(CSTL), Beijing
Tel(Seat): 86-010-82451453
Location : Ring Building, 1BW270
E-mail Address: hguoqing at cn.ibm.com 
Address: IBM ZGC Campus. Ring Building, 28# ZhongGuanCun Software Park, 
Shang Di, Beijing P.R.China 100193