[openstack-dev] libvirt race in openstack ci gate

Sean Dague sean at dague.net
Mon May 20 18:54:46 UTC 2013


This came up in -qa when some folks were trying to debug a Quantum patch 
that was failing for seemingly unrelated reasons:
http://logs.openstack.org/29184/8/gate/gate-tempest-devstack-vm-quantum/23537/

It looks like nova-compute hits a race when spinning up guests, 
apparently triggered by a libvirt failure (or a race in libvirt itself).

http://logs.openstack.org/29184/8/gate/gate-tempest-devstack-vm-quantum/23537/logs/screen-n-cpu.txt.gz

The critical part is:

2013-05-20 15:47:13.461 DEBUG nova.openstack.common.rpc.amqp 
[req-ddd6a6f2-7a52-49d3-8545-9e035aeb0134 demo demo] UNIQUE_ID is 
3e22c5ab4a264091b3a53572a4e5c518. _add_unique_id 
/opt/stack/new/nova/nova/openstack/common/rpc/amqp.py:337
2013-05-20 15:47:13.480 DEBUG nova.openstack.common.lockutils 
[req-ddd6a6f2-7a52-49d3-8545-9e035aeb0134 demo demo] Got semaphore 
"3e9b1297-caf1-4daf-8127-919b8ba68fc4" for method "do_run_instance"... 
inner /opt/stack/new/nova/nova/openstack/common/lockutils.py:190

libvir: QEMU error : Domain not found: no domain with matching name 
'instance-0000000b'

2013-05-20 15:47:13.485 AUDIT nova.compute.manager 
[req-ddd6a6f2-7a52-49d3-8545-9e035aeb0134 demo demo] [instance: 
3e9b1297-caf1-4daf-8127-919b8ba68fc4] Starting instance...
2013-05-20 15:47:13.485 DEBUG nova.openstack.common.rpc.amqp 
[req-ddd6a6f2-7a52-49d3-8545-9e035aeb0134 demo demo] Making synchronous 
call on conductor ... multicall 
/opt/stack/new/nova/nova/openstack/common/rpc/amqp.py:586

(The line to look at is the libvir: one; I put blank lines around it to 
highlight it.)
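
For anyone who wants to check whether other runs show the same thing, 
here is a rough sketch of a scan over a downloaded and decompressed 
screen-n-cpu.txt. The function name is made up for illustration, and it 
assumes the libvirt message appears as a bare, untimestamped line 
starting with "libvir:", as it does in the excerpt above.

```python
import re

# Timestamped nova log entries start like "2013-05-20 15:47:13.461 DEBUG ..."
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ ")


def find_libvirt_errors(lines):
    """Yield (preceding_entry, libvirt_message) pairs.

    The "libvir:" lines carry no timestamp in this log, so each one is
    paired with the closest timestamped entry that came before it
    (None if there was no earlier entry).
    """
    last_entry = None
    for raw in lines:
        line = raw.rstrip("\n")
        if TIMESTAMP.match(line):
            last_entry = line
        elif line.startswith("libvir:"):
            yield last_entry, line
```

Usage would be something like 
`for entry, msg in find_libvirt_errors(open("screen-n-cpu.txt")): print(entry, msg)`, 
which gives a rough time for each libvirt error so it can be lined up 
against the Tempest failures.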

After that happens the guest is left in a BUILD state and we never check 
back in with libvirt. That causes a timeout while we wait for the guest 
to go to ACTIVE, which then causes a failure in Tempest.
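
To make the failure mode concrete, here is a minimal sketch of the kind 
of wait loop that ends in that timeout. This is not Tempest's actual 
code; wait_for_status, BuildTimeout, and the get_status callable are 
hypothetical names used only for illustration.

```python
import time


class BuildTimeout(Exception):
    """Raised when the guest never reaches the wanted status."""


def wait_for_status(get_status, wanted="ACTIVE", timeout=60.0, interval=1.0):
    """Poll get_status() until it returns `wanted` or `timeout` seconds pass.

    get_status is any zero-argument callable returning the guest's
    current status string (hypothetical; stands in for an API call).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == wanted:
            return status
        if time.monotonic() >= deadline:
            raise BuildTimeout("still %s after %s seconds" % (status, timeout))
        time.sleep(interval)
```

If the guest is stuck in BUILD, get_status never returns ACTIVE, so the 
only way out of the loop is the timeout, and that is the failure the 
test run reports.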

I remember seeing these issues previously, but not for a while. Any 
libvirt experts willing to weigh in on this?

	-Sean

-- 
Sean Dague
http://dague.net


