[openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"
Kashyap Chamarthy
kchamart at redhat.com
Wed Oct 5 23:32:39 UTC 2016
TL;DR
-----
>From the debug analysis of the log below, and discussion with Eric Blake
of upstream QEMU / libvirt resulted in the below bug report:
https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the
"ready" field of `query-block-jobs`
Until the above is fixed, I think _swap_volume() method in Nova could
either:
(a) just try to either wait until it sees the BLOCK_JOB_READY event,
which is emitted when a 'drive-mirror' job.
(b) Or keep polling, until we see "ready": true field (in the response
of `query-block-jobs` inviked via libvirt blockJobInfo()
Details
-------
The code in Nova that's being executed is this part in _swap_volume()
from libvirt/driver.py.
[...]
# Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to
# allow writing to existing external volume file
dev.rebase(new_path, copy=True, reuse_ext=True)
while not dev.is_job_complete():
time.sleep(0.5)
dev.abort_job(pivot=True)
[...]
(Optional Reading) libvirt / QEMU traffic log analysis
------------------------------------------------------
Libvirt debug log (NB: 61MB)
http://logs.openstack.org/73/374373/2/check/gate-tempest-dsvm-full-ubuntu-xenial/149fe3e/logs/libvirt/libvirtd.txt.gz
Failed test: tempest.api.compute.admin.test_volume_swap.TestVolumeSwap.test_volume_swap
(I line-wrapped the logs manually.)
(1) We see _swap_volume() invokes the Libvirt blockRebase(), which calls
'drive-mirror' (ID: libvirt-25)
-----
2016-10-04 15:54:45.308+0000: 30554: debug : qemuMonitorJSONCommandWithFd:291 :
Send command
'{"execute":"drive-mirror","arguments":{"device":"drive-virtio-disk1","target":"/dev/disk/by-path
/ip-10.205.221.168:3260-iscsi-iqn.2010-10.org.openstack:volume-7b86913d-78e4-4b68-8df2-63f1a4c6656a-lun-1","sync":"full","mode":"existing","format":"raw"},"id":"libvirt-25"}'
for write with FD -1
-----
(2.a) Then, libvirt queries QEMU for job status (ID: 'libvirt-26') via
`query-block-jobs` command:
-----
2016-10-04 15:54:45.332+0000: 30550: info : qemuMonitorIOWrite:528 :
QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60
buf={"execute":"query-block-jobs","id":"libvirt-26"}
-----
(2.b) Response for libvirt-26. (Here we see: "len" == 1073741824,
"offset" == 0. The copy job has just begun).
-----
2016-10-04 15:54:45.332+0000: 30550: info :
qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY:
mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device":
"drive-virtio-disk1", "busy": true, "len": 1073741824, "offset": 0,
"paused": false, "speed": 0, "ready": false, "type": "mirror"}], "id":
"libvirt-26"}
-----
[Libvirt keeps on querying for job status, via `query-block-jobs`]
[Snip three polling requests 27, 28, 29]
(3.a) Next query (ID: libvirt-30):
-----
2016-10-04 15:54:50.060+0000: 30550: info : qemuMonitorIOWrite:528 :
QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60
buf={"execute":"query-block-jobs","id":"libvirt-30"}
-----
(3.b) Response for libvirt-30. (Here we see: "len" == 1073741824,
"offset" == 734003200 in bytes. Copy operation is still in progress.)
-----
2016-10-04 15:54:50.061+0000: 30550: info :
qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY:
mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device":
"drive-virtio-disk1", "busy": true, "len": 1073741824, "offset":
734003200, "paused": false, "speed": 0, "ready": false, "type":
"mirror"}], "id": "libvirt-30"}
-----
[Snip four (IDs, 31, 32, 33, 34) more job status request / response
round-trips.]
(4.a) Query for job status again (ID: libvirt-35):
-----
2016-10-04 15:54:50.060+0000: 30550: info : qemuMonitorIOWrite:528 :
QEMU_MONITOR_IO_WRITE: mon=0x7fa43000ee60
buf={"execute":"query-block-jobs","id":"libvirt-30"}
-----
(4.b) In response, finally, we see len (1073741824) == offset (1073741824):
-----
2016-10-04 15:54:52.076+0000: 30550: info :
qemuMonitorJSONIOProcessLine:206 : QEMU_MONITOR_RECV_REPLY:
mon=0x7fa43000ee60 reply={"return": [{"io-status": "ok", "device":
"drive-virtio-disk1", "busy": true, "len": 1073741824, "offset":
1073741824, "paused": false, "speed": 0, "ready": false, "type":
"mirror"}], "id": "libvirt-35"}
-----
Above, the "ready" flag does not yet indicate 'true', which signals the
job has _really_ ended.
Drilling down further....
(5) The abort + pivot failed (as Nova calls
"dev.abort_job(pivot=True)"):
-------
2016-10-04 15:54:52.076+0000: 30555: debug :
qemuDomainObjExitMonitorInternal:1923 : Exited monitor (mon=0x7fa43000ee60
vm=0x7fa414002a70 name=instance-00000008)
2016-10-04 15:54:52.076+0000: 30555: error : qemuDomainBlockPivot:16137 : block
copy still active: disk 'vdb' not ready for pivot yet
2016-10-04 15:54:52.076+0000: 30555: debug : qemuBlockJobSyncEnd:242 : disk=vdb
2016-10-04 15:54:52.076+0000: 30555: debug : qemuDomainObjEndJob:1831 :
Stopping job: modify (async=none vm=0x7fa414002a70 name=instance-00000008)
-------
(6) HOWEVER, *here* we see the event BLOCK_JOB_READY, indicating the job
is now ready for pivot. Unfortunately, this turns out to be too late --
as libvirt was asked to pivot too early in step (5) above.
-------
2016-10-04 15:54:52.333+0000: 30550: info : qemuMonitorIOProcess:423 :
QEMU_MONITOR_IO_PROCESS: mon=0x7fa43000ee60 buf={"timestamp":
{"seconds": 1475596492, "microseconds": 333414}, "event":
"BLOCK_JOB_READY", "data": {"device": "drive-virtio-disk1", "len":
1073741824, "offset": 1073741824, "speed": 0, "type": "mirror"}}
-------
--
/kashyap
More information about the OpenStack-dev
mailing list