[openstack-dev] [nova] [libvirt] Debugging blockRebase() - "active block copy not ready for pivot"

Matt Riedemann mriedem at linux.vnet.ibm.com
Fri Oct 7 20:32:14 UTC 2016


On 10/6/2016 7:58 AM, Kashyap Chamarthy wrote:
> On Thu, Oct 06, 2016 at 01:32:39AM +0200, Kashyap Chamarthy wrote:
>> TL;DR
>> -----
>>
>> The debug analysis of the log below, and a discussion with Eric Blake
>> of upstream QEMU / libvirt, resulted in the following bug report:
>>
>>   https://bugzilla.redhat.com/show_bug.cgi?id=1382165 --
>>   virDomainGetBlockJobInfo: Adjust job reporting based on QEMU stats & the
>>   "ready" field of `query-block-jobs`
>
> When I raised this on the libvirt mailing list[0][1], one of the upstream
> libvirt devs NACKed the idea of adjusting the job reporting, i.e.
> "deliberately reporting false data in block info structure".  A similar
> concern was also shared by Matt Booth on #openstack-nova IRC.
>
> Next, it turns out the READY state is already exposed via the guest XML[1]:
>
> ---------------------------------------------------------------------
> We expose the state of the copy job in the XML and forward the READY
> event from qemu to the users.
>
> A running copy job exposes itself in the xml as:
>
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/>
>       <backingStore/>
>       <mirror type='file' file='/tmp/ble.img' format='raw' job='copy'>
>         <format type='raw'/>
>         <source file='/tmp/ble.img'/>
>       </mirror>
>       [...]
>     </disk>
>
> While the ready copy job is exposed as:
>
>     <disk type='file' device='cdrom'>
>       <driver name='qemu' type='raw'/>
>       <source file='/var/lib/libvirt/images/systemrescuecd-x86-4.8.0.iso'/>
>       <backingStore/>
>       <mirror type='file' file='/tmp/ble.img' format='raw' job='copy' ready='yes'>
>         <format type='raw'/>
>         <source file='/tmp/ble.img'/>
>       </mirror>
>       [...]
>     </disk>
>
>
> Additionally, we have asynchronous events that are emitted once qemu
> notifies us that the block job has reached sync state or finished.
> Libvirt uses the event to switch to the ready state.
>
> The documentation suggests that block jobs should listen to the events
> and act accordingly only after receiving the event.
> ---------------------------------------------------------------------
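Given the XML quoted above, one way to check readiness from the domain state, rather than from the job progress reported by virDomainGetBlockJobInfo, is to look for ready='yes' on the disk's <mirror> element. The following is a minimal Python sketch, not Nova code. It assumes the live domain XML has already been fetched as a string (for example with libvirt-python's virDomain.XMLDesc()) and that the disk is identified by its <target dev='...'> name; the function name copy_job_ready is hypothetical.

```python
import xml.etree.ElementTree as ET


def copy_job_ready(domain_xml, target_dev):
    """Hypothetical helper: True if the copy job on `target_dev` is ready.

    Assumes `domain_xml` is the live domain XML string (e.g. from
    virDomain.XMLDesc()) and `target_dev` is the disk's <target dev=...>
    name, such as 'vda'.
    """
    root = ET.fromstring(domain_xml)
    for disk in root.iter('disk'):
        target = disk.find('target')
        if target is None or target.get('dev') != target_dev:
            continue
        mirror = disk.find('mirror')
        # No <mirror> child means no block copy job is running on this disk.
        if mirror is None:
            return False
        # Per the XML quoted above, a ready copy job carries ready='yes'.
        return mirror.get('ready') == 'yes'
    raise ValueError('no disk with target dev %r' % target_dev)
```

Polling like this is still a fallback: the libvirt developers quoted above recommend acting on the READY event instead.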
>
> So, Nova's is_job_complete() method & friends need to be reworked to
> listen for the events that signal job readiness.
>
> [0] https://www.redhat.com/archives/libvir-list/2016-October/msg00217.html
> [1] https://www.redhat.com/archives/libvir-list/2016-October/msg00229.html
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1382165#c3
>
>>
>> Details
>> -------
>>
>> The code in Nova that's being executed is this part in _swap_volume()
>> from libvirt/driver.py.
>>
>>     [...]
>>     # Start copy with VIR_DOMAIN_REBASE_REUSE_EXT flag to
>>     # allow writing to existing external volume file
>>     dev.rebase(new_path, copy=True, reuse_ext=True)
>>
>>     while not dev.is_job_complete():
>>         time.sleep(0.5)
>>
>>
>>     dev.abort_job(pivot=True)
>>     [...]
>>
>
> [...]
>

Thanks for the great libvirtd log analysis; it's really helpful to see 
what's going on and where we fail.

I've replied in Matthew's patch, which I think we can get in now 
regardless and backport.

As for the fix, it sounds like mdbooth is going to work on the event 
listener code. I'm hesitant to backport that, but honestly this flow has 
been latently broken for so long that I don't think we really need to 
backport any fixes, at least not the event listener work that fixes this 
long-term. The swap-volume test is disabled by default in Tempest and we 
enable it in devstack, so we can control which CI environments it runs in.
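For the event listener direction, here is a minimal sketch of the waiting side, assuming libvirt-python. The class BlockJobReadyWaiter is hypothetical (it is neither mdbooth's work nor existing Nova code), and the status constants are copied locally from libvirt's virConnectDomainEventBlockJobStatus enum so the sketch runs without libvirt installed. Its callback takes the same arguments libvirt-python passes to block job event handlers (conn, dom, disk, type, status, opaque).

```python
import threading

# Local copies of libvirt.VIR_DOMAIN_BLOCK_JOB_* status values, so this
# sketch runs without libvirt-python installed.
VIR_DOMAIN_BLOCK_JOB_COMPLETED = 0
VIR_DOMAIN_BLOCK_JOB_FAILED = 1
VIR_DOMAIN_BLOCK_JOB_CANCELED = 2
VIR_DOMAIN_BLOCK_JOB_READY = 3

# Statuses after which there is no point in waiting any longer.
_TERMINAL_OR_READY = (
    VIR_DOMAIN_BLOCK_JOB_READY,
    VIR_DOMAIN_BLOCK_JOB_COMPLETED,
    VIR_DOMAIN_BLOCK_JOB_FAILED,
    VIR_DOMAIN_BLOCK_JOB_CANCELED,
)


class BlockJobReadyWaiter(object):
    """Hypothetical helper: wait for libvirt to report a block job READY."""

    def __init__(self, disk):
        # Disk target name, e.g. 'vda' (what BLOCK_JOB_2 events report).
        self.disk = disk
        self.status = None
        self._event = threading.Event()

    def callback(self, conn, dom, disk, job_type, status, opaque):
        """Block job event handler; called from libvirt's event loop thread."""
        # Ignore events for other disks on the same domain.
        if disk != self.disk:
            return
        # Record the status before waking the waiter so wait() sees it.
        self.status = status
        if status in _TERMINAL_OR_READY:
            self._event.set()

    def wait(self, timeout=None):
        """Return True only if the job reached READY within `timeout` seconds."""
        if not self._event.wait(timeout):
            return False
        return self.status == VIR_DOMAIN_BLOCK_JOB_READY
```

In a real setup, libvirt.virEventRegisterDefaultImpl() has to be called before opening the connection, libvirt.virEventRunDefaultImpl() has to run in a loop on a separate thread, and the callback should be registered with conn.domainEventRegisterAny(dom, libvirt.VIR_DOMAIN_EVENT_ID_BLOCK_JOB_2, waiter.callback, None) before starting the rebase, so the READY event cannot be missed. With the BLOCK_JOB_2 event, the disk argument is the target name (e.g. 'vda') rather than the source path. Once wait() returns True, the existing dev.abort_job(pivot=True) call can perform the pivot.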

-- 

Thanks,

Matt Riedemann



