[openstack-dev] [Nova] Regarding deleting snapshot when instance is OFF

Kashyap Chamarthy kchamart at redhat.com
Thu Apr 9 07:39:30 UTC 2015


On Wed, Apr 08, 2015 at 11:31:40PM +0530, Deepak Shetty wrote:
> Hi,
>     Cinder w/ GlusterFS backend is hitting the below error as part of
> test_volume_boot_pattern tempest testcase

[Meta comment: since the main components triggering this error are
Cinder with GlusterFS, adding a "Cinder" tag would be useful to draw
the right folks' attention.]

> (at the end of the testcase, when it deletes the snap):
> 
> "/usr/local/
> 
> lib/python2.7/dist-packages/libvirt.py", line 792, in blockRebase
> 2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver if ret == -1:
> raise libvirtError ('virDomainBlockRebase() failed', dom=self)
> 2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver
> libvirtError: *Requested
> operation is not valid: domain is not running*
> 2015-04-08 07:22:44.376 32701 TRACE nova.virt.libvirt.driver

You'll likely find more detail in libvirt's logs about why
virDomainBlockRebase fails. If you hit this failure on any of the
recent Gate runs, the libvirt debug logs (now enabled by default there)
might give some clue.
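
If you're reproducing locally, you can enable debug logging yourself.
A minimal sketch of the relevant settings in /etc/libvirt/libvirtd.conf
(the exact filter strings vary a bit across libvirt versions; restart
libvirtd after editing):

  log_filters="1:qemu 1:libvirt"
  log_outputs="1:file:/var/log/libvirt/libvirtd.log"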

Also, it would be useful if you can reproduce this issue outside of
Tempest (and its timing issues). Even better would be reproducing the
failure w/ just plain Cinder (or even w/o Cinder) to isolate the issue.
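
For instance, a libvirt-only reproducer along the lines below should
hit the same error. (This is just a sketch; the domain name 'testvm'
and the disk target 'vda' are placeholders for whatever your setup
uses.)

  import libvirt

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('testvm')   # placeholder domain name

  # Take an external, disk-only snapshot so 'vda' gains a backing
  # file; this is roughly what a hypervisor-assisted snapshot does.
  snap_xml = """
  <domainsnapshot>
    <name>snap1</name>
    <disks><disk name='vda' snapshot='external'/></disks>
  </domainsnapshot>"""
  dom.snapshotCreateXML(
      snap_xml,
      libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_DISK_ONLY |
      libvirt.VIR_DOMAIN_SNAPSHOT_CREATE_NO_METADATA)

  dom.destroy()   # power the guest off, as the test does

  try:
      # The call Nova makes when deleting the snapshot; on an offline
      # domain it raises:
      #   libvirtError: Requested operation is not valid: domain is
      #   not running
      dom.blockRebase('vda', None, 0, 0)
  except libvirt.libvirtError as e:
      print(e)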

> More details in the LP bug [1]

The details in the bug do not provide a reproducer. As always,
providing a crystal-clear reproducer (e.g. a script, a sequence of
`virsh`/libvirt API calls, or the exact Nova/Cinder commands) leading
to the failure will let people look at the bug much more quickly,
instead of leaving the burden of proof on the bug triagers to come up
with one.

> In looking closely at the testcase: it waits for the instance to
> turn OFF, after which the cleanup starts and tries to delete the
> snap; but since the cinder volume is in attached state (in-use), it
> lets nova take control of the snap delete operation, and nova fails
> as it cannot do blockRebase while the domain is offline.


blockRebase (in short: it populates a disk image with data from its
backing image chain, and can act on different flags you provide to it)
cannot operate on an offline domain (nor on a persistent libvirt
domain, but Nova deals with that by temporarily undefining the domain
and redefining it afterwards). So, first you might want to figure out
why the guest is offline before the blockRebase call is invoked; that
should help answer your questions below.
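
To illustrate that undefine/redefine dance, roughly (a sketch of the
pattern only, not Nova's actual code, which lives in
nova/virt/libvirt/driver.py; 'testvm'/'vda' are placeholders again):

  import time
  import libvirt

  conn = libvirt.open('qemu:///system')
  dom = conn.lookupByName('testvm')

  xml = dom.XMLDesc(0)                # save the domain config first
  was_persistent = dom.isPersistent()
  if was_persistent:
      dom.undefine()                  # make the running domain transient

  try:
      # Block jobs are permitted on the (now transient) *running* domain.
      dom.blockRebase('vda', None, 0, 0)
      while True:                     # poll until the rebase completes
          info = dom.blockJobInfo('vda', 0)
          if not info or info['cur'] == info['end']:
              break
          time.sleep(0.5)
  finally:
      if was_persistent:
          conn.defineXML(xml)         # restore the persistent definition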

> Questions:
> 
> 1) Is this a valid scenario to test? Some say yes; I am not sure,
> since the test makes sure the instance is OFF before the snap is
> deleted, and this doesn't work for fs-backed drivers, as they use
> hypervisor-assisted snapshots, which need the domain to be active.
> 
> 2) If this is a valid scenario, then it means libvirt.py in nova
> should be modified NOT to raise an error, but to continue with the
> snap delete (as if the volume were not attached) and take care of the
> domain XML (so that the domain is still bootable post snap deletion).
> Is this the way to go?
> 
> Appreciate suggestions/comments


-- 
/kashyap


