[openstack-dev] [Nova] [Cinder] [Tempest] Regarding deleting snapshot when instance is OFF

Jordan Pittier jordan.pittier at scality.com
Tue Jun 16 13:33:00 UTC 2015


On Thu, Apr 9, 2015 at 6:10 PM, Eric Blake <eblake at redhat.com> wrote:

> On 04/08/2015 11:22 PM, Deepak Shetty wrote:
> > + [Cinder] and [Tempest] in the $subject since this affects them too
> >
> > On Thu, Apr 9, 2015 at 4:22 AM, Eric Blake <eblake at redhat.com> wrote:
> >
> >> On 04/08/2015 12:01 PM, Deepak Shetty wrote:
> >>>
> >>> Questions:
> >>>
> >>> 1) Is this a valid scenario being tested ? Some say yes, I am not sure,
> >>> since the test makes sure that instance is OFF before snap is deleted
> and
> >>> this doesn't work for fs-backed drivers as they use hyp assisted snap
> >> which
> >>> needs domain to be active.
> >>
> >> Logically, it should be possible to delete snapshots when a domain is
> >> off (qemu-img can do it, but libvirt has not yet been taught how to
> >> manage it, in part because qemu-img is not as friendly as qemu in having
> >> a re-connectible Unix socket monitor for tracking long-running
> progress).
> >>
> >
> > Is there a bug/feature already opened for this ?
>
> Libvirt has this bug: https://bugzilla.redhat.com/show_bug.cgi?id=987719
> which tracks generic ability of libvirt to delete snapshots; ideally,
> the code to manage snapshots will work for both online and persistent
> offline guests, but it may result in splitting the work into multiple bugs.
>
>
I can't access this bug report: it seems to be "private", and I need to authenticate.


> > I didn't understand much
> > on what you
> > mean by re-connectible unix socket :)... are you hinting that qemu-img
> > doesn't have
> > ability to attach to a qemu / VM process for long time over unix socket ?
>
> For online guest control, libvirt normally creates a Unix socket, then
> starts qemu with its -qmp monitor pointing to that socket.  That way, if
> libvirtd goes away and then restarts, it can reconnect as a client to
> the existing socket file, and qemu never has to know that the person on
> the other end changed.  With that QMP monitor, libvirt can query qemu's
> current state at will, get event notifications when long-running jobs
> have finished, and issue commands to terminate long-running jobs early,
> even if it is a different libvirtd issuing a later command than the one
> that started the command.
>
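
(If I understand the "re-connectible" part correctly, from a client's point
of view it looks roughly like the sketch below; the socket path is made up
and this is of course not libvirt's actual code, just an illustration of a
QMP client reattaching to a running qemu:)

    import json
    import socket

    # Assumes a qemu started with something like
    #   -qmp unix:/var/run/demo-qmp.sock,server,nowait   (hypothetical path)
    # Any new client, e.g. a restarted libvirtd, can connect to that same
    # socket later and keep driving/monitoring the guest.
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect('/var/run/demo-qmp.sock')
    s.recv(4096)                                   # QMP greeting banner
    s.sendall(json.dumps({'execute': 'qmp_capabilities'}).encode() + b'\r\n')
    s.recv(4096)
    s.sendall(json.dumps({'execute': 'query-block-jobs'}).encode() + b'\r\n')
    print(s.recv(4096))                # state of any long-running block jobs
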
> qemu-img, on the other hand, only has the -p option or SIGUSR1 signal
> for outputting progress to stderr on a long-running operation (not the
> most machine-parseable), but is not otherwise controllable.  It does not
> have a management connection through a Unix socket.  I guess in thinking
> about it a bit more, a Unix socket is not essential; as long as the old
> libvirtd starts qemu-img in a manner that tracks its pid and collects
> stderr reliably, then restarting libvirtd can send SIGUSR1 to the pid
> and track the changes to stderr to estimate how far along things are.
>
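
(Similarly, the "track the pid and poke it with SIGUSR1" idea would amount
to something like this; the file names are made up and this is not what
libvirt does today, just a sketch:)

    import os
    import signal
    import subprocess

    # Start a long-running qemu-img operation and keep its pid around.
    proc = subprocess.Popen(
        ['qemu-img', 'rebase', '-b', 'base.qcow2', 'overlay.qcow2'],
        stderr=subprocess.PIPE)
    try:
        # qemu-img prints a progress report when it receives SIGUSR1
        os.kill(proc.pid, signal.SIGUSR1)
    except OSError:
        pass                     # it may already have finished
    _, err = proc.communicate()
    print(err)                   # progress output ends up on stderr
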
> Also, the idea has been proposed that qemu-img is not necessary; libvirt
> could use qemu -M none to create a dummy machine with no CPUs and JUST
> disk images, and then use the qemu QMP monitor as usual to perform block
> operations on those disks by reusing the code it already has working for
> online guests.  But even this approach needs coding into libvirt.
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>
>
Hi,
I'd like to make progress on this issue, so I will spend some time on it.

Let's recap. The issue is: "deleting a Cinder snapshot that was created
while snapshotting a Nova instance booted from a Cinder volume doesn't work
when that instance is stopped". This bug only arises when a Cinder driver
uses the "QEMU Assisted Snapshots" (live-snapshot) feature (currently only
GlusterFS, but soon the generic NFS driver as well, once
https://blueprints.launchpad.net/cinder/+spec/nfs-snapshots gets in).

This issue is triggered by the Tempest scenario "test_volume_boot_pattern".
This scenario:
[does some stuff]
1) Creates a Cinder volume from a Cirros image
2) Boots a Nova instance on that volume
3) Makes a snapshot of this instance (which creates a Cinder snapshot,
because the instance was booted from a volume), using the QEMU Assisted
Snapshots feature
[does some other stuff]
4) Stops the instance created in step 2, then deletes the snapshot created
in step 3 (a rough client-side sketch of these steps follows below).
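
Ignoring the Tempest plumbing, it boils down to something like this (a
sketch using plain python-novaclient and python-cinderclient calls instead
of Tempest's own clients; credentials, image and flavor ids are
placeholders):

    from cinderclient import client as cinderclient
    from novaclient import client as novaclient

    # Hypothetical credentials/endpoint, just to make the sketch complete.
    cinder = cinderclient.Client('2', 'demo', 'secret', 'demo',
                                 'http://keystone:5000/v2.0')
    nova = novaclient.Client('2', 'demo', 'secret', 'demo',
                             'http://keystone:5000/v2.0')

    vol = cinder.volumes.create(size=1, imageRef='CIRROS-IMAGE-UUID')  # step 1
    server = nova.servers.create(                                      # step 2
        'bfv-test', image=None, flavor='FLAVOR-ID',
        block_device_mapping={'vda': '%s:::0' % vol.id})
    nova.servers.create_image(server, 'snap-of-bfv-test')              # step 3
    # the instance is volume-backed, so this creates a Cinder snapshot
    # behind the scenes; on GlusterFS it uses QEMU Assisted Snapshots
    snap = cinder.volume_snapshots.list(
        search_opts={'volume_id': vol.id})[0]
    server.stop()                                                      # step 4
    cinder.volume_snapshots.delete(snap)   # <-- this is the call that fails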

The deletion of the snapshot created in step 3 fails because Nova asks
libvirt to do a blockRebase (see
https://github.com/openstack/nova/blob/68f6f080b2cddd3d4e97dc25a98e0c84c4979b8a/nova/virt/libvirt/driver.py#L1920).
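
For context, that code path boils down to roughly the following (a
simplified paraphrase of the linked driver code, not a verbatim excerpt;
the domain name, device and base path are hypothetical):

    import libvirt

    conn = libvirt.open('qemu:///system')
    dom = conn.lookupByName('instance-00000001')    # hypothetical domain
    # blockRebase() starts a QEMU block job through the domain's monitor,
    # so it presupposes a running QEMU process; on a shut-off domain
    # libvirt refuses with an error along the lines of "domain is not
    # running".
    dom.blockRebase('vda', '/mnt/gluster/volume-1234.parent', 0, 0)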

For reference, there's a bug targeting Cinder for this:
https://bugs.launchpad.net/cinder/+bug/1444806

What I'd like to do, but I am asking for your advice first, is: just before
calling virt_dom.blockRebase(), check whether the domain is running, and if
it is not, call "qemu-img rebase -b $rebase_base $rebase_disk" instead
(this idea was brought up by Eric Blake in the previous reply).
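
Concretely, I'm thinking of something along these lines (a rough sketch,
not a tested patch; rebase_disk and rebase_base stand for the values the
driver already computes, and utils.execute is Nova's usual command
wrapper):

    if virt_dom.isActive():
        # online: keep the current behaviour and let QEMU do the rebase
        virt_dom.blockRebase(rebase_disk, rebase_base, 0, flags)
    else:
        # offline: operate on the image file directly with qemu-img, since
        # there is no running QEMU process to drive a block job
        utils.execute('qemu-img', 'rebase', '-b', rebase_base, rebase_disk,
                      run_as_root=True)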

Questions:
Is it safe to do so?
Is it the right approach? (Given that I don't really want to wait for
libvirt to support blockRebase on an offline domain.)

Thanks a lot!
Jordan