[openstack-dev] Libvirt snapshot process optimization

Rafi Khardalian rafi at metacloud.com
Tue Aug 28 09:09:14 UTC 2012


I had a couple of different ideas on how to approach this, given the
constraints:

1. Keep the snapshots around until the next operation which restarts the VM
process.  This code would run prior to (re)starting a Qemu/KVM process such
as hard reboot, resume on host boot, another snapshot, etc.  It would look
at all existing snapshots in a given qcow2 file and remove them prior to
executing said operation.  This would limit the number of snapshots in the
image itself to 1, which I would consider a fair trade-off for making
snapshots only minimally disruptive.
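A minimal Python sketch of idea #1 follows. It assumes a qemu-img build that supports "qemu-img info --output=json" (newer than what some hosts currently ship) and that it is only ever called while no QEMU process has the image open, e.g. just before a hard reboot or resume. The function names and the injectable run parameter are hypothetical and are not part of nova's libvirt_utils.

```python
import json
import subprocess
from typing import Callable, List, Sequence

# A runner takes an argv list and returns the command's stdout.
Runner = Callable[[Sequence[str]], str]


def run_command(argv: Sequence[str]) -> str:
    """Run a command, raising CalledProcessError on a non-zero exit."""
    result = subprocess.run(list(argv), check=True,
                            capture_output=True, text=True)
    return result.stdout


def list_internal_snapshots(disk_path: str,
                            run: Runner = run_command) -> List[str]:
    """Return the tags of all internal snapshots stored in a qcow2 file."""
    # Assumption: qemu-img is new enough to support --output=json.
    info = json.loads(run(["qemu-img", "info", "--output=json", disk_path]))
    # The "snapshots" key is absent when the image has no internal snapshots.
    return [snap["name"] for snap in info.get("snapshots", [])]


def purge_internal_snapshots(disk_path: str,
                             run: Runner = run_command) -> List[str]:
    """Delete every internal snapshot in disk_path; return the deleted tags.

    Only safe while no QEMU process has the image open, i.e. right before
    the instance's QEMU process is (re)started.
    """
    deleted = []
    for tag in list_internal_snapshots(disk_path, run):
        run(["qemu-img", "snapshot", "-d", tag, disk_path])
        deleted.append(tag)
    return deleted
```

Injecting the runner keeps the qemu-img calls easy to stub out; in the driver, the purge would presumably run immediately before the domain is (re)created.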

2. Suspend the VM twice during the snapshot process.  Once to take the
snapshot and a second time to delete it.  This is a simple change and I
already have the code done, but I think #1 would be a cleaner approach.
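The ordering in idea #2 can be sketched in Python with each step passed in as a callable. The parameter names are hypothetical stand-ins for the driver's suspend (managedSave) and restart (_create_new_domain) paths and for the libvirt_utils create/extract/delete helpers; this illustrates the sequence only and is not the code mentioned above.

```python
from typing import Callable

Step = Callable[[], None]


def snapshot_with_two_pauses(suspend: Step, resume: Step,
                             create_snapshot: Step,
                             extract_snapshot: Step,
                             delete_snapshot: Step) -> None:
    """Keep the guest stopped only around the steps that modify the image."""
    # First pause: take the internal snapshot, then let the guest run again.
    suspend()
    try:
        create_snapshot()
    finally:
        resume()

    # The slow export (qemu-img convert) runs while the guest is running.
    try:
        extract_snapshot()
    finally:
        # Second pause: remove the internal snapshot, even if export failed.
        suspend()
        try:
            delete_snapshot()
        finally:
            resume()
```

If the snapshot itself fails, the guest is resumed and the exception propagates before any export is attempted; if the export fails, the snapshot is still deleted before the error surfaces.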

Or, we could support both via configuration options.  I looked at the
possibility of executing snapshots via libvirt but realized it only
recently became available and the documentation around the API hook is a
bit sparse.  So it is probably not a feasible solution at this time, mainly
due to the former.

If anyone else has another idea, that would be great to hear as well.

Rafi


On Thursday, August 23, 2012, Vishvananda Ishaya wrote:

> We discussed this on the mailing list in the past.
>
> quoting daniel:
>
> > a) is it safe to use qemu-img to create/delete a snapshot in a disk file
> > that libvirt is writing to.
> > if not:
> > b) is it safe to use qemu-img to delete a snapshot in a disk file that
> > libvirt is writing to but not actively using.
> > if not:
> > c) is it safe to use qemu-img to create/delete a snapshot in a disk file
> > that libvirt has an open file handle to.
>
> Sadly, the answer is no to all those questions. For Qcow2 files, using
> internal snapshots, you cannot make *any* changes to the qcow2 file,
> while QEMU has it open. The reasons are that QEMU may have metadata
> changes pending to the file which have not yet flushed to disk, and
> second, creating/deleting the snapshot with qemu-img may cause
> metadata changes that QEMU won't be aware of. Either way you will likely
> cause corruption of the qcow2 file.
>
> For these reasons, QEMU provides monitor commands for snapshotting,
> that libvirt uses whenever the guest is running. Libvirt will only
> use qemu-img if the guest is offline.
>
> Regards,
> Daniel
>
>
> So we unfortunately cannot delete the snapshot while the domain is
> running. Unless we are willing to leave a bunch of old internal snapshots
> in the file, we have to deal with this performance hit.
> Vish
>
> On Aug 23, 2012, at 5:27 PM, Rafi Khardalian <rafi at metacloud.com> wrote:
>
> > Assuming there are reasons for keeping suspend part of the snapshot
> > process, the flow can be optimized to reduce the impact to running VMs.
> > This is done by resuming immediately after the "qemu-img snapshot"
> > operation (libvirt_utils.create_snapshot), rather than waiting until the
> > "qemu-img convert" process (libvirt_utils.extract_snapshot) also
> > completes.  I've been unable to find a reason for waiting until the
> > convert is done.
> >
> > Modified snapshot() snippet from the libvirt driver, representing
> > the change I'm proposing:
> >
> >        # Make the snapshot
> >        try:
> >            libvirt_utils.create_snapshot(disk_path, snapshot_name)
> >        finally:
> >            if state == power_state.RUNNING:
> >                self._create_new_domain(xml_desc)
> >
> >        # Export the snapshot to a raw image
> >        with utils.tempdir() as tmpdir:
> >            try:
> >                out_path = os.path.join(tmpdir, snapshot_name)
> >                libvirt_utils.extract_snapshot(disk_path, source_format,
> >                                               snapshot_name, out_path,
> >                                               image_format)
> >            finally:
> >                libvirt_utils.delete_snapshot(disk_path, snapshot_name)
> >
> > I agree it would be ideal if we could find a way to guarantee a consistent
> > state in the guest VM, though I'm concerned about how users would respond
> > to a full shutdown being forced upon them to take a snapshot.
> >
> > -----Original Message-----
> > From: Joshua Harlow [mailto:harlowja at yahoo-inc.com]
> > Sent: Thursday, August 23, 2012 5:13 PM
> > To: OpenStack Development Mailing List; Rafi Khardalian
> > Cc: openstack-dev
> > Subject: Re: [openstack-dev] Libvirt snapshot process optimization
> >
> > I'd almost like to see the VM shut down before the snapshot, but that's
> > just me.
> >
> > In fact, just looking at the libvirt docs, 'suspend does not save a
> > persistent image of the guest's memory. For this, save is used.' So that
> > could leave guests in some weird state, so that sort of sucks. A shutdown
> > could at least trigger an ACPI shutdown in the VM and would
> > hopefully leave it in an OK state (emphasis on hopefully). I just think
> > that reducing the amount of time is going to be hard without
> > hypervisor<->VM communication (i.e. signaling all the apps in the VM to
> > stop) or without libvirt (+others) persisting the memory image.
> >
> > My guess is that suspend is trying to do what it can, which won't be 100%
> > right without memory state saving or some other communication happening...
> > Perhaps a 'save' call (or shutdown sequence) should be used; this
> > probably isn't any faster, but at least it would be 'correct' (shared
> > storage state not included). There is also the question of uploading
> > snapshots (but that's a different question).
> >
> > On 8/23/12 2:00 PM, "Rafi Khardalian" <rafi at metacloud.com> wrote:
> >
> >> Hi all,
> >>
> >> I'm looking at the libvirt snapshot code and was wondering about the order
> >> and purpose of several operations.  At a high level, it looks like the VM
> >> being snapshotted is first suspended (managedSave), actual qcow2 snapshot
> >> is taken, then extraction is done (qemu-img convert) before returning the
> >> instance to its prior state.
> >>
> >> My question is, with snapshots being atomic, why suspend the VM? Assuming
> >> there's a reason for this, why not do the qemu-img convert call after the
> >> VM



-- 
---
Rafi Khardalian
Vice President, Operations | Metacloud, Inc.
Email: rafi at metacloud.com | Tel: 855-638-2256, Ext. 2662