[openstack-dev] Libvirt snapshot process optimization

Rafi Khardalian rafi at metacloud.com
Fri Aug 24 00:27:33 UTC 2012


Assuming there are reasons for keeping suspend as part of the snapshot
process, the flow can still be optimized to reduce the impact on running
VMs.  This is done by resuming immediately after the "qemu-img snapshot"
operation (libvirt_utils.create_snapshot), rather than waiting until the
"qemu-img convert" process (libvirt_utils.extract_snapshot) also
completes.  I've been unable to find a reason for waiting until the
convert is done.
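
For anyone not familiar with the helpers, they roughly map onto the
qemu-img invocations sketched below.  This is only an illustration, not
the actual libvirt_utils code: the build_* function names are hypothetical,
the commands are constructed but never executed, and the use of "-s" to
select the snapshot during convert is an assumption that holds for older
qemu-img releases (newer ones also accept "-l").

```python
def build_create_snapshot_cmd(disk_path, snapshot_name):
    """Command for the internal qcow2 snapshot (the quick step)."""
    return ["qemu-img", "snapshot", "-c", snapshot_name, disk_path]


def build_extract_snapshot_cmd(disk_path, source_fmt, snapshot_name,
                               out_path, dest_fmt):
    """Command that copies the snapshot out to a separate image.

    This is the slow step for a large root disk, since the whole
    snapshot is read and rewritten in the destination format.
    """
    return ["qemu-img", "convert", "-f", source_fmt, "-O", dest_fmt,
            "-s", snapshot_name, disk_path, out_path]


def build_delete_snapshot_cmd(disk_path, snapshot_name):
    """Command that removes the internal snapshot once it is exported."""
    return ["qemu-img", "snapshot", "-d", snapshot_name, disk_path]


if __name__ == "__main__":
    # Hypothetical paths and names, for illustration only.
    print(" ".join(build_create_snapshot_cmd("/tmp/disk", "snap1")))
    print(" ".join(build_extract_snapshot_cmd("/tmp/disk", "qcow2", "snap1",
                                              "/tmp/out/snap1", "raw")))
    print(" ".join(build_delete_snapshot_cmd("/tmp/disk", "snap1")))
```

Only the first command needs the disk to be quiet; the convert reads from
the already-created snapshot.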

Modified snapshot() snippet from the libvirt driver, representing the
change I'm proposing:

        # Make the snapshot
        try:
            libvirt_utils.create_snapshot(disk_path, snapshot_name)
        finally:
            if state == power_state.RUNNING:
                self._create_new_domain(xml_desc)

        # Export the snapshot to a raw image
        with utils.tempdir() as tmpdir:
            try:
                out_path = os.path.join(tmpdir, snapshot_name)
                libvirt_utils.extract_snapshot(disk_path, source_format,
                                               snapshot_name, out_path,
                                               image_format)
            finally:
                libvirt_utils.delete_snapshot(disk_path, snapshot_name)
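
To make the intended ordering easy to check, here is a minimal standalone
sketch of the same control flow.  It is not the driver code: snapshot_flow
and record_flow are hypothetical names, and the four steps are passed in
as plain callables standing in for create_snapshot, the domain resume,
extract_snapshot and delete_snapshot.

```python
def snapshot_flow(create_snapshot, resume, extract_snapshot,
                  delete_snapshot, was_running=True):
    """Run the proposed ordering; each step is a zero-argument callable."""
    # Take the snapshot, then bring the VM back right away, even if
    # the snapshot step raised.
    try:
        create_snapshot()
    finally:
        if was_running:
            resume()

    # The slow export happens while the VM is already running again.
    # The internal snapshot is cleaned up whether or not export succeeds.
    try:
        extract_snapshot()
    finally:
        delete_snapshot()


def record_flow():
    """Return the order in which the stand-in steps were called."""
    events = []
    snapshot_flow(
        create_snapshot=lambda: events.append("create_snapshot"),
        resume=lambda: events.append("resume"),
        extract_snapshot=lambda: events.append("extract_snapshot"),
        delete_snapshot=lambda: events.append("delete_snapshot"),
    )
    return events


if __name__ == "__main__":
    print(record_flow())
```

With this ordering the VM is only unavailable for the duration of the
snapshot step; if that step fails, the VM is still resumed and the export
is skipped because the exception propagates.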

I agree it would be ideal if we could find a way to guarantee a consistent
state in the guest VM, though I'm concerned about how users would respond
to a full shutdown being forced upon them to take a snapshot.

-----Original Message-----
From: Joshua Harlow [mailto:harlowja at yahoo-inc.com]
Sent: Thursday, August 23, 2012 5:13 PM
To: OpenStack Development Mailing List; Rafi Khardalian
Cc: openstack-dev
Subject: Re: [openstack-dev] Libvirt snapshot process optimization

I'd almost like to see the VM be shut down before snapshot, but that's just
me.

In fact just looking at the libvirt docs, 'suspend does not save a
persistent image of the guest's memory. For this, save is used.' So that
could leave guests in some weird state, so that sort of sucks. A shutdown
could at least trigger ACPI shutdown to occur in the VM and would
hopefully leave it in an OK state (emphasis on hopefully). I just think
that reducing the amount of time is going to be hard unless there is
hypervisor<->VM communication (i.e. signaling all the apps in the VM to
stop) or libvirt (+others) persists the memory image.

My guess is suspend is trying to do what it can, which won't be 100% right
without memory state saving or some other communication happening...
Perhaps a 'save' call (or shutdown sequence) should be used; this
probably isn't any faster, but at least it would be 'correct' (shared
storage state not included). There is also the question of uploading
snapshots (but that's a different question).

On 8/23/12 2:00 PM, "Rafi Khardalian" <rafi at metacloud.com> wrote:

>Hi all,
>
>I'm looking at the libvirt snapshot code and was wondering about the order
>and purpose of several operations.  At a high level, it looks like the VM
>being snapshotted is first suspended (managedSave), actual qcow2 snapshot
>is taken, then extraction is done (qemu-img convert) before returning the
>instance to its prior state.
>
>My question is, with snapshots being atomic, why suspend the VM?  Assuming
>there's a reason for this, why not do the qemu-img convert call after the
>VM has been resumed?  I figure there are reasons for the current order of
>operation and wanted to understand them before making changes.  The goal
>here is to reduce the amount of time a VM is unavailable while a snapshot
>is being created, as the current approach is rather disruptive for
>anything with a large root disk.
>
>The preferred approach, in my mind, is to snapshot without any downtime
>whatsoever.  Granted, this relies on the guest being in a consistent
>state, which is already the case considering a libvirt suspend doesn't
>guarantee anything at the guest OS level.
>
>Any insight would be appreciated.
>
>Thanks,
>---
>Rafi Khardalian
>Vice President, Operations | Metacloud, Inc.
>Email: rafi at metacloud.com | Tel: 855-638-2256, Ext. 2662
