[openstack-dev] Libvirt snapshot process optimization

Rafi Khardalian rafi at metacloud.com
Thu Aug 23 21:00:47 UTC 2012


Hi all,

I'm looking at the libvirt snapshot code and was wondering about the order
and purpose of several operations.  At a high level, it looks like the VM
being snapshotted is first suspended (managedSave), the actual qcow2
snapshot is taken, and the snapshot is then extracted (qemu-img convert)
before the instance is returned to its prior state.

My question is, with snapshots being atomic, why suspend the VM?  Assuming
there's a reason for this, why not do the qemu-img convert call after the
VM has been resumed?  I figure there are reasons for the current order of
operations and wanted to understand them before making changes.  The goal
here is to reduce the amount of time a VM is unavailable while a snapshot
is being created.  Since the guest stays down for the entire qemu-img
convert, the current approach is rather disruptive for anything with a
large root disk.
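A rough sketch of the proposed reordering, where the guest is down only while the internal snapshot is taken.  The function names are hypothetical and this is not a tested patch.  It rests on an explicit assumption: that qemu-img can safely read (and then delete) an internal snapshot in an image that the running QEMU process still has open.  That is exactly the open question here, since qemu-img is generally not meant to operate on in-use images.

```python
import subprocess
from typing import Callable, List

# Hypothetical command-runner type, injectable so the order can be inspected.
Runner = Callable[[List[str]], None]


def run_checked(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


def snapshot_resume_early(dom, disk_path: str, snap_name: str,
                          out_path: str, run: Runner = run_checked) -> None:
    """Hypothetical reordering: resume right after the internal snapshot
    is taken, then extract while the guest runs."""
    dom.managedSave(0)  # downtime starts here...
    try:
        run(["qemu-img", "snapshot", "-c", snap_name, disk_path])
    finally:
        dom.create()  # ...and ends here (restore from managed save)
    # If the snapshot step raised, the exception propagates after the
    # domain is restored, so the steps below are skipped.
    try:
        # ASSUMPTION: reading the internal snapshot while QEMU writes to
        # the same image is safe.  This is unverified.
        run(["qemu-img", "convert", "-f", "qcow2", "-O", "qcow2",
             "-s", snap_name, disk_path, out_path])
    finally:
        # ASSUMPTION: deleting the snapshot modifies an image the running
        # guest has open, which carries the same unverified risk.
        run(["qemu-img", "snapshot", "-d", snap_name, disk_path])
```

The design choice is simply to move the resume ahead of the slow convert; the total work is unchanged, but the guest's unavailable window shrinks to the managedSave, snapshot, and restore steps.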

The preferred approach, in my mind, is to snapshot without any downtime
whatsoever.  Granted, this relies on the guest being in a consistent
state, but we already rely on that today, since a libvirt suspend doesn't
guarantee consistency at the guest OS level either.

Any insight would be appreciated.

Thanks,
---
Rafi Khardalian
Vice President, Operations | Metacloud, Inc.
Email: rafi at metacloud.com | Tel: 855-638-2256, Ext. 2662
