[openstack-dev] Proposal for instance-level snapshots in Nova

John Garbutt john at johngarbutt.com
Tue Jan 28 10:12:06 UTC 2014

On 24 January 2014 17:05, Jon Bernard <jbernard at tuxion.com> wrote:
> * Vishvananda Ishaya <vishvananda at gmail.com> wrote:
>> On Jan 16, 2014, at 1:28 PM, Jon Bernard <jbernard at tuxion.com> wrote:
>> > * Vishvananda Ishaya <vishvananda at gmail.com> wrote:
>> >>
>> >> On Jan 14, 2014, at 2:10 PM, Jon Bernard <jbernard at tuxion.com> wrote:
>> >>
>> >>>
>> >>> <snip>
>> >>>> As you've defined the feature so far, it seems like most of it could
>> >>>> be implemented client side:
>> >>>>
>> >>>> * pause the instance
>> >>>> * snapshot the instance
>> >>>> * snapshot any attached volumes
>> >>>
>> >>> For the first milestone to offer crash-consistent snapshots you are
>> >>> correct.  We'll need some additional support from libvirt, but the
>> >>> patchset should be straightforward.  The biggest question I have
>> >>> surrounding initial work is whether to use an existing API call or
>> >>> create a new one.
>> >>>
>> >>
>> >> I think you might have missed the "client side" part of this point. I agree
>> >> that snapshotting multiple volumes and packaging them up is valuable, but I was
>> >> trying to make the point that you could do all of this stuff client side
>> >> if you just add support for snapshotting ephemeral drives. An all-in-one
>> >> snapshot command could be valuable, but you are talking about orchestrating
>> >> a lot of commands between nova, glance, and cinder and it could get kind
>> >> of messy to try to run the whole thing from nova.
>> >
>> > If you expose each primitive required, then yes, the client could
>> > implement the logic to call each primitive in the correct order, handle
>> > error conditions, and exit while leaving everything in the correct
>> > state.  But that would mean you would have to implement it twice - once
>> > in python-novaclient and once in Horizon.  I would speculate that doing
>> > this on the client would be even messier.
>> >
>> > If you are concerned about the complexity of the required interactions,
>> > we could narrow the focus in this way:
>> >
>> >  Let's say that taking a full snapshot/backup (all volumes) operates
>> >  only on persistent storage volumes.  Users who booted from an
>> >  ephemeral glance image shouldn't expect this feature because, by
>> >  definition, the boot volume is not expected to live a long life.
>> >
>> > This should limit the communication to Nova and Cinder, while leaving
>> > Glance out (initially).  If the user booted an instance from a cinder
>> > volume, then we have all the volumes necessary to create an OVA and
>> > import to Glance as a final step.  If the boot volume is an image then
>> > I'm not sure; we could go in a few directions:
>> >
>> >  1. No OVA is imported due to lack of boot volume
>> >  2. A copy of the original image is included as a boot volume to create
>> >     an OVA.
>> >  3. Something else I am failing to see.
>> >
>> > If [2] seems plausible, then it probably makes sense to just ask glance
>> > for an image snapshot from nova while the guest is in a paused state.
>> >
>> > Thoughts?
>>
>> This already exists. If you run a snapshot command on a volume-backed instance
>> it snapshots all attached volumes. Additionally it does throw a bootable image
>> into glance referring to all of the snapshots.  You could modify create image
>> to do this for regular instances as well, specifying block device mapping but
>> keeping the vda as an image. It could even do the same thing with the ephemeral
>> disk without a ton of work. Keeping this all as one command makes a lot of sense
>> except that it is unexpected.
>>
>> There is a benefit to only snapshotting the root drive sometimes because it
>> keeps the image small. Here's what I see as the ideal end state:
>>
>> Two commands (names are a strawman):
>>   create-full-image -- image all drives
>>   create-root-image -- image just the root drive
>>
>> These should work the same regardless of whether the root drive is volume-backed,
>> instead of the craziness we have today of volume-backed snapshotting all drives
>> and instance-backed just the root.  I'm not sure how we manage expectations based
>> on the current implementation but perhaps the best idea is just adding this in
>> v3 with new names?
>>
>> FYI the whole OVA thing seems moot since we already have a way of representing
>> multiple drives in glance via block_device_mapping properties.
>
> I've had some time to look closer at nova and rethink things a bit and
> I see what you're saying.  You are correct, taking snapshots of attached
> volumes is currently supported - although not in the way that I would
> like to see.  And this is where I think we can improve.
>
> Let me first summarize my understanding of what we currently have.
> There are three ways of creating a snapshot-like thing in Nova:
>
>   1. create_image - takes a snapshot of the root volume and may take
>      snapshots of the attached volumes depending on the volume type of
>      the root volume.  I/O is not quiesced.
>   2. create_backup - takes a snapshot of the root volume with options
>      to specify how often to repeat and how many previous snapshots to
>      keep around. I/O is not quiesced.
>   3. os-assisted-snapshot - takes a snapshot of a single cinder volume.
>      The volume is first quiesced before the snapshot is initiated.
>
> My general thesis is that I/O should be quiesced in all cases if the
> underlying driver supports it.  Libvirt supports this feature and
> I would like to extend the existing functionality to take advantage of
> it.
>
> It's not reasonable to change the names or behaviour of the existing
> public api calls.  Instead I would like to create a new snapshot() call
> in the v3 API.
>
> We only need a quiesce() call added to the driver and the rest of the
> implementation will live in the api layer.  Once implemented, the
> existing snapshot calls (image, backup, os-assisted) could use the
> underlying snapshot routines to achieve their expected results, leaving
> us with only one set of snapshot-related functions to maintain.
>
> The new snapshot call would take at least one option: the drives that
> should be snapshotted:
>
>     snapshot(devices=['vda', 'vdb'])
>
> Where a value of None implies all volumes.
>
> This allows the user to snapshot only the root volume if a small
> bootable image is desired.
>
> There will be no exclusion based on volume type; both glance and cinder
> volumes will be snapshotted.  Otherwise we reach the unexpected
> behaviour that you mentioned earlier and I agree, it would have been
> confusing.
>
> The flow will look like:
>
>   * call the compute node to quiesce
>   * call the compute node to snapshot each individual glance drive
>   * call the volume driver to snapshot each cinder volume
>   * package the whole thing

If we have multiple calls to the compute node, it should probably get
moved to the conductor. I kinda assume it would be extra driver calls.
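
To make the ordering concrete, here is a rough sketch of the quoted flow.
It is only an illustration: Disk, Recorder and snapshot_instance are
hypothetical names, not existing Nova or Cinder code, and the unquiesce
step in the "finally" clause is my assumption (the proposal only mentions
quiesce). Recorder just logs the calls so the sequence can be inspected.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    """One attached device; backend is 'glance' (image-backed) or 'cinder'."""
    name: str
    backend: str


@dataclass
class Recorder:
    """Hypothetical stand-in for the compute driver and volume API."""
    calls: list = field(default_factory=list)

    def quiesce(self):
        self.calls.append("quiesce")

    def unquiesce(self):
        self.calls.append("unquiesce")

    def snapshot_local(self, disk):
        self.calls.append(f"compute-snapshot:{disk.name}")
        return f"image-snap-{disk.name}"

    def snapshot_volume(self, disk):
        self.calls.append(f"cinder-snapshot:{disk.name}")
        return f"volume-snap-{disk.name}"


def snapshot_instance(backend, disks, devices=None):
    """Snapshot the named devices (None means all) and return a mapping
    of device name to snapshot id, as a glance image might reference it."""
    selected = [d for d in disks if devices is None or d.name in devices]
    mapping = {}
    backend.quiesce()
    try:
        # Image-backed drives first, then cinder volumes, as in the flow.
        for disk in selected:
            if disk.backend == "glance":
                mapping[disk.name] = backend.snapshot_local(disk)
        for disk in selected:
            if disk.backend == "cinder":
                mapping[disk.name] = backend.snapshot_volume(disk)
    finally:
        # Assumption: the guest is always thawed, even if a snapshot fails.
        backend.unquiesce()
    return mapping
```

Calling snapshot_instance(backend, disks, devices=['vda']) would touch
only the root drive, which matches the "small bootable image" case.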

> The final result is an image in glance that references each attached
> volume via its block device mapping.  For a cinder-backed instance, the
> glance image would contain no data and only references to cinder
> snapshots.  As far as I can tell, glance already supports these
> requirements.
>
> If create_image and create_backup are updated to use this
> implementation, then the behaviour will appear unchanged to the user
> with the exception that I/O was quiesced during the snapshot(s) and they
> therefore have a more reliable and useful result.
>
> Given this, I think it makes more sense to leave the implementation
> within the api layer of Nova so that existing functions can share in the
> implementation - as opposed to moving it into the client.
>
> What are your thoughts?  Is this approaching something sensible?

Thinking about this a little more, I wonder if we simply want to add
some of these as options to the existing v3 create_image call, so there
is a single API call for creating a snapshot?

I could see us having options like:
* leave volumes alone, except if boot from volume (default, current behaviour)
* optionally record current volume mapping in glance
* take volume snapshots, and reference the snapshots in the glance image
* specify how many previous snapshots to keep (default=0)
* quiesce while taking snapshots
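
As a strawman, those options might be modelled like this. All of the
names here (VolumeMode, CreateImageOptions, parse_options, and the
"volume_mode", "rotation" and "quiesce" keys) are made up for
illustration and are not part of any existing API; the defaults are
meant to mirror the current behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class VolumeMode(Enum):
    LEAVE = "leave"            # volumes untouched, except boot from volume
    RECORD_MAPPING = "record"  # record current volume mapping in glance
    SNAPSHOT = "snapshot"      # snapshot volumes, reference the snapshots


@dataclass(frozen=True)
class CreateImageOptions:
    volume_mode: VolumeMode = VolumeMode.LEAVE
    rotation: int = 0          # how many previous snapshots to keep
    quiesce: bool = False      # quiesce while taking snapshots

    def __post_init__(self):
        if self.rotation < 0:
            raise ValueError("rotation must be >= 0")


def parse_options(body):
    """Build options from a request-body-like dict, applying the defaults."""
    return CreateImageOptions(
        volume_mode=VolumeMode(body.get("volume_mode", "leave")),
        rotation=int(body.get("rotation", 0)),
        quiesce=bool(body.get("quiesce", False)),
    )
```

With an empty body this yields the defaults, so existing callers would
see no change.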

Just one question around quiesce: I thought there was some kind of
time limit on how long the quiesce is allowed to take?

