[openstack-dev] Proposal for instance-level snapshots in Nova

Jon Bernard jbernard at tuxion.com
Tue Feb 4 20:44:15 UTC 2014


* John Garbutt <john at johngarbutt.com> wrote:
> On 24 January 2014 17:05, Jon Bernard <jbernard at tuxion.com> wrote:
> > * Vishvananda Ishaya <vishvananda at gmail.com> wrote:
> >>
> >> On Jan 16, 2014, at 1:28 PM, Jon Bernard <jbernard at tuxion.com> wrote:
> >>
> >> > * Vishvananda Ishaya <vishvananda at gmail.com> wrote:
> >> >>
> >> >> On Jan 14, 2014, at 2:10 PM, Jon Bernard <jbernard at tuxion.com> wrote:
> >> >>
> >> >>>
> >> >>> <snip>
> >> >>>> As you've defined the feature so far, it seems like most of it could
> >> >>>> be implemented client side:
> >> >>>>
> >> >>>> * pause the instance
> >> >>>> * snapshot the instance
> >> >>>> * snapshot any attached volumes
> >> >>>
> >> >>> For the first milestone to offer crash-consistent snapshots you are
> >> >>> correct.  We'll need some additional support from libvirt, but the
> >> >>> patchset should be straightforward.  The biggest question I have
> >> >>> surrounding initial work is whether to use an existing API call or
> >> >>> create a new one.
> >> >>>
> >> >>
> >> >> I think you might have missed the "client side" part of this point. I agree
> >> >> that the snapshot multiple volumes and package it up is valuable, but I was
> >> >> trying to make the point that you could do all of this stuff client side
> >> >> if you just add support for snapshotting ephemeral drives. An all-in-one
> >> >> snapshot command could be valuable, but you are talking about orchestrating
> >> >> a lot of commands between nova, glance, and cinder and it could get kind
> >> >> of messy to try to run the whole thing from nova.
> >> >
> >> > If you expose each primitive required, then yes, the client could
> >> > implement the logic to call each primitive in the correct order, handle
> >> > error conditions, and exit while leaving everything in the correct
> >> > state.  But that would mean you would have to implement it twice - once
> >> > in python-novaclient and once in Horizon.  I would speculate that doing
> >> > this on the client would be even messier.
> >> >
> >> > If you are concerned about the complexity of the required interactions,
> >> > we could narrow the focus in this way:
> >> >
> >> >  Let's say that taking a full snapshot/backup (all volumes) operates
> >> >  only on persistent storage volumes.  Users who booted from an
> >> >  ephemeral glance image shouldn't expect this feature because, by
> >> >  definition, the boot volume is not expected to live a long life.
> >> >
> >> > This should limit the communication to Nova and Cinder, while leaving
> >> > Glance out (initially).  If the user booted an instance from a cinder
> >> > volume, then we have all the volumes necessary to create an OVA and
> >> > import to Glance as a final step.  If the boot volume is an image then
> >> > I'm not sure, we could go in a few directions:
> >> >
> >> >  1. No OVA is imported due to lack of boot volume
> >> >  2. A copy of the original image is included as a boot volume to create
> >> >     an OVA.
> >> >  3. Something else I am failing to see.
> >> >
> >> > If [2] seems plausible, then it probably makes sense to just ask glance
> >> > for an image snapshot from nova while the guest is in a paused state.
> >> >
> >> > Thoughts?
> >>
> >> This already exists. If you run a snapshot command on a volume backed instance
> >> it snapshots all attached volumes. Additionally it does throw a bootable image
> >> into glance referring to all of the snapshots.  You could modify create image
> >> to do this for regular instances as well, specifying block device mapping but
> >> keeping the vda as an image. It could even do the same thing with the ephemeral
> >> disk without a ton of work. Keeping this all as one command makes a lot of sense
> >> except that it is unexpected.
> >>
> >> There is a benefit to only snapshotting the root drive sometimes because it
> >> keeps the image small. Here's what I see as the ideal end state:
> >>
> >> Two commands (names are a strawman):
> >>   create-full-image -- image all drives
> >>   create-root-image -- image just the root drive
> >>
> >> These should work the same regardless of whether the root drive is
> >> volume-backed, instead of the craziness we have today where volume-backed
> >> instances snapshot all drives and instance-backed ones snapshot just the
> >> root.  I'm not sure how we manage expectations based on the current
> >> implementation, but perhaps the best idea is just adding this in v3 with
> >> new names?
> >>
> >> FYI the whole OVA thing seems moot since we already have a way of representing
> >> multiple drives in glance via block_device_mapping properties.
> >
> > I've had some time to look closer at nova and rethink things a bit and
> > I see what you're saying.  You are correct, taking snapshots of attached
> > volumes is currently supported - although not in the way that I would
> > like to see.  And this is where I think we can improve.
> >
> > Let me first summarize my understanding of what we currently have.
> > There are three ways of creating a snapshot-like thing in Nova:
> >
> >   1. create_image - takes a snapshot of the root volume and may take
> >      snapshots of the attached volumes depending on the volume type of
> >      the root volume.  I/O is not quiesced.
> >
> >   2. create_backup - takes a snapshot of the root volume with options
> >      to specify how often to repeat and how many previous snapshots to
> >      keep around. I/O is not quiesced.
> >
> >   3. os-assisted-snapshot - takes a snapshot of a single cinder volume.
> >      The volume is first quiesced before the snapshot is initiated.
> >
> > My general thesis is that I/O should be quiesced in all cases if the
> > underlying driver supports it.  Libvirt supports this feature and
> > I would like to extend the existing functionality to take advantage of
> > it.
> >
> > It's not reasonable to change the names or behaviour of the existing
> > public api calls.  Instead I would like to create a new snapshot() call
> > in the v3 API.
> >
> > We only need a quiesce() call added to the driver and the rest of the
> > implementation will live in the api layer.  Once implemented, the
> > existing snapshot calls (image, backup, os-assisted) could use the
> > underlying snapshot routines to achieve their expected results, leaving
> > us with only one set of snapshot-related functions to maintain.
> >
> > The new snapshot call would take at least one option: the drives that
> > should be snapshotted:
> >
> >     snapshot(devices=['vda', 'vdb'])
> >
> > Where a value of None implies all volumes.
> >
> > This allows the user to snapshot only the root volume if a small
> > bootable image is desired.
> >
> > There will be no exclusion based on volume type: both glance-backed and
> > cinder-backed volumes will be snapshotted.  Otherwise we reach the
> > unexpected behaviour that you mentioned earlier and I agree, it would
> > have been confusing.
> >
> > The flow will look like:
> >
> >   * call the compute node to quiesce
> >   * call the compute node to snapshot each individual glance drive
> >   * call the volume driver to snapshot each cinder volume
> >   * package the whole thing
> 
> If we have multiple calls to the compute node, it should probably get
> moved to the conductor. I kinda assume it would be extra driver calls
> though.
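To make the flow I sketched above concrete, here is a rough Python sketch
of the orchestration (every name here is invented for illustration; these
are stand-ins, not real Nova, Glance, or Cinder calls):

```python
# Hypothetical sketch of the proposed snapshot orchestration.  The
# dictionary-based "instance" and the tuple "snapshots" stand in for
# driver, image-service, and volume-service interactions.

def snapshot_instance(instance, devices=None):
    """Snapshot an instance's drives, quiescing I/O around the operation.

    devices=None means "snapshot every attached drive".
    """
    targets = devices or [d["name"] for d in instance["drives"]]
    instance["quiesced"] = True          # stands in for driver.quiesce()
    snapshots = []
    try:
        for drive in instance["drives"]:
            if drive["name"] not in targets:
                continue
            if drive["backend"] == "glance":
                # compute node snapshots the glance-backed drive
                snapshots.append(("image-snapshot", drive["name"]))
            else:
                # volume driver snapshots the cinder-backed volume
                snapshots.append(("volume-snapshot", drive["name"]))
    finally:
        instance["quiesced"] = False     # stands in for driver.unquiesce()
    # "package the whole thing": one glance image whose block device
    # mapping references each snapshot taken above.
    return {"block_device_mapping": snapshots}


inst = {"drives": [{"name": "vda", "backend": "glance"},
                   {"name": "vdb", "backend": "cinder"}],
        "quiesced": False}
result = snapshot_instance(inst)
```

The try/finally is the important part of the sketch: however an
individual snapshot fails, the guest must always be unquiesced.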
> 
> > The final result is an image in glance that references each attached
> > volume via its block device mapping.  For a cinder-backed instance, the
> > glance image would contain no data and only references to cinder
> > snapshots.  As far as I can tell, glance already supports these
> > requirements.
> >
> > If create_image and create_backup are updated to use this
> > implementation, then the behaviour will appear unchanged to the user
> > with the exception that I/O was quiesced during the snapshot(s) and they
> > therefore have a more reliable and useful result.
> >
> > Given this, I think it makes more sense to leave the implementation
> > within the api layer of Nova so that existing functions can share in the
> > implementation - as opposed to moving it into the client.
> >
> > What are your thoughts?  Is this approaching something sensible?
> 
> Thinking about this a little more, I wonder if we simply want to make
> some of these options to the existing v3 create_image call, so there
> is a single API call for creating a snapshot?

It could be done, but it would make incremental development of this
feature more difficult, since each merged patchset would also have to
function correctly on its own without introducing regressions in the
existing create_image logic.  I might be wrong in this assumption though.

Also, the feature is more about creating a snapshot of an instance as
opposed to an image.  In certain cases each snapshot volume may reside
in Cinder with no disk images in Glance.

I would prefer to develop it as a new API call and later refactor
create_image to make use of it - and perhaps phase it out completely at
some much later point.  The same could be done for create_backup and the
os-assisted version as well.

That said, it can go either way.  Do you have a preference here?
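To illustrate the refactor I have in mind, a minimal sketch (these
signatures are hypothetical, not Nova's actual API): one shared
snapshot() routine in the v3 API, with the existing calls later reduced
to thin wrappers over it.

```python
# Illustrative-only sketch: the existing calls delegate to one shared
# snapshot() implementation rather than each carrying their own logic.

def snapshot(instance, devices=None):
    # devices=None means "snapshot every attached drive".
    chosen = list(devices) if devices is not None else list(instance["drives"])
    return {"devices": chosen, "quiesced": True}

def create_image(instance):
    # Today's create_image behaviour re-expressed as a wrapper that
    # snapshots only the root drive.
    return snapshot(instance, devices=instance["drives"][:1])

def create_backup(instance):
    # create_backup likewise delegates, layering its own rotation and
    # retention options (omitted here) on top.
    return snapshot(instance)


inst = {"drives": ["vda", "vdb"]}
```

With this shape, phasing out the old calls later is just deleting
wrappers; the snapshot semantics live in one place.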

> I could see us having options like:
> * leave volumes alone, except if boot from volume (default, current behaviour)
> * optionally record current volume mapping in glance
> * take volume snapshots, and reference the snapshots in the glance image
> * specify how many previous snapshots to keep (default=0)
> * quiesce while taking snapshots
> 
> Just one question around quiesce, I thought there was some kind of
> time limit on how long the quiesce is allowed to take?

I believe libvirt allows 60 seconds for the guest agent to complete the
quiesce operation.  In the event that quiesce times out (or otherwise
fails), create_snapshot() would cancel any snapshots taken so far and
return an error.  If we add an option to proceed without quiescing when
this occurs, then some way of indicating that to the user will be
necessary.  Metadata in the glance template could be used for this.
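Roughly what I mean, as a hedged sketch (the timeout constant, exception
name, and metadata key are all invented for illustration):

```python
# Sketch of the failure handling described above: if quiesce times out,
# either abort and cancel partial work, or -- if the caller opted in --
# proceed unquiesced and record that fact in the image metadata so the
# user knows the result is only crash-consistent.

QUIESCE_TIMEOUT = 60  # seconds libvirt reportedly gives the guest agent

class QuiesceTimeout(Exception):
    pass

def create_snapshot(instance, quiesce_fn, allow_unquiesced=False):
    metadata = {}
    try:
        quiesce_fn(timeout=QUIESCE_TIMEOUT)
        metadata["quiesced"] = "true"
    except QuiesceTimeout:
        if not allow_unquiesced:
            # here we would cancel any snapshots already taken,
            # then surface the error to the caller
            raise
        # proceed, but flag the result so the user can tell a
        # quiesced snapshot from a crash-consistent one
        metadata["quiesced"] = "false"
    # ... take the snapshots, attach metadata to the glance image ...
    return metadata

def failing_quiesce(timeout):
    raise QuiesceTimeout()

meta = create_snapshot({}, failing_quiesce, allow_unquiesced=True)
```

The point is only that the "did we actually quiesce?" bit has to travel
with the resulting image, however we end up spelling it.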

-- 
Jon
