[openstack-dev] [nova] live-snapshot/cloning of virtual machines

Vishvananda Ishaya vishvananda at gmail.com
Fri Aug 16 17:17:34 UTC 2013


On Fri, Aug 16, 2013 at 3:05 AM, Daniel P. Berrange <berrange at redhat.com> wrote:

> On Wed, Aug 14, 2013 at 04:53:01PM -0700, Vishvananda Ishaya wrote:
> > Hi Everyone,
> >
> > I have been trying for some time to get the code for the live-snapshot blueprint[1]
> > in. Going through the review process for the rpc and interface code[2] was easy. I
> > suspect the api-extension code[3] will also be relatively trivial to get in. The
> > main concern is with the libvirt driver implementation[4]. I'd like to discuss the
> > concerns and see if we can make some progress.
> >
> > Short Summary (tl;dr)
> > =====================
> >
> > I propose we merge live-cloning as an experimental feature for Havana and have the
> > api extension disabled by default.
> >
> > Overview
> > ========
> >
> > First of all, let me express the value of live snapshotting. The slowest part of the
> > vm provisioning process is generally booting of the OS. The advantage of
> > live-snapshotting is that it allows the possibility of bringing up application servers
> > while skipping the overhead of vm (and application) startup.
>
> For Linux at least I think bootup time is a problem that is being solved by the
> distros. It is possible to boot up many modern Linux distros in a couple of seconds
> even on physical hardware - VMs can be even faster since they don't have such a stupid
> BIOS to worry about & have a restricted set of possible hardware. This is on a par
> with, or better than, the overheads imposed by Nova itself in the boot up process.
>
> Windows may be a different story, but I've not used it in years so don't know what
> its boot performance is like.
>
> > I recognize that this capability comes with some security concerns, so I don't expect
> > this feature to go in and be ready for use in production right away. Similarly,
> > containers have a lot of the same benefit, but have had their own security issues
> > which are gradually being resolved. My hope is that getting this feature in would
> > allow people to start experimenting with live-booting so that we could uncover some
> > of these security issues.
> >
> > There are two specific concerns that have been raised regarding my patch. The first
> > concern is related to my use of libvirt. The second concern is related to the security
> > issues above. Let me address them separately.
> >
> > 1. Libvirt Issues
> > =================
> >
> > The only feature I require from the hypervisor is to load memory/processor state for
> > a vm from a file. Qemu supports this directly. The only way that libvirt exposes this
> > functionality is via its restore command which is specifically for restoring the
> > previous state of an existing vm. "Cloning", or restoring the memory state of a
> > cloned vm is considered unsafe (which I will address in the second point, below).
> >
> > The result of the limited api is that I must include some hacks to make the restore
> > command actually allow me to restore the state of the new vm. I recognize that this
> > is using an undocumented libvirt api and isn't the ideal solution, but it seemed
> > "better" than avoiding libvirt and talking directly to qemu.
> >
> > This is obviously not ideal. It is my hope that this 0.1 version of the feature will
> > allow us to iteratively improve the live-snapshot/clone process and get the security
> > to a point where the libvirt maintainers would be willing to accept a patch to directly
> > expose an api to load memory from a file.
>
> To characterize this as a libvirt issue is somewhat misleading. The reason why libvirt
> does not explicitly allow this is that, from discussions with the upstream QEMU/KVM
> developers, the recommendation/advice is that this is not a safe operation and should not
> be exposed to application developers.
>
> The expectation is that the functionality in QEMU is only targeted at taking point in
> time snapshots & allowing rollback of a VM to those snapshots, not creating clones of
> active VMs.
>

Thanks for the clarification here. I wasn't aware that this requirement came from qemu
upstream.


>
> > 2. Security Concerns
> > ====================
> >
> > There are a number of security issues with loading state from another vm. Here is a
> > short list of things that need to be done just to make a cloned vm usable:
> >
> > a) mac address needs to be recreated
> > b) entropy pool needs to be reset
> > c) host name must be reset
> > d) host keys must be regenerated
> >
> > There are others, and trying to clone a running application as well may expose other
> > sensitive data, especially if users are snapshotting vms and making them public.
> >
> > The only issue that I address on the driver side is the mac addresses. This is the
> > minimum that needs to be done just to be able to access the vm over the network. This
> > is implemented by unplugging all network devices before the snapshot and plugging new
> > network devices in on clone. This isn't the most friendly thing to guest applications,
> > but it seems like the safest option for the first version of this feature.
>
> This is not really as safe as you portray. When restoring from the snapshot the VM
> will initially be running with a virtual NIC with a different MAC address from the one
> associated with the in-memory OS kernel state. Even if you hotunplug the device and
> plug in a new one, you still have a period where the virtual hardware exposed to the
> guest does not match the memory state of the guest OS. Perhaps you will be lucky and
> not hit problems with some OS, but equally you can be unlucky and do bad things to
> the OS kernel or application state. Relying on luck in this way does not lead to a
> supportable solution IMHO.
>

Just a clarification, the most recent version of the patch unplugs network devices before
the snapshot, so there is no longer a window here.
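
A rough sketch of that ordering, for illustration only. This is not the code from
the patch: the function names and the 52:54:00 prefix (the one conventionally used
for QEMU/KVM guests) are assumptions, and a real deployment would get its MAC
addresses from its own network allocation. The point it shows is that every NIC is
detached before the memory state is captured, and NICs with fresh MACs are only
attached after the restore.

```python
import random

# Assumption: 52:54:00 is the prefix conventionally used for QEMU/KVM guests.
QEMU_PREFIX = (0x52, 0x54, 0x00)


def random_mac(rng=random):
    """Return a MAC with the QEMU prefix followed by three random octets."""
    octets = QEMU_PREFIX + tuple(rng.randint(0x00, 0xFF) for _ in range(3))
    return ":".join("%02x" % octet for octet in octets)


def clone_plan(source_macs):
    """List the steps in order: no NIC is attached while state is captured."""
    used = set(source_macs)
    steps = [("detach", mac) for mac in source_macs]
    steps.append(("snapshot", None))
    steps.append(("restore", None))
    for _ in source_macs:
        mac = random_mac()
        while mac in used:  # never reuse a MAC from the source vm
            mac = random_mac()
        used.add(mac)
        steps.append(("attach", mac))
    return steps
```

Calling clone_plan(["fa:16:3e:00:00:01"]) yields detach, snapshot, restore, attach,
in that order, with the attached MAC differing from the source one.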


>
> There are other unique identifiers exposed in the virtual hardware that will/should
> change when you clone VMs too, such as the host UUID and the storage serial keys, and
> you cannot easily fix those by just unplugging hardware & replugging it.
>

Clearly, hence my comment above about needing a guest agent to do the important work.
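
To make "the important work" a bit more concrete, here is a minimal sketch of what
such an agent might run inside a Linux guest after a clone. No guest agent exists
yet, so everything here is hypothetical: the function name, the default key path
(the usual OpenSSH location) and the choice of commands are assumptions. It only
covers host keys and host name; the entropy pool and other identifiers would need
their own handling.

```python
import glob


def post_clone_commands(new_hostname, key_glob="/etc/ssh/ssh_host_*"):
    """Build the argv lists a hypothetical guest agent would run post-clone."""
    old_keys = sorted(glob.glob(key_glob))
    commands = []
    if old_keys:
        # Remove the host keys inherited from the source vm.
        commands.append(["rm", "-f"] + old_keys)
    # ssh-keygen -A creates any default host keys that are missing.
    commands.append(["ssh-keygen", "-A"])
    # Set the running host name; persisting it is distro-specific.
    commands.append(["hostname", new_hostname])
    return commands
```

The function only builds the commands, so it can be inspected without touching the
guest; an agent would pass each list to something like subprocess.check_call.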


> > So cloning vms must be done with care. Sensitive data must be removed from the vm
> > pre-clone and new data needs to be generated post-clone. Ultimately this should all
> > be done via guest-agent of some sort. I have found some volunteers to make the guest
> > agent a reality, but it will take a bit of time to get something workable, and it
> > will be much more difficult if there isn't a way to test the feature.
>
> Note that even if you think you have removed security data from a VM's filesystem,
> it is quite likely that the data will still in fact exist in the unallocated sectors
> of VM's block devices and can be fairly easily recovered from them.
>
> The libguestfs project provides tools to perform offline cloning of VM disk images.
> Its virt-sysprep knows how to delete a lot of (but by no means all possible) sensitive
> file data for common Linux & Windows OSes. It still has to be combined with use of
> the virt-sparsify tool though, to ensure the deleted data is actually purged from
> the VM disk image as well as the filesystem, by releasing all unused VM disk sectors
> back to the host storage (and not all storage supports that).
>
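
For reference, the offline sequence described above can be sketched roughly as
follows. The two tools are the real libguestfs ones, invoked in their basic forms
(virt-sysprep works on the image in place, virt-sparsify copies to a new image);
the function names and image paths are placeholders, and any real use would need
options chosen for the guest OS in question.

```python
import subprocess


def offline_clean_commands(src_image, dst_image):
    """Return the two commands, in order, for cleaning an offline disk image."""
    return [
        # Remove known sensitive files from the image, in place.
        ["virt-sysprep", "-a", src_image],
        # Copy to a new image, dropping the sectors freed by the step above.
        ["virt-sparsify", src_image, dst_image],
    ]


def run_offline_clean(src_image, dst_image):
    """Run both steps; the vm using src_image must be shut down first."""
    for argv in offline_clean_commands(src_image, dst_image):
        subprocess.check_call(argv)
```

Note that this only applies to a stopped vm's disk; it does nothing for the memory
state that live cloning carries over.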
> > Conclusion
> > ==========
> >
> > There are obviously problems to be solved with the whole idea of live cloning, but
> > I think it enables some important new ways of deploying applications. Imagine for
> > example a PaaS built on fast-cloning vms instead of containers. This is clearly a
> > long term project but if we block it now it may never get the support it needs to
> > become a real option.
> >
> > Proposal
> > ========
> >
> > I propose we allow the patch in and we leave the live-snapshot extension disabled
> > by default. Deployers can turn on the extension to experiment with the feature.
> > This will allow other hypervisors to do an implementation, and the community in
> > general to improve the security and usefulness of live-cloned virtual machines.
> >
> > I'm very interested in your thoughts and feedback. Thank you to everyone who made
> > it this far.
>
> I don't think it is a good idea to add a feature which is considered to
> be unsupportable by the developers of the virt platform.
>

You make excellent points. I'm not totally convinced that this feature is the right
long-term direction, but I still think it is interesting. To be fair, I'm not convinced
that virtual machines as a whole are the right long-term direction. I'm still looking for
a way for people to experiment with this and see what use cases come out of it.

Over the past three years OpenStack has been a place where we can iterate quickly and
try new things. Multihost nova-network was an experiment of mine that turned into the
most common deployment strategy for a long time.

Maybe we've grown up to the point where we have to be more careful and not introduce
these kinds of features, and the maintenance cost of introducing experimental features is
too great. If that is the community consensus, then I'm happy to keep the live snapshot
stuff in a branch on github for people to experiment with.

Vish

>
> Regards,
> Daniel
> --
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/:|
> |: http://libvirt.org              -o-             http://virt-manager.org:|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/:|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc:|