[openstack-dev] [nova][live-migration] Libvirt storage pools and persistent disk metadata specs

Matthew Booth mbooth at redhat.com
Wed Apr 6 11:53:43 UTC 2016

I've just submitted a new spec for the imagebackend work was doing in
Mitaka, which is a prerequisite for the libvirt storage pools work:


I was just about to resubmit libvirt storage pools with my spec as a
dependency, but re-reading it perhaps we should discuss it briefly first.
I'm aware that the driving goal for this move is to enable data transfer
over the libvirt pipe. However, while we're messing with the on-disk layout
it might be worth thinking about the image cache.

The current implementation of the image cache is certainly problematic. It
has expanded far beyond its original brief, and it shows. The code has no
coherent structure, which makes following it very difficult even to those
familiar with it. Worst of all, though, the behaviour of the image cache is
distributed across several different modules, with no obvious links between
those modules for the unwary (aka tight coupling). The image cache relies
on locking for correct behaviour, but this locking is also distributed
across many modules. Verifying the correct behaviour of the image cache's
locking is hard enough to be impractical, which shows in the persistent
stream of bugs we see relating to backing files going missing. In short,
the image cache implementation needs an extreme overhaul anyway in the
light of its current usage.

We also need to address the problem that the image cache doesn't store any
metadata about its images. We currently determine the file format of an
image cache entry by inspection. While we sanity check images when writing
them to the image cache, this is not a robust defence against format bugs
and vulnerabilities.

More that this, though, the design of the image cache no longer makes
sense. When the image cache was originally implemented, there was only
local file storage, but the libvirt driver also now supports LVM and ceph.
Over 60% of our users use ceph, so ceph is really important. The ceph
backend is often able to use images directly from glance if they're also in
ceph, but when they're not it continues to use this local file store, which
makes no sense.

When we move to libvirt storage pools (with my dependent change), we open
the possibility for users to have multiple local storage pools, and have
instance storage allocated between them according to policy defined in the
flavor. If we're using a common image cache for all storage pools, this
limits the differentiation between those storage pools. So if a user's
paying for instance storage on SSD, but the backing file is on spinning
rust, that's not great.

Logically, the image cache should be a property of the storage backend. So,
local file stores should cache images in the local file store. LVM should
cache images in LVM, which would also allow it to use writeable snapshots.
Ceph should cache images in the same pool its instance disks are in. This
means that the image cache is always directly usable by its storage pool,
which is its purpose.

The image cache also needs a robust design for operating on shared storage.
It currently doesn't have one, although the locking means that it's
hopefully probably only maybe slightly a little racy, with any luck,

This change may be too large to do right now, but we should understand that
changing it later will require a further migration of some description, and
bear that in mind. A local file/lvm/ceph storage pool with an external
image cache has a different implementation and layout to the same storage
pool with an integrated image cache. Data is stored in different places,
and is managed differently. If we don't do it now, it will be slightly
harder later.

Reading through the existing spec, I also notice that it mentions use of
the 'backingStore' element. This needs to come out of the spec, as we MUST
NOT use this. The problem is that it's introspected. I don't know if
libvirt stores this persistently while it's running, but it most certainly
recreates it after a restart or pool refresh. Introspecting file formats
and backing files when the user can influence it is a severe security hole,
so we can't do it. Instead, we need to use the metadata created and defined
by the spec I linked at the top.

There will also need to be a lot more subclassing than this spec
anticipates. Specifically, I expect most backends to require a subclass. In
particular, the libvirt api doesn't provide storage locking, so we will
have to implement that for each backend.

I don't want to spend too long on the spec. The only thing worth of
discussion is the image cache, I guess.

Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20160406/8b1ce22c/attachment.html>

More information about the OpenStack-dev mailing list