[openstack-dev] [nova] Overview of the libvirt instance storage series

Matthew Booth mbooth at redhat.com
Fri Jul 22 14:29:56 UTC 2016


This series is part of the priority feature libvirt-instance-storage[1]. At
first glance it may not be immediately apparent why, so I'll work backwards
for
context.

The purpose of the feature is to create an unambiguous, canonical source of
metadata about 'local' storage. 'Local' here is defined to be storage which
is
directly managed by Nova, so non-volume root, ephemeral, swap, and config
disks. This storage may not actually be local (eg Rbd), but it's managed
locally, so that's how we treat it. The justification for wanting to do
this is
in the spec, so I won't repeat it here.

In order to solve this problem, we first had to fix the interface between
libvirt.driver and libvirt.imagebackend. As it stands, the code currently
calls
Image.cache(), passing a callback function which will write data to a
target.
The problem is, we need this context in order to know in advance how the
disk
will be persisted. The lack of this context also results in some tortured
layering violations, such as the SUPPORTS_CLONE interface, where we
essentially
push Rbd-specific logic into driver. There are many examples of this. The
solution to this is the Image.create_from_image and Image.create_from_func
interfaces, which provide the backend with all the relevant context
required to
do backend-specific special handling in the backend.

We initially created a series of patches which first added the relevant new
interfaces for each of the 5[2] storage backends, and followed this up with
a
patch which updated driver to call the new interfaces. 2 things came back
from
early review feedback which made us re-examine this:

* The 'big bang' approach of switching all backends on in a single commit
  was tolerable, but not popular. We were asked to look into making the
  switch-over incremental.

* Feodor Tersin (but not tempest or the unit tests) noticed that we broke
  resize.

The resize problem it turns out was hard. The problem is that as well as
providing no context to the backend, the Image.cache() interface also does
'all
the things', which incidentally doesn't always involve the cache.
Additionally,
it was called via _create_image(), which also does 'all the things'. So
there
were 2 levels of 'all the things'. We expressly did not want to turn our
create
functions into new 'all the things' methods. Because both cache() and
_create_image() both do so many things that nobody can remember why they do
them all, and which ones interact badly in which contexts, they had become
magnets for hacks, workarounds, and bugs[3]. Rather than add another layer
to
_create_image(), we decided to finally pull it apart.

This resulting series achieves both of the above goals. We add a shim layer
which implements the new Image interfaces in terms of the existing cache()
interface. Specific backend implementations override the shim layer with
a backend-specific implementation. Once all backends have been updated we
can
start to remove additional assumptions from driver, but the switch-over will
happen incrementally.

We also deconstruct both _create_image and _create_images_and_backing, and
provide helper methods for callers in driver to easily implement only the
functionality they require. This both makes it obvious to the reader what is
happening for a particular caller, and removes interactions between
unexpected
behaviours (by eliminating them).

The change to _create_images_and_backing also makes it use common code with
regular backend disk creation. Although it wasn't an explicit goal of this
series, I believe this also fixes a problem which would have prevented live
migration between Lvm-backed instances.

In creating this series we have made good test coverage a high priority.
Specifically, we want to ensure that tests:

* Validate the behaviour we are interested in

* Provide assurance that the refactoring is not introducing regressions

For the latter reason, we have pulled any test changes we can to the front
of
the series, before making any actual changes. This allows us to see that the
tests ran successfully both before and after the interface change. After the
interface change, we update the tests to test the new interfaces. As they
have
more context, these are also easier to test more thoroughly.

The series can be viewed here:

  https://review.openstack.org/#/q/topic:libvirt-imagebackend

Note that the series is a single, dependent chain. Yes, you really do have
to
click next to see it all, because there are 35 patches. Yes, this is a total
pita to manage.

The last patch in the series is currently:

  https://review.openstack.org/#/c/320610/

This is the patch which adds the Qcow2 backend, which is currently the only
backend patch we've rebased on to the new series. We picked this because
it's
the default for most tests, so a green here gives us a moderately confident
feeling. The others will follow.

To the greatest extent practical the patches are all single purpose, which
is
why there are so many. I'll highlight the 'crux' patches here, in order:

* libvirt: Improve mocking of imagebackend disks
  https://review.openstack.org/#/c/333242/

This provides a single, common way for driver tests to mock imagebackend.
There
were a few existing approaches to this which had various deficiencies. This
new
one replaces them all, and allows easy inspection by driver of what went on.
The series relies on this interface a lot, which is why I highlight it here.

* libvirt: Add create_from_image and create_from_func to Backend
  https://review.openstack.org/#/c/333244/

This is the requested 'shim' layer. It updates the driver to call the new
interfaces, and implements these new interfaces in terms of the old
interfaces.
Note that we don't make any significant changes to existing tests in this
change, which validates that the shim layer continues to call the old
interface in the same way as before.

The immediately following change updates the tests to use the new
interfaces.

* libvirt: Add DiskFromImage and DiskFromFunc
  https://review.openstack.org/#/c/333522/

This is the first part of the change to deconstruct _create_image(). These
interfaces allow us to return a 'creatable' disk, which has not actually
been
created. The caller can then decide whether the disk should be created.

Over the next few changes we create some helper methods which return
iterators
over creatable disks.

* libvirt: Don't call _create_image from finish_migration
  https://review.openstack.org/#/c/337160/

This removes the problematic caller of _create_image. finish_migration
doesn't
create disks, so supporting it with _create_image and Image.cache() was
tortured. The new method is explicit about what needs to happen, and to
which
disks.

* libvirt: Replace _create_images_and_backing in pre_live_migration
  https://review.openstack.org/#/c/342224/

This is a big change to pre_live_migration. The purpose is to ensure that
disks
created for live migration are created identically to how they were
originally
created.

* libvirt: Add create_from_image & create_from_func for Qcow2
  https://review.openstack.org/#/c/320610/

The native implementation of the new backend interfaces for the Qcow2
backend.

I'd like to apologise in advance for dropping this immediately before
heading
out on holiday for a week. I will be back at work on Monday 1st August, and
I
obviously won't be able to respond to review feedback before then. In my
absence, Diana Clarke may be able to respond.

This is a big series, though, so I'd greatly appreciate eyes on all of it,
even
if some of the earlier patches receive -1s. If we could knock off even some
of
the earlier test patches, the patch bombs I keep dropping on gerrit will
get a
bit smaller ;)

Thanks,

Matt

[1] https://blueprints.launchpad.net/nova/+spec/libvirt-instance-storage
[2] Flat, Qcow2, Lvm, Rbd, Ploop
[3] For recent examples see stable libvirt rescue, and device tagging.
-- 
Matthew Booth
Red Hat Engineering, Virtualisation Team

Phone: +442070094448 (UK)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20160722/051178e1/attachment.html>


More information about the OpenStack-dev mailing list