[openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

Sean Dague sean at dague.net
Tue May 24 17:59:17 UTC 2016


The team working on live migration testing started with an experimental
job on Ubuntu 16.04, to try using the latest and greatest libvirt +
qemu, on the assumption that a set of issues we were seeing would be
solved there. The short answer is: it doesn't look like this is going
to work.

We run tests on a bunch of different clouds, and those clouds expose
different cpu flags to us. These are not standard sets that map to a
named model like "Haswell", which means live migration in multinode
jobs can land on hosts with different cpu flags. The workaround we
found was to define a least-common-denominator cpu model, which we call
gate64, push it into libvirt's cpu_map.xml from devstack, and select it
whenever we are in a multinode scenario.
(https://github.com/openstack-dev/devstack/blob/master/tools/cpu_map_update.py)
Not ideal, but with libvirt 1.2.2 it works fine.
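
For reference, a custom model entry in cpu_map.xml looks roughly like
the sketch below. This is illustrative only -- the real gate64 entry is
generated by the cpu_map_update.py script linked above, and the feature
list here is abbreviated:

    <!-- sketch: a least-common-denominator model; feature list
         abbreviated, the real entry comes from cpu_map_update.py -->
    <model name='gate64'>
      <vendor name='AMD'/>
      <feature name='fpu'/>
      <feature name='sse2'/>
      <feature name='lm'/>
      <feature name='syscall'/>
    </model>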

It turns out it works fine because libvirt *actually* seems to take the
data from cpu_map.xml and translate it into something it believes qemu
will understand. On these systems that apparently turns into "-cpu
Opteron_G1,-pse36"
(http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-0000000b.txt.gz)
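
In other words, under libvirt 1.2.2 a guest cpu definition like the
following (a sketch; the real XML is generated by nova from its
cpu_model setting):

    <cpu mode='custom' match='exact'>
      <model>gate64</model>
    </cpu>

gets decomposed against cpu_map.xml into the closest model qemu
actually knows, plus flag deltas -- hence the "-cpu Opteron_G1,-pse36"
in the log above.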

At some point between libvirt 1.2.2 and 1.3.1 this changed. Now libvirt
seems to pass our cpu_model directly to qemu, and assumes that you, the
user, are responsible for writing any <feature/> stanzas to add or
remove flags yourself. When libvirt sends 'gate64' to qemu, this
explodes, as qemu has no idea what we are talking about.
http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531
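
To get the same net result without the custom model, the guest XML
would have to spell out the delta against a model qemu does know, along
these lines (a sketch, using standard libvirt domain XML syntax):

    <cpu mode='custom' match='exact'>
      <model fallback='forbid'>Opteron_G1</model>
      <feature policy='disable' name='pse36'/>
    </cpu>

which is exactly the kind of thing nova has no interface for setting
today (hence the question below about a nova feature).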

Unlike libvirt, which reads the set of cpus that could exist in the
world from a text file (cpu_map.xml), qemu builds its model table in
statically at compile time (the compiled-in list for a given binary can
be dumped with "qemu-system-x86_64 -cpu help"):
http://git.qemu.org/?p=qemu.git;a=blob;f=target-i386/cpu.c;h=895a386d3b7a94e363ca1bb98821d3251e70c0e0;hb=HEAD#l694


So, the existing cpu_map.xml workaround for our testing situation will
no longer work.

That leaves us with a number of open questions:

* Have our cloud providers standardized enough that we might get away
without this custom cpu model? (Or have some of them standardized, so
that we could use only those providers for multinode jobs?)
* Is there any way to get this behavior back in libvirt, so that it
does the cpu flag computation for us?
* Would we have to build a whole nova feature around setting libvirt
<feature/> stanzas to be able to test live migration in our clouds?
* Other options?
* Do we give up and go herd goats?

	-Sean

-- 
Sean Dague
http://dague.net


