[openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

Daniel P. Berrange berrange at redhat.com
Tue May 31 09:39:41 UTC 2016

On Tue, May 24, 2016 at 01:59:17PM -0400, Sean Dague wrote:
> The team working on live migration testing started with an experimental
> job on Ubuntu 16.04 to try to be using the latest and greatest libvirt +
> qemu under the assumption that a set of issues we were seeing are
> solved. The short answer is, it doesn't look like this is going to work.
>
> We run tests on a bunch of different clouds. Those clouds expose
> different cpu flags to us. These are not standard things that map to
> "Haswell". It means live migration in the multinode cases can hit cpus
> with different flags. So we found the requirement was to come up with a
> least common denominator of cpu flags, which we call gate64, and push
> that into the libvirt cpu_map.xml in devstack, and set whenever we are
> in a multinode scenario.
> (https://github.com/openstack-dev/devstack/blob/master/tools/cpu_map_update.py)
> Not ideal, but with libvirt 1.2.2 it works fine.
>
> It turns out it works fine because libvirt *actually* seems to take the
> data from cpu_map.xml and do a translation to what it believes qemu will
> understand. On these systems apparently this turns into "-cpu
> Opteron_G1,-pse36"
> (http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-0000000b.txt.gz)
>
> At some point between libvirt 1.2.2 and 1.3.1, this changed. Now libvirt
> seems to be passing our cpu_model directly to qemu, and assumes that as
> a user you will be responsible for writing all the <feature/> stanzas to
> add/remove yourself. When libvirt sends 'gate64' to qemu, this explodes,
> as qemu has no idea what we are talking about.
> http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531
>
> Unlike libvirt, which has a text file (xml) that configures the cpus
> that could exist in the world, qemu builds this in statically at compile
> time:
> http://git.qemu.org/?p=qemu.git;a=blob;f=target-i386/cpu.c;h=895a386d3b7a94e363ca1bb98821d3251e70c0e0;hb=HEAD#l694
>
> So, the existing cpu_map.xml workaround for our testing situation will
> no longer work.
>
> So, we have a number of open questions:
> * Have our cloud providers standardized enough that we might get away
> without this custom cpu model? (Have some of them done it and only use
> those for multinode?)
> * Is there any way to get this feature back in libvirt to do the cpu
> computation?
> * Would we have to build a whole nova feature around setting libvirt xml
> <feature/> to be able to test live migration in our clouds?
> * Other options?
> * Do we give up and go herd goats?
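The "least common denominator of cpu flags" behind gate64 is, in effect, a set intersection over the flags each host exposes. A minimal sketch, assuming the flag lists were collected from each node's /proc/cpuinfo; the host names and flag lists are made up for illustration, and this is not how devstack's cpu_map_update.py is actually implemented.

```python
def parse_cpuinfo_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def common_flags(hosts):
    """Intersect the flag sets of all hosts: only these flags are safe to
    expose to a guest that may live-migrate between any of them."""
    flag_sets = list(hosts.values())
    if not flag_sets:
        return set()
    return set.intersection(*flag_sets)


# Hypothetical flag lists for two nodes running in different clouds.
hosts = {
    "node-a": parse_cpuinfo_flags("flags\t\t: fpu vme de pse tsc msr pae sse sse2 pse36 ssse3"),
    "node-b": parse_cpuinfo_flags("flags\t\t: fpu vme de pse tsc msr pae sse sse2"),
}
print(sorted(common_flags(hosts)))
# ['de', 'fpu', 'msr', 'pae', 'pse', 'sse', 'sse2', 'tsc', 'vme']
```

A single node missing a flag (pse36 on node-b here) is enough to drop it from the shared model, which is why the pool of clouds matters.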

Rather than try to define our own custom CPU models, we can probably
just use one of the standard CPU models and then explicitly tell
libvirt which flags to turn off in order to get compatibility with
our cloud environments.
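With a standard model as the base, the question becomes which of that model's features some host in the pool cannot provide; those are the ones to turn off explicitly. A minimal sketch, assuming the model's feature list and each host's flags are already known; MODEL_FEATURES is an abbreviated, hypothetical stand-in, not the real contents of libvirt's cpu_map.xml for any model.

```python
def features_to_disable(model_features, host_flag_sets):
    """Return the model features missing on at least one host.

    These are the features that would need an explicit 'disable' so that
    the guest CPU can run on every host in the pool."""
    if not host_flag_sets:
        return set()
    common = set.intersection(*host_flag_sets)
    return set(model_features) - common


# Hypothetical, abbreviated feature list for a standard CPU model.
MODEL_FEATURES = {"fpu", "de", "pse", "tsc", "msr", "pae", "pse36", "sse", "sse2"}

hosts = [
    {"fpu", "de", "pse", "tsc", "msr", "pae", "pse36", "sse", "sse2", "ssse3"},
    {"fpu", "de", "pse", "tsc", "msr", "pae", "sse", "sse2"},
]
print(sorted(features_to_disable(MODEL_FEATURES, hosts)))  # ['pse36']
```

Host flags that the model does not include (ssse3 on the first host) never matter, since they are not exposed to the guest in the first place. The result has the same shape as what libvirt 1.2.2 produced for gate64, "-cpu Opteron_G1,-pse36".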

This is not currently possible with Nova, since our nova.conf cpu_model
option only allows us to specify a bare CPU model name. We would have to
extend nova.conf to allow us to specify a list of CPU features to add or
remove. Libvirt should then correctly pass these changes through
to QEMU.
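To make the proposal concrete: an extended option could carry a comma-separated list of features prefixed with "+" or "-", which Nova would turn into <feature/> elements in the guest's <cpu> XML. The option syntax and the build_cpu_xml helper are hypothetical, not an existing nova.conf setting or Nova function; the generated XML follows libvirt's <cpu mode='custom'> domain format.

```python
import xml.etree.ElementTree as ET


def build_cpu_xml(model, extra_flags):
    """Build a libvirt <cpu> element for a named model plus feature tweaks.

    extra_flags is a hypothetical comma-separated string such as
    "-pse36,+ssse3": '-' disables a feature, '+' (or no prefix) requires it.
    """
    cpu = ET.Element("cpu", mode="custom", match="exact")
    ET.SubElement(cpu, "model", fallback="forbid").text = model
    for raw in extra_flags.split(","):
        flag = raw.strip()
        if flag.startswith("-"):
            policy, name = "disable", flag[1:]
        else:
            policy, name = "require", flag.lstrip("+")
        if not name:
            continue  # skip empty entries such as a trailing comma
        ET.SubElement(cpu, "feature", policy=policy, name=name)
    return ET.tostring(cpu, encoding="unicode")


print(build_cpu_xml("Opteron_G1", "-pse36"))
```

Keeping the base model one that QEMU knows natively is the point of the design: the feature list is applied on top of it, so nothing depends on a custom entry in cpu_map.xml.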

