[openstack-dev] [gate] [nova] live migration, libvirt 1.3, and the gate

Daniel P. Berrange berrange at redhat.com
Tue May 31 13:33:02 UTC 2016

On Tue, May 31, 2016 at 08:24:03AM -0400, Sean Dague wrote:
> On 05/31/2016 05:39 AM, Daniel P. Berrange wrote:
> > On Tue, May 24, 2016 at 01:59:17PM -0400, Sean Dague wrote:
> >> The team working on live migration testing started with an experimental
> >> job on Ubuntu 16.04 to try to be using the latest and greatest libvirt +
> >> qemu under the assumption that a set of issues we were seeing are
> >> solved. The short answer is, it doesn't look like this is going to work.
> >>
> >> We run tests on a bunch of different clouds. Those clouds expose
> >> different cpu flags to us. These are not standard things that map to
> >> "Haswell". It means live migration in the multinode cases can hit cpus
> >> with different flags. So we found the requirement was to come up with a
> >> least common denominator of cpu flags, which we call gate64, and push
> >> that into the libvirt cpu_map.xml in devstack, and set it whenever we are
> >> in a multinode scenario.
> >> (https://github.com/openstack-dev/devstack/blob/master/tools/cpu_map_update.py)
> >>  Not ideal, but with libvirt 1.2.2 it works fine.
> >>
> >> It turns out it works fine because libvirt *actually* seems to take the
> >> data from cpu_map.xml and do a translation to what it believes qemu will
> >> understand. On these systems apparently this turns into "-cpu
> >> Opteron_G1,-pse36"
> >> (http://logs.openstack.org/29/42529/24/check/gate-tempest-dsvm-multinode-full/5f504c5/logs/libvirt/qemu/instance-0000000b.txt.gz)
> >>
> >> At some point between libvirt 1.2.2 and 1.3.1, this changed. Now libvirt
> >> seems to be passing our cpu_model directly to qemu, and assumes that as
> >> a user you will be responsible for writing all the <feature/> stanzas to
> >> add/remove yourself. When libvirt sends 'gate64' to qemu, this explodes,
> >> as qemu has no idea what we are talking about.
> >> http://logs.openstack.org/34/319934/2/experimental/gate-tempest-dsvm-multinode-live-migration/b87d689/logs/screen-n-cpu.txt.gz#_2016-05-24_15_59_12_531
> >>
> >> Unlike libvirt, which has a text file (xml) that configures the cpus
> >> that could exist in the world, qemu builds this in statically at compile
> >> time:
> >> http://git.qemu.org/?p=qemu.git;a=blob;f=target-i386/cpu.c;h=895a386d3b7a94e363ca1bb98821d3251e70c0e0;hb=HEAD#l694
> >>
> >>
> >> So, the existing cpu_map.xml workaround for our testing situation will
> >> no longer work.
> >>
> >> So, we have a number of open questions:
> >>
> >> * Have our cloud providers standardized enough that we might get away
> >> without this custom cpu model? (Have some of them done it and only use
> >> those for multinode?)
> >> * Is there any way to get this feature back in libvirt to do the cpu
> >> computation?
> >> * Would we have to build a whole nova feature around setting libvirt xml
> >> <feature/> to be able to test live migration in our clouds?
> >> * Other options?
> >> * Do we give up and go herd goats?
> > 
> > Rather than try to define our own custom CPU models, we can probably
> > just use one of the standard CPU models and then explicitly tell
> > libvirt which flags to turn off in order to get compatibility with
> > our cloud environments.
> > 
> > This is not currently possible with Nova, since our nova.conf option
> > only allows us to specify a bare CPU model. We would have to extend
> > nova.conf to allow us to specify a list of CPU features to add or
> > remove. Libvirt should then correctly pass these changes through
> > to QEMU.
> Yes, that's an option. Given that the libvirt team seemed to acknowledge
> this as a regression, I'd rather not build a user exposed feature for
> all of that just as a workaround for a libvirt regression.

I think the fact that we're hitting this problem in the gate, though, is
a sign that our users will likely hit it in their own deployments if
using virtualized hosts. I think it is more friendly for users to be
able to customize the CPU features via nova.conf than to repeat the
hacks done for devstack with editing the libvirt cpu_map.xml file.

IOW, extending nova.conf to support this officially would be a generally
useful feature for nova, beyond your short term CI needs.
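
As a rough illustration of the idea (not existing Nova code), the
sketch below takes a standard CPU model plus a list of flags to add or
remove and builds the corresponding libvirt guest <cpu> element. The
function name and the "+flag"/"-flag" list format are assumptions made
for this example; the model and flag are the ones from the qemu
command line quoted above ("-cpu Opteron_G1,-pse36").

```python
import xml.etree.ElementTree as ET


def build_cpu_element(model, flags):
    """Build a libvirt guest <cpu> element (hypothetical helper).

    model: name of a standard CPU model, e.g. "Opteron_G1".
    flags: list of strings such as "-pse36" (disable) or "+vmx"
           (require); a bare name is treated as "require". This
           list format is an assumption for illustration only.
    """
    cpu = ET.Element("cpu", mode="custom", match="exact")
    ET.SubElement(cpu, "model", fallback="allow").text = model
    for flag in flags:
        if flag.startswith("-"):
            policy, name = "disable", flag[1:]
        else:
            policy, name = "require", flag.lstrip("+")
        # One <feature/> stanza per flag, as libvirt expects.
        ET.SubElement(cpu, "feature", policy=policy, name=name)
    return cpu


# Standard model with one feature turned off.
cpu = build_cpu_element("Opteron_G1", ["-pse36"])
print(ET.tostring(cpu, encoding="unicode"))
```

The point is that the guest XML only names a model QEMU already
knows about, with the differences expressed as explicit <feature/>
stanzas, so nothing depends on a custom entry in cpu_map.xml.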

|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
