[Openstack-operators] Puzzling issue: Unacceptable CPU info: CPU doesn't have compatibility

Aubrey Wells awells at digiumcloud.com
Fri Jul 17 13:10:15 UTC 2015


I ran into the different core count thing a while back too, and it's not
fixed in Kilo (that's where I discovered it). I posted to the mailing list
and didn't get any feedback on it, but as I was just looking in the
archives to send you the link to the hack I found to fix it, I noticed that
it had silently failed to post to the mailing list. I'll add the text of my
email below; maybe someone will have some ideas. Original message follows.

=======

Greetings,
Trying to decide if this is a bug or just a config option that I can't
find. The setup I'm currently testing with in my lab is two compute nodes
running Kilo: one has 40 cores (2x 10c with HT) and one has 16 cores (2x 4c
+ HT). I don't have any CPU pinning enabled in my nova config, which seems
to result in libvirt.xml getting a vcpu element with a cpuset attribute
like this (if created on the 40c node):

<vcpu cpuset="1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39">1</vcpu>

And then if I migrate that instance to the 16c node, it will bomb out with
an exception:

Live Migration failure: Invalid value
'0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus':
Invalid argument

That makes sense, since that node only has CPUs 0-15.
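
A minimal sketch of that mismatch, assuming the 16c node numbers its logical CPUs 0-15: expand the cpuset string and list the CPUs the destination does not have. `parse_cpuset` and `missing_cpus` are hypothetical helpers written for this illustration (they only handle plain numbers and `a-b` ranges, not libvirt's `^N` exclusions); they are not nova's own code.

```python
# Hypothetical helpers, written only to illustrate the cpuset mismatch.
# They handle "1,3,5" and "0-3" style specs, not libvirt's "^N" exclusions.


def parse_cpuset(spec):
    """Expand a cpuset string such as '0-3,8,10-11' into a set of CPU ids."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


def missing_cpus(spec, host_cpu_count):
    """Return the CPUs named in spec that a host with CPUs 0..n-1 lacks."""
    return {cpu for cpu in parse_cpuset(spec) if cpu >= host_cpu_count}


# The cpuset written into libvirt.xml on the 40-CPU node: 1,3,5,...,39
src_spec = ",".join(str(cpu) for cpu in range(1, 40, 2))

print(sorted(missing_cpus(src_spec, 40)))  # → []
print(sorted(missing_cpus(src_spec, 16)))  # → [17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37, 39]
```

Any non-empty result means the definition copied from the source cannot be applied unchanged on the destination, which is the kind of failure the `cpuset.cpus` error above reports.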

I can fix the symptom by commenting out a line in
nova/virt/libvirt/config.py (circa line 1831) so it always has an empty
cpuset and thus doesn't write that line to libvirt.xml:
# vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))

And the instance will happily migrate to the host with fewer CPUs, but this
loses some of the benefit of OpenStack trying to evenly spread out the
core usage on the host (at least, that's what I think the purpose of it is).

I'd rather fix it the right way if there's a config option I don't see, or
file a bug if it's a bug.

What I think should be happening is that when it creates the libvirt
definition on the destination compute node, it writes out the correct cpuset
for the hardware it's going onto.
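
One way to read that suggestion, as a hypothetical sketch only: assume the source cpuset corresponds to one whole NUMA cell, look up which cell it matched, and substitute the destination's cell with the same index; otherwise keep just the CPUs that exist on the destination. The topologies below (each cell holding every other CPU) are made-up examples, and `remap_cpuset` is not a nova function.

```python
# Hypothetical sketch: regenerate a guest's cpuset for the destination host.
# Topologies are plain dicts {cell_id: frozenset_of_cpu_ids}; nova's real
# host topology objects look different.


def remap_cpuset(cpuset, src_cells, dst_cells):
    """Translate a cpuset chosen on the source host to the destination host."""
    # Case 1: the cpuset is exactly one source NUMA cell, so use the same
    # cell index on the destination if it exists there.
    for cell_id, cpus in src_cells.items():
        if cpuset == cpus and cell_id in dst_cells:
            return dst_cells[cell_id]
    # Case 2: keep only the CPUs the destination actually has; if none
    # survive, allow every destination CPU (like omitting cpuset entirely).
    all_dst = frozenset().union(*dst_cells.values())
    kept = frozenset(cpuset) & all_dst
    return kept if kept else all_dst


# Made-up topologies in which each cell holds every other CPU
src_cells = {0: frozenset(range(0, 40, 2)), 1: frozenset(range(1, 40, 2))}
dst_cells = {0: frozenset(range(0, 16, 2)), 1: frozenset(range(1, 16, 2))}

print(sorted(remap_cpuset(src_cells[1], src_cells, dst_cells)))
# → [1, 3, 5, 7, 9, 11, 13, 15]
```

Falling back to all destination CPUs trades the spreading behaviour for a definition that is always valid, the same trade-off the config.py hack makes.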

If it matters, in my nova-compute.conf file, I also have cpu mode and model
defined to allow me to migrate between the two different architectures to
begin with (the 40c is Sandybridge and the 16c is Westmere so I set it to
the lowest common denominator of Westmere):

cpu_mode=custom
cpu_model=Westmere
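
This only helps if both compute nodes carry the same setting. A small sketch for checking that, assuming the two options live in the `[libvirt]` section of each node's config file (the group Kilo reads `cpu_mode` and `cpu_model` from); the file contents are illustrative, and `cpu_settings` and `same_custom_model` are hypothetical helpers.

```python
# Hypothetical check that two compute nodes agree on a custom CPU model.
# Assumes cpu_mode/cpu_model are set in the [libvirt] section.
import configparser
from typing import Optional, Tuple


def cpu_settings(conf_text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (cpu_mode, cpu_model) from a nova config file's text."""
    # interpolation=None: don't treat '%' in other options as substitutions;
    # strict=False: tolerate duplicate sections/options.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return None, None
    section = parser["libvirt"]
    return section.get("cpu_mode"), section.get("cpu_model")


def same_custom_model(conf_a: str, conf_b: str) -> bool:
    """True if both nodes use cpu_mode=custom with the same cpu_model."""
    mode_a, model_a = cpu_settings(conf_a)
    mode_b, model_b = cpu_settings(conf_b)
    return (
        mode_a == mode_b == "custom"
        and model_a is not None
        and model_a == model_b
    )


sandybridge_node = "[libvirt]\ncpu_mode=custom\ncpu_model=Westmere\n"
westmere_node = "[libvirt]\ncpu_mode=custom\ncpu_model=Westmere\n"
print(same_custom_model(sandybridge_node, westmere_node))  # → True
```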

Any help is appreciated.



On Fri, Jul 17, 2015 at 8:58 AM, David Medberry <openstack at medberry.net>
wrote:

> Hi Daniel,
>
> Yep found that all out.
>
> Now I'm struggling through the NUMA mismatch (NUMA because there are two
> CPUs). The old CPU was a 10 core, 20 thread part, thus 40 "cpus", with
> {0-9,20-29} on one cell and {10-19,30-39} on the other. The new CPU is a
> 12 core, 24 thread part. Apparently, even in Kilo, this results in a
> mismatch if I'm running a 2 VCPU guest and trying to migrate from new to
> old. I suspect I have to disable NUMA somehow (filter, etc.), but it is
> entirely non-obvious. And of course I'm doing this again in OpenStack nova
> (not direct libvirt), so I'm going to do a bit more research and then file
> a new bug. This may also be fixed in Kilo but I'm not finding it (and it
> may be fixed in Liberty already and just need a backport).
>
> My apologies for not following up to the list once I found the Kilo
> solution to the original problem.
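
To see why the two hosts disagree, here is a sketch of the NUMA cell layout described above, assuming Linux enumerates the first hardware thread of every core across both sockets before the sibling threads (which matches the {0-9,20-29} / {10-19,30-39} layout quoted). `cell_layout` is a hypothetical helper, and the new host's numbering is derived under that same assumption rather than taken from the thread.

```python
# Hypothetical sketch of per-socket (NUMA cell) CPU numbering.
# Assumption: CPUs 0..(sockets*cores - 1) are the first thread of each core,
# socket by socket, and the next block holds the sibling threads.


def cell_layout(sockets, cores, threads=2):
    """Return {cell_id: set of logical CPU ids} for a symmetric host."""
    total_cores = sockets * cores
    cells = {}
    for socket in range(sockets):
        cpus = set()
        for thread in range(threads):
            base = thread * total_cores + socket * cores
            cpus.update(range(base, base + cores))
        cells[socket] = cpus
    return cells


old_host = cell_layout(sockets=2, cores=10)  # 2 x 10 cores with HT: 40 CPUs
new_host = cell_layout(sockets=2, cores=12)  # 2 x 12 cores with HT: 48 CPUs

# Take new-host cell 1 and see how it maps onto the old host
new_cell1 = new_host[1]
old_cpus = set().union(*old_host.values())
missing = new_cell1 - old_cpus
spanned = {cell for cell, cpus in old_host.items() if cpus & new_cell1}

print(sorted(missing))  # → [40, 41, 42, 43, 44, 45, 46, 47]
print(sorted(spanned))  # → [0, 1]
```

Under these assumptions, a CPU set that is a single NUMA cell on the new host is partly nonexistent and partly split across both cells on the old host, which is consistent with new-to-old migrations tripping over NUMA placement.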
>
> On Fri, Jul 17, 2015 at 6:10 AM, Daniel P. Berrange <berrange at redhat.com>
> wrote:
>
>> On Fri, Jul 17, 2015 at 01:07:56PM +0100, Daniel P. Berrange wrote:
>> > On Thu, Jul 09, 2015 at 12:00:15PM -0600, David Medberry wrote:
>> > > Hi,
>> > >
>> > > When trying to live-migrate between two distinct CPUs, I kind of
>> > > expect there to be issues, which is why OpenStack supports the
>> > > "cpu_mode=custom" and "cpu_model=MODELNAME" flags for libvirt.
>> > >
>> > > When I set those to some lowest common denominator (and restart
>> > > everything), I still get the issue. I've set both systems to
>> > > SandyBridge and tested, as well as Conroe. The actual CPUs are Ivy
>> > > Bridge and Haswell (newer than SandyBridge and supersets thereof).
>> > >
>> > > The older->newer migration works fine (even without setting a
>> > > cpu_model), but newer->older never works.
>> > >
>> > > Specifics:
>> > > OpenStack Juno.2
>> > > LibVirt: 1.2.2
>> > >
>> > > Older: model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (Ivy Bridge)
>> > > Newer: model name : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (Haswell)
>> > >
>> > > Daniel, Operators: Any ideas?
>> >
>> > In versions of Nova prior to Liberty, nova did an incorrect CPU model
>> > comparison. It checks the source *host* CPU model against the dest
>> > host CPU model, instead of checking the *guest* CPU model against the
>> > dest host CPU model.
>> >
>> > This is fixed in Liberty, provided you have the cpu_mode=custom and
>> > cpu_model=MODELNAME parameters set. Unfortunately the fix will only
>> > work for guests that are launched under Liberty codebase as it needed
>> > a database addition. So if you have existing running guests from Juno
>> > those need restarting after upgrade.
>>
>> Sigh,  s/Liberty/Kilo/ in everything I wrote here
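
The difference between the two comparisons Daniel describes can be sketched with CPU feature sets. The feature lists below are heavily abbreviated and illustrative, not libvirt's real CPU model definitions, and `host_vs_host` and `guest_vs_host` are hypothetical names; the point is only which pair of sets gets compared, and why newer-to-older fails under the old check even with cpu_model set.

```python
# Illustrative, heavily abbreviated feature sets (not libvirt's cpu_map).
HOST_FEATURES = {
    "ivybridge-host": {"sse4_2", "aes", "avx", "f16c", "rdrand"},
    "haswell-host": {"sse4_2", "aes", "avx", "f16c", "rdrand", "avx2", "fma"},
}
GUEST_MODELS = {
    "SandyBridge": {"sse4_2", "aes", "avx"},
}


def host_vs_host(src_host, dst_host):
    """Pre-fix behaviour: source *host* features must exist on the dest."""
    return HOST_FEATURES[src_host] <= HOST_FEATURES[dst_host]


def guest_vs_host(guest_model, dst_host):
    """Fixed behaviour: the *guest* CPU model must be supported by the dest."""
    return GUEST_MODELS[guest_model] <= HOST_FEATURES[dst_host]


# Newer -> older migration of a guest using cpu_model=SandyBridge
print(host_vs_host("haswell-host", "ivybridge-host"))  # → False
print(guest_vs_host("SandyBridge", "ivybridge-host"))  # → True
```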
>>
>> Regards,
>> Daniel
>> --
>> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
>> |: http://libvirt.org              -o-             http://virt-manager.org :|
>> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
>> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
>>
>
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
>

