[Openstack-operators] Puzzling issue: Unacceptable CPU info: CPU doesn't have compatibility

David Medberry openstack at medberry.net
Fri Jul 17 13:21:16 UTC 2015


Hi Aubrey,

I'm actually wondering if this is a new regression bug INTRODUCED in Kilo
(as part of the NUMA work). I'll be testing that a bit too by altering my
Juno architecture a bit (monkeying with kernel MAXCPUS to see if I can get
into a similar situation in Juno but with identical hardware.)

The best info I have found so far is Daniel's howto (in the OpenStack docs)
for creating a test scenario for NUMA:

http://docs.openstack.org/developer/nova/devref/testing/libvirt-numa.html
(and related pages)

On Fri, Jul 17, 2015 at 7:10 AM, Aubrey Wells <awells at digiumcloud.com>
wrote:

> I ran into the different core count thing a while back too and it's not
> fixed in Kilo (that's where I discovered it). I posted to the mailing list
> and didn't get any feedback on it, but as I was just looking in the
> archives to send you the link to the hack I found to fix it, I noticed that
> it silently failed to post to the mailing list. I'll add the text of my
> email below, maybe someone will have some ideas. Original message follows.
>
> =======
>
> Greetings,
> Trying to decide if this is a bug or just a config option that I can't
> find. The setup I'm currently testing in my lab is two compute nodes
> running Kilo, one has 40 cores (2x 10c with HT) and one has 16 cores (2x 4c
> + HT). I don't have any CPU pinning enabled in my nova config, which seems
> to have the effect of setting in libvirt.xml a vcpu cpuset element like (if
> created on the 40c node):
>
> <vcpu
> cpuset="1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39">1</vcpu>
>
> And then if I migrate that instance to the 16c node, it will bomb out with
> an exception:
>
> Live Migration failure: Invalid value
> '0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus':
> Invalid argument
>
> Which makes sense, since that node doesn't have any vcpus after 15 (0-15).
>
> I can fix the symptom by commenting out a line in
> nova/virt/libvirt/config.py (circa line 1831) so it always has an empty
> cpuset and thus doesn't write that line to libvirt.xml:
> # vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))
>
> And the instance will happily migrate to the host with fewer CPUs, but this
> loses some of the benefit of openstack trying to evenly spread out the
> core usage on the host, at least that's what I think the purpose of that
> is.
>
> I'd rather fix it the right way if there's a config option I don't see or
> file a bug if it's a bug.
>
> What I think should be happening is that when it creates the libvirt
> definition on the destination compute node, it writes out the correct cpuset
> per the specs of the hardware it's going onto.
>
> If it matters, in my nova-compute.conf file, I also have cpu mode and
> model defined to allow me to migrate between the two different
> architectures to begin with (the 40c is Sandybridge and the 16c is Westmere
> so I set it to the lowest common denominator of Westmere):
>
> cpu_mode=custom
> cpu_model=Westmere
>
> Any help is appreciated.
>
>
>
> On Fri, Jul 17, 2015 at 8:58 AM, David Medberry <openstack at medberry.net>
> wrote:
>
>> Hi Daniel,
>>
>> Yep found that all out.
>>
>> Now I'm struggling through the NUMA mismatch. NUMA as there are two cpus.
>> The old CPU was a 10 core 20 thread thus 40 "cpus", {0-9,20-29} and then
>> {10-19,30-39} on the other cell. The new CPU is a 12 core 24 thread.
>> Apparently even in kilo, this results in a mismatch if I'm running a 2 VCPU
>> guest and trying to migrate from new to old. I suspect I have to disable
>> NUMA somehow (filter, etc) but it is entirely non-obvious. And of course
>> I'm doing this again in OpenStack nova (not direct libvirt) so I'm going to
>> do a bit more research and then file a new bug. This also may be fixed in
>> Kilo but I'm not finding it (and it may be fixed in Liberty already and
>> just needs a backport.)
>>
>> My apologies for not following up to the list once I found the Kilo
>> solution to the original problem.
>>
>> On Fri, Jul 17, 2015 at 6:10 AM, Daniel P. Berrange <berrange at redhat.com>
>> wrote:
>>
>>> On Fri, Jul 17, 2015 at 01:07:56PM +0100, Daniel P. Berrange wrote:
>>> > On Thu, Jul 09, 2015 at 12:00:15PM -0600, David Medberry wrote:
>>> > > Hi,
>>> > >
>>> > > When trying to live-migrate between two distinct CPUs, I kind of expect
>>> > > there to be issues. Which is why openstack supports the
>>> > > "cpu_mode=custom", "cpu_model=MODELNAME" flags for libvirt.
>>> > >
>>> > > When I set those to some Lowest Common Denominator (and restart
>>> > > everything), I still get the issue. I've set both systems to SandyBridge
>>> > > and tested as well as Conroe. The actual CPUs are Ivy Bridge and Haswell
>>> > > (newer than SandyBridge and supersets thereof.)
>>> > >
>>> > > The Older->Newer migration works fine (even without setting a cpu_model)
>>> > > but the newer to older never works.
>>> > >
>>> > > Specifics:
>>> > > OpenStack Juno.2
>>> > > LibVirt: 1.2.2
>>> > >
>>> > > Older: model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (Ivy Bridge)
>>> > > Newer: model name : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (Haswell)
>>> > >
>>> > > Daniel, Operators: Any ideas?
>>> >
>>> > In versions of Nova prior to Liberty, nova did an incorrect CPU model
>>> > comparison. It checks the source *host* CPU model against the dest
>>> > host CPU model, instead of checking the *guest* CPU model against the
>>> > dest host CPU model.
>>> >
>>> > This is fixed in Liberty, provided you have the cpu_mode=custom and
>>> > cpu_model=MODELNAME parameters set. Unfortunately the fix will only
>>> > work for guests that are launched under Liberty codebase as it needed
>>> > a database addition. So if you have existing running guests from Juno
>>> > those need restarting after upgrade.
>>>
>>> Sigh,  s/Liberty/Kilo/ in everything I wrote here
>>>
>>> Regards,
>>> Daniel
>>> --
>>> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
>>> |: http://libvirt.org              -o-             http://virt-manager.org :|
>>> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
>>> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
>>>
>
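
For reference, the cpuset failure in Aubrey's message quoted above (a guest
whose cpuset names CPUs up to 38 being moved to a host that only has CPUs
0-15) can be checked for in isolation before attempting a migration. What
follows is only a rough sketch, not nova or libvirt code: the function names
parse_cpuset and missing_cpus are made up for illustration, it assumes host
CPUs are numbered 0..N-1, and it does not handle the "^" exclusion syntax
that libvirt also accepts in cpuset strings.

```python
def parse_cpuset(spec):
    """Parse a cpuset string such as "0-3,8,10-11" into a set of CPU ids.

    Hypothetical helper for illustration; it does not handle "^" exclusions.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like "10-19" is inclusive on both ends.
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def missing_cpus(spec, host_cpu_count):
    """Return the sorted CPU ids in spec that a host with CPUs
    0..host_cpu_count-1 does not have (assumed contiguous numbering)."""
    return sorted(cpu for cpu in parse_cpuset(spec) if cpu >= host_cpu_count)


# The cpuset from the migration error quoted above.
guest_cpuset = "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38"

# On the 40-CPU source host every id exists, so nothing is missing.
print(missing_cpus(guest_cpuset, 40))

# On the 16-CPU destination, ids 16 and above do not exist.
print(missing_cpus(guest_cpuset, 16))
```

With the 16-CPU destination the second call returns the even ids 16 through
38, which are the ones the destination cannot satisfy and so the likely cause
of the "Invalid argument" for 'cpuset.cpus'. The same parser also accepts the
range form used for the NUMA cells in my earlier note, e.g. "0-9,20-29".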