<div dir="ltr">I ran into the different core count thing a while back too, and it's not fixed in Kilo (that's where I discovered it). I posted to the mailing list and didn't get any feedback on it, but as I was just looking in the archives to send you the link to the hack I found to fix it, I noticed that it silently failed to post to the mailing list. I'll add the text of my email below; maybe someone will have some ideas. Original message follows.<div><br></div><div>=======</div><div class="gmail_extra"><br></div><div class="gmail_extra"><span style="font-size:12.8000001907349px">Greetings,</span><div style="font-size:12.8000001907349px">Trying to decide if this is a bug or just a config option that I can't find. The setup I'm currently testing in my lab is two compute nodes running Kilo: one has 40 cores (2x 10c with HT) and one has 16 cores (2x 4c + HT). I don't have any CPU pinning enabled in my nova config, which seems to have the effect of setting in libvirt.xml a vcpu cpuset element like (if created on the 40c node):</div><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px"><vcpu cpuset="1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39">1</vcpu><br clear="all"><div><div dir="ltr"><br></div><div>And then if I migrate that instance to the 16c node, it will bomb out with an exception:</div><div><br></div><div>Live Migration failure: Invalid value '0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus': Invalid argument<br></div><div><br></div><div>Which makes sense, since that node doesn't have any CPUs numbered above 15 (0-15).</div><div dir="ltr"><br></div><div dir="ltr">I can fix the symptom by commenting out a line in nova/virt/libvirt/config.py (circa line 1831) so it always has an empty cpuset and thus doesn't write that line to libvirt.xml:</div><div dir="ltr"># vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))<br></div><div dir="ltr"><br></div><div>And the instance will happily 
migrate to the host with fewer CPUs, but this loses some of the benefit of OpenStack trying to evenly spread out the <span class="">core</span> usage on the host; at least that's what I think the purpose of that is.</div><div dir="ltr"><br></div><div>I'd rather fix it the right way if there's a config option I don't see, or file a bug if it's a bug.</div><div><br></div><div>What I think should be happening is that when it creates the libvirt definition on the destination compute node, it writes out the correct cpuset per the specs of the hardware it's going on to.</div><div><br></div><div>If it matters, in my nova-compute.conf file, I also have cpu mode and model defined to allow me to migrate between the two different architectures to begin with (the 40c is Sandy Bridge and the 16c is Westmere, so I set it to the lowest common denominator of Westmere):</div><div><br></div><div><div>cpu_mode=custom</div><div>cpu_model=Westmere</div></div><div dir="ltr"><br></div><div>Any help is appreciated.</div></div></div><div><div class="gmail_signature"><div dir="ltr"><br><div><br></div></div></div></div>
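<div>A minimal sketch of the idea in the message above, as standalone Python: check which CPUs in a source cpuset do not exist on the destination host, and derive a cpuset that the destination could accept. The helper names (parse_cpuset, format_cpuset, cpus_missing_on_host, clamp_cpuset_to_host) are hypothetical and are not nova or libvirt APIs. Intersecting with the destination's CPU range is only one assumed policy; a proper fix would presumably recompute the cpuset from the destination host's own topology.</div>

```python
def parse_cpuset(spec):
    """Turn a cpuset string such as '0-3,8' into a set of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            # A range like '0-3' covers both endpoints.
            first, last = part.split("-", 1)
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpuset(cpus):
    """Format CPU numbers as a sorted, comma-separated cpuset string."""
    return ",".join(str(cpu) for cpu in sorted(cpus))


def cpus_missing_on_host(spec, host_cpu_count):
    """List the CPUs named in spec that the host does not have.

    Assumes the host's CPUs are numbered 0..host_cpu_count-1.
    """
    host_cpus = set(range(host_cpu_count))
    return sorted(parse_cpuset(spec) - host_cpus)


def clamp_cpuset_to_host(spec, host_cpu_count):
    """Keep only CPUs that exist on the host (assumed policy, not nova's).

    Returns None when nothing overlaps, meaning "write no cpuset at all".
    """
    host_cpus = set(range(host_cpu_count))
    usable = parse_cpuset(spec).intersection(host_cpus)
    return format_cpuset(usable) if usable else None


# The cpuset from the live-migration error, checked against the 16-CPU node.
source_cpuset = "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38"
print(cpus_missing_on_host(source_cpuset, 16))
print(clamp_cpuset_to_host(source_cpuset, 16))
```

<div>Run against the cpuset from the error message, the first call lists the even-numbered CPUs from 16 through 38 as missing on the 16-CPU node, and the second call yields "0,2,4,6,8,10,12,14". Clamping keeps the guest on CPUs that exist but does not rebalance load, which is why recomputing on the destination would be the better behaviour.</div>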
<br><div class="gmail_quote">On Fri, Jul 17, 2015 at 8:58 AM, David Medberry <span dir="ltr"><<a href="mailto:openstack@medberry.net" target="_blank">openstack@medberry.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div dir="ltr"><span style="font-size:12.8000001907349px">Hi Daniel,</span><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">Yep, found that all out.</div><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">Now I'm struggling through the NUMA mismatch. NUMA, as there are two CPUs. The old <span>CPU</span> was a 10-core, 20-thread part, thus 40 "cpus": {0-9,20-29} in one cell and {10-19,30-39} in the other. The new <span>CPU</span> is a 12-core, 24-thread part. Apparently even in Kilo, this results in a mismatch if I'm running a 2-vCPU guest and trying to migrate from new to old. I suspect I <span>have</span> to disable NUMA somehow (filter, etc.) but it is entirely non-obvious. And of course I'm doing this again in OpenStack nova (not direct libvirt), so I'm going to do a bit more research and then file a new bug. This also may be fixed in Kilo but I'm not finding it (and it may be fixed in Liberty already and just need a backport).</div><div style="font-size:12.8000001907349px"><br></div><div style="font-size:12.8000001907349px">My apologies for not following up to the list once I found the Kilo solution to the original problem.</div></div><div class="gmail_extra"><br><div class="gmail_quote"><div><div class="h5">On Fri, Jul 17, 2015 at 6:10 AM, Daniel P. 
Berrange <span dir="ltr"><<a href="mailto:berrange@redhat.com" target="_blank">berrange@redhat.com</a>></span> wrote:<br></div></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><div><div class="h5"><div><div>On Fri, Jul 17, 2015 at 01:07:56PM +0100, Daniel P. Berrange wrote:<br>
> On Thu, Jul 09, 2015 at 12:00:15PM -0600, David Medberry wrote:<br>
> > Hi,<br>
> ><br>
> > When trying to live-migrate between two distinct CPUs, I kind of expect<br>
> > there to be issues. Which is why openstack supports the "cpu_mode=custom",<br>
> > "cpu_model=MODELNAME" flags for libvirt.<br>
> ><br>
> > When I set those to some Lowest Common Denominator (and restart<br>
> > everything), I still get the issue. I've set both systems to SandyBridge<br>
> > and tested as well as Conroe. The actual CPUs are Ivy Bridge and Haswell<br>
> > (newer than SandyBridge and supersets thereof.)<br>
> ><br>
> > The Older->Newer migration works fine (even without setting a cpu_model)<br>
> > but the newer to older never works.<br>
> ><br>
> > Specifics:<br>
> > OpenStack Juno.2<br>
> > LibVirt: 1.2.2<br>
> ><br>
> > Older: model name : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz (Ivy Bridge)<br>
> > Newer: model name : Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz (Haswell)<br>
> ><br>
> > Daniel, Operators: Any ideas?<br>
><br>
> In versions of Nova prior to Liberty, nova did an incorrect CPU model<br>
> comparison. It checks the source *host* CPU model against the dest<br>
> host CPU model, instead of checking the *guest* CPU model against the<br>
> dest host CPU model.<br>
><br>
> This is fixed in Liberty, provided you have the cpu_mode=custom and<br>
> cpu_model=MODELNAME parameters set. Unfortunately the fix will only<br>
> work for guests that are launched under Liberty codebase as it needed<br>
> a database addition. So if you have existing running guests from Juno<br>
> those need restarting after upgrade.<br>
<br>
</div></div>Sigh, s/Liberty/Kilo/ in everything I wrote here<br>
</div></div><div><div><br>
Regards,<br>
Daniel<span class=""><font color="#888888"><br>
--<br>
|: <a href="http://berrange.com" rel="noreferrer" target="_blank">http://berrange.com</a> -o- <a href="http://www.flickr.com/photos/dberrange/" rel="noreferrer" target="_blank">http://www.flickr.com/photos/dberrange/</a> :|<br>
|: <a href="http://libvirt.org" rel="noreferrer" target="_blank">http://libvirt.org</a> -o- <a href="http://virt-manager.org" rel="noreferrer" target="_blank">http://virt-manager.org</a> :|<br>
|: <a href="http://autobuild.org" rel="noreferrer" target="_blank">http://autobuild.org</a> -o- <a href="http://search.cpan.org/~danberr/" rel="noreferrer" target="_blank">http://search.cpan.org/~danberr/</a> :|<br>
|: <a href="http://entangle-photo.org" rel="noreferrer" target="_blank">http://entangle-photo.org</a> -o- <a href="http://live.gnome.org/gtk-vnc" rel="noreferrer" target="_blank">http://live.gnome.org/gtk-vnc</a> :|<br>
</font></span></div></div></blockquote></div><br></div>
<br>_______________________________________________<br>
OpenStack-operators mailing list<br>
<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>
<br></blockquote></div><br></div></div>