[Openstack-operators] Migrating an instance to a host with less cores fails

Steve Gordon sgordon at redhat.com
Fri Sep 25 18:34:24 UTC 2015


----- Original Message -----
> From: "Kris G. Lindgren" <klindgren at godaddy.com>
> To: "Steve Gordon" <sgordon at redhat.com>, "Aubrey Wells" <awells at digiumcloud.com>, "David Medberry"
> 
> I believe TWC (medberry on IRC) was lamenting to me about cpusets,
> differing hypervisor hardware configs, and unassigned vCPUs in NUMA nodes.
> 
> The problem is that the migration does not re-define the domain.xml,
> specifically the vCPU mapping, to match what makes sense on the new host. I
> believe the issue is more pronounced when you go from a compute node with
> more cores to a compute node with fewer cores. I believe the opposite
> migration works, but the vCPU/NUMA node mappings are all wrong.
> 
> CC'ing him as well.

Nikola's reply got bounced because he isn't subscribed, but:

"""
Thanks Steve!

So the below is likely the same root cause as this bug:

https://launchpad.net/bugs/1461777

Which has been fixed in Liberty and backported to stable/kilo (see
https://review.openstack.org/#/c/191594/)

Updating your lab to the latest stable/kilo release (2015.1.1) will
likely fix the problem for you.

Let me know if this helps!

Thanks,
N.
"""

> On 9/25/15, 11:53 AM, "Steve Gordon" <sgordon at redhat.com> wrote:
> 
> >Adding Nikola as he has been working on this.
> >
> >----- Original Message -----
> >> From: "Aubrey Wells" <awells at digiumcloud.com>
> >> To: openstack-operators at lists.openstack.org
> >> 
> >> Greetings,
> >> Trying to decide if this is a bug or just a config option that I can't
> >> find. The setup I'm currently testing in my lab is two compute nodes
> >> running Kilo: one has 40 cores (2x 10c with HT) and one has 16 cores
> >> (2x 4c + HT). I don't have any CPU pinning enabled in my nova config, which
> >> seems to have the effect of setting in libvirt.xml a vcpu cpuset element
> >> like this (if created on the 40c node):
> >> 
> >> <vcpu cpuset="1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39">1</vcpu>
> >> 
> >> And then if I migrate that instance to the 16c node, it will bomb out with
> >> an exception:
> >> 
> >> Live Migration failure: Invalid value '0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38' for 'cpuset.cpus': Invalid argument
> >> 
> >> Which makes sense, since that node doesn't have any CPUs above 15 (0-15).
> >> 
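
To make the mismatch concrete, here is a minimal Python sketch that expands
the cpuset string from the error above and lists the CPU ids that do not
exist on a 16-CPU host. The helper names (parse_cpuset, cpus_missing_on_host)
are hypothetical and are not part of nova or libvirt; the parser handles only
comma-separated ids and "a-b" ranges, not the "^" exclusion syntax.

```python
def parse_cpuset(spec):
    """Expand a cpuset string such as "0-3,8,10" into a set of CPU ids.

    Hypothetical helper for illustration; "^N" exclusions are not handled.
    """
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def cpus_missing_on_host(spec, host_cpu_count):
    """Return the ids in `spec` that fall outside 0..host_cpu_count-1."""
    return sorted(c for c in parse_cpuset(spec) if c >= host_cpu_count)


# cpuset taken from the live migration error message; 16 is the CPU count
# of the destination node described in the report.
source_cpuset = "0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38"
missing = cpus_missing_on_host(source_cpuset, 16)
print(missing)
```

Any non-empty result means the cpuset names CPUs the destination does not
have, which is consistent with the 'cpuset.cpus' write being rejected.
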
> >> I can fix the symptom by commenting out a line in
> >> nova/virt/libvirt/config.py (circa line 1831) so it always has an empty
> >> cpuset and thus doesn't write that line to libvirt.xml:
> >> # vcpu.set("cpuset", hardware.format_cpu_spec(self.cpuset))
> >> 
> >> And the instance will happily migrate to the host with fewer CPUs, but this
> >> loses some of the benefit of OpenStack trying to evenly spread out the core
> >> usage on the host; at least that's what I think the purpose of that is.
> >> 
> >> I'd rather fix it the right way if there's a config option I don't see, or
> >> file a bug if it's a bug.
> >> 
> >> What I think should be happening is that when it creates the libvirt
> >> definition on the destination compute node, it writes out the correct cpuset
> >> per the specs of the hardware it's going onto.
> >> 
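
The behaviour described above could look roughly like the following Python
sketch: the cpuset is chosen from the destination host's own CPU layout
instead of being carried over from the source. This is only an illustration
under stated assumptions and is not how nova or the referenced fix implements
it. The function name choose_dest_cpuset, the "first cell that is large
enough" policy, and the even/odd cell layout of the 16-CPU host are all
hypothetical.

```python
def choose_dest_cpuset(vcpus, dest_cells):
    """Pick a cpuset string from the destination host's CPU cells.

    `dest_cells` is a list of sets of host CPU ids, one set per NUMA cell.
    Returns the first cell with at least `vcpus` CPUs, formatted as a
    comma-separated string, or None if no single cell is large enough
    (in which case the cpuset attribute would simply be omitted).
    """
    for cell in dest_cells:
        if len(cell) >= vcpus:
            return ",".join(str(c) for c in sorted(cell))
    return None


# Assumed layout for the 16-CPU host: even ids in one cell, odd ids in the
# other. The real layout depends on the hardware and is not given here.
dest_cells = [set(range(0, 16, 2)), set(range(1, 16, 2))]

dest_cpuset = choose_dest_cpuset(1, dest_cells)
print(dest_cpuset)
```

With this assumed layout every id in the result is below 16, so the value
would at least be valid for the destination's 'cpuset.cpus'.
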
> >> If it matters, in my nova-compute.conf file I also have the CPU mode and
> >> model defined to allow me to migrate between the two different
> >> architectures to begin with (the 40c is Sandy Bridge and the 16c is
> >> Westmere, so I set it to the lowest common denominator of Westmere):
> >> 
> >> cpu_mode=custom
> >> cpu_model=Westmere
> >> 
> >> Any help is appreciated.
> >> 
> >> ---------------------
> >> Aubrey
> >> 
> >> _______________________________________________
> >> OpenStack-operators mailing list
> >> OpenStack-operators at lists.openstack.org
> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
> >> 
> >
> >--
> >Steve Gordon, RHCE
> >Sr. Technical Product Manager,
> >Red Hat Enterprise Linux OpenStack Platform
> >
> 

-- 
Steve Gordon, RHCE
Sr. Technical Product Manager,
Red Hat Enterprise Linux OpenStack Platform
