Re: Nova live migration fails - InvalidCPUInfo: Unacceptable CPU info: CPU doesn't have compatibility.
I've restarted the instances that I'm using to test but no luck.

I have the setting set on just the highest-generation compute node. I tried setting it on one of my older-generation compute nodes as well, but because there is production load on it I am anxious about restarting services. I did restart the nova-compute service after making the change, and there was no change in behavior. Should the setting also be set on the controllers?

When I look at virsh capabilities there is certainly a difference between the two:

7c7
< <model>Opteron_G5</model>
---
> <model>Opteron_G4</model>
9c9
< <microcode version='100665426'/>
---
> <microcode version='100664894'/>
15d14
< <feature name='bmi1'/>
25c24
< <feature name='tce'/>
---
> <feature name='lwp'/>
I look at the test instance on the compute node with the G5 Opteron and this is in the XML:

<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Opteron_G4</model>
  <topology sockets='1' cores='1' threads='1'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='rdtscp'/>
</cpu>

This looks like it is correctly taking the CPU model that I have specified in nova.conf, yet it does not want to migrate to the Opteron_G4 node.
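For comparison, the nova.conf options that drive a <cpu> block like the one above are cpu_mode and cpu_model under [libvirt]. A quick sanity check on each compute node could look roughly like this (the path assumes a stock packaged install, and the expected values are inferred from the guest XML above, so treat this as a sketch):

# Show the CPU settings in /etc/nova/nova.conf; given the XML above,
# both compute nodes should carry the same values under [libvirt]:
grep -E '^\s*(cpu_mode|cpu_model)\s*=' /etc/nova/nova.conf
# Expected on both nodes:
#   cpu_mode = custom
#   cpu_model = Opteron_G4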
On Fri, Dec 14, 2018 at 10:38 AM Torin Woltjer <torin.woltjer@granddial.com> wrote:
I've restarted the instances that I'm using to test but no luck.
I have the setting set on just the highest-generation compute node. I tried setting it on one of my older-generation compute nodes as well, but because there is production load on it I am anxious about restarting services. I did restart the nova-compute service after making the change, and there was no change in behavior. Should the setting also be set on the controllers?
I don't believe you need it on the controllers. The host capabilities get registered by nova-compute, so the controller config wouldn't make use of those settings.

I wouldn't have different CPU modes, though. If you're going to hard-set one, you should hard-set them all; just the difference between custom and passthrough may be enough to make nova reject the request.

You should be able to easily set the custom setting on the production node and restart nova-compute without affecting running instances. Once the instances are running, nova has very little to do with them; as mentioned before, all the settings are applied at boot. The only caveat I can think of is if you're running NFS shared storage with the nova services in docker containers. I have seen this cause root disks of instances to go read-only as the mounting of the NFS volume comes and goes with the container. I've done exactly this multiple times with ceph-backed instances running on both baremetal and containerized hosts with no issues.
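For reference, the restart-and-verify step looks roughly like the following on a typical deployment; the exact service and container names are assumptions that vary by distro and deployment tool:

# On the compute node, after editing /etc/nova/nova.conf:
sudo systemctl restart nova-compute              # Ubuntu/Debian packaging
# sudo systemctl restart openstack-nova-compute  # RDO/CentOS packaging
# docker restart nova_compute                    # containerized (e.g. kolla-ansible)

# The qemu/KVM guests are separate processes, so they keep running:
sudo virsh list --all

# From a controller, confirm the compute service checked back in:
openstack compute service list --service nova-compute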
When I look at virsh capabilities there is certainly a difference between the two:

7c7
< <model>Opteron_G5</model>
---
> <model>Opteron_G4</model>
9c9
< <microcode version='100665426'/>
---
> <microcode version='100664894'/>
15d14
< <feature name='bmi1'/>
25c24
< <feature name='tce'/>
---
> <feature name='lwp'/>
I look at the test instance on the compute node with the G5 Opteron and this is in the XML:

<cpu mode='custom' match='exact' check='full'>
  <model fallback='forbid'>Opteron_G4</model>
  <topology sockets='1' cores='1' threads='1'/>
  <feature policy='require' name='vme'/>
  <feature policy='require' name='x2apic'/>
  <feature policy='require' name='hypervisor'/>
  <feature policy='disable' name='rdtscp'/>
</cpu>
This looks like it is correctly taking the CPU model that I have specified in nova.conf, yet it does not want to migrate to the Opteron_G4 node.
On 12/14/2018 9:38 AM, Torin Woltjer wrote:
I've restarted the instances that I'm using to test but no luck.
I have the setting set on just the highest-generation compute node. I tried setting it on one of my older-generation compute nodes as well, but because there is production load on it I am anxious about restarting services. I did restart the nova-compute service after making the change, and there was no change in behavior. Should the setting also be set on the controllers?
You should set the model explicitly on all the nodes; you'd only need to restart nova-compute on each node after changing the setting. It might be worth booting up an instance on the G4 node after making this change and then checking whether the XML matches what you have on the G5 node.

Chris
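For anyone hitting this later, the check Chris describes can be done with virsh on each compute node; the domain name below (instance-0000xxxx) is a placeholder taken from virsh list:

# On the G4 node, boot a test instance and dump just its <cpu> element:
virsh list
virsh dumpxml instance-0000xxxx | sed -n '/<cpu /,/<\/cpu>/p'

# Repeat for the test instance on the G5 node. With cpu_mode=custom and
# cpu_model=Opteron_G4 set on both nodes, the <model> element should read
# Opteron_G4 in both dumps; a 'require' feature that the destination host
# lacks is the kind of mismatch that makes the migration check fail.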
participants (3)
- Chris Friesen
- Erik McCormick
- Torin Woltjer