On Tue, 2024-05-14 at 23:37 +0200, Michel Jouvin wrote:
Hi Andrew,
Thanks for your answer. It definitely seems to be the information I was looking for. CPU capability comparison seems to remain buggy in Antelope and skip_cpu_compare_on_dest is one workaround that seems to work...

Yes and no.

The old API did not take QEMU's emulation capabilities into account when doing the comparison. It simply compared the host CPU features, without filtering out the features that are not exposed to the guest, so the old API would reject migrations that should have been valid. Antelope uses the new API because the libvirt developers told us we should be using it.

When you set skip_cpu_compare_on_dest you are turning off all checks in Nova, meaning your live migration may fail much later, when libvirt does the same check. libvirt will actually look at the exact feature set used by the QEMU instance, which should be the same as what it returns to us in the new API. So, if anything, the more relaxed check in Antelope should be less likely to reject a migration than the way it worked previously.

skip_cpu_compare_on_dest is really there as an escape hatch to work around bugs in libvirt, i.e. you can turn off the check in pre-live-migration, which is there to ensure we don't have a late failure when we call libvirt, but the libvirt check might still prevent the migration.
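To make the difference between the two checks concrete, here is a toy model in Python. It is only a sketch of the reasoning above, not Nova or libvirt code: the function names and the feature names are hypothetical, and the real comparison is done by libvirt on CPU XML, not on plain sets.

```python
# Toy model only: all names and feature sets here are hypothetical.

def old_style_check(src_host_features, dst_host_features):
    """Roughly the old behaviour: compare raw host CPU features,
    without filtering out features the guest never sees."""
    return src_host_features <= dst_host_features


def new_style_check(guest_features, dst_host_features, dst_emulated_features):
    """Roughly the new behaviour: compare only what the guest uses
    against what the destination hypervisor can provide, including
    features QEMU can emulate."""
    return guest_features <= (dst_host_features | dst_emulated_features)


# Hypothetical source host: exposes "feat-a" and "feat-b" to the guest,
# and also has "feat-host-only", which the guest never sees.
src_host = {"feat-a", "feat-b", "feat-host-only"}
guest = {"feat-a", "feat-b"}

# Hypothetical destination: has "feat-a" natively, QEMU can emulate "feat-b".
dst_host = {"feat-a"}
dst_emulated = {"feat-b"}

old_result = old_style_check(src_host, dst_host)
new_result = new_style_check(guest, dst_host, dst_emulated)
```

In this made-up case the old-style check rejects the migration while the new-style check accepts it, which is the sense in which the newer check is the more relaxed one.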
Cheers,
Michel
On 14/05/2024 at 09:00, Andrew Bonney wrote:
This may be the same issue as https://bugs.launchpad.net/nova/+bug/2039803. The last comment links to a couple of configuration workarounds, in particular https://docs.openstack.org/nova/latest/configuration/config.html#workarounds... which may help you until a fix is available.
------------------------------------------------------------------------
From: Michel Jouvin <michel.jouvin@ijclab.in2p3.fr>
Sent: Monday, May 13, 2024 21:45
To: Openstack-discuss <openstack-discuss@lists.openstack.org>
Subject: Re: Antelope compute: failure to live migrate VMs from Yoga compute
Hi,
I'd appreciate any feedback about the issue we are facing, either advice on how to troubleshoot the problem more precisely or a possible solution if the problem has already been seen...
Thank you in advance. Best regards,
Michel
On 12/05/2024 at 22:28, Michel Jouvin wrote:
Hi,
In a Yoga cloud, I upgraded all the core services and 1 compute to Antelope (directly from Yoga, using a SLURP upgrade). This cloud has 4 compute servers, all using the same HW (Dell FX2/FC630 with Broadwell CPUs).
After the upgrade, any attempt to live migrate a VM from a Yoga compute to the Antelope compute fails with the following error in destination/Antelope nova-compute.log:
nova.exception.MigrationPreCheckError: Migration pre-check error: Unacceptable CPU info: CPU doesn't have compatibility.
This looks weird. I don't see anything in the release notes related to this problem, except a mention that the API used to do the comparison has changed ("This change replaces the usage of older API, compareCPU(), with the new one, compareHypervisorCPU().").
"virsh capabilities" reports the same thing for a Yoga and an Antelope compute, except the addition of a feature 'flush-l1d' in Antelope and the addition of 2 new attributes in the '<cpu>' section:
<signature family='6' model='79' stepping='1'/>
<maxphysaddr mode='emulate' bits='46'/>
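For anyone wanting to repeat this comparison without reading the XML by eye, a minimal Python sketch follows. It assumes the output of "virsh capabilities" from each compute has been saved as a string; the two sample documents below are heavily abbreviated and made up for illustration, and the helper name is hypothetical.

```python
import xml.etree.ElementTree as ET


def host_cpu_features(capabilities_xml):
    """Return the set of <feature name='...'/> entries under <host><cpu>."""
    root = ET.fromstring(capabilities_xml)
    return {f.get("name") for f in root.findall("./host/cpu/feature")}


# Abbreviated, made-up samples standing in for real "virsh capabilities" output.
YOGA_XML = """
<capabilities><host><cpu>
  <arch>x86_64</arch>
  <feature name='vme'/>
  <feature name='ss'/>
</cpu></host></capabilities>
"""

ANTELOPE_XML = """
<capabilities><host><cpu>
  <arch>x86_64</arch>
  <signature family='6' model='79' stepping='1'/>
  <maxphysaddr mode='emulate' bits='46'/>
  <feature name='vme'/>
  <feature name='ss'/>
  <feature name='flush-l1d'/>
</cpu></host></capabilities>
"""

yoga = host_cpu_features(YOGA_XML)
antelope = host_cpu_features(ANTELOPE_XML)

only_on_antelope = antelope - yoga   # features added on the Antelope side
only_on_yoga = yoga - antelope       # features missing on the Antelope side
```

With real output, a non-empty only_on_yoga would be the more suspicious result, since it would mean the destination lacks something the source reports.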
I tend to think that additions are not a problem in general, so this doesn't seem to explain the problem I'm facing. Anybody with a similar experience? It seems to me to be a different problem than the one that has been faced in the past (before Nova 20 if I'm right) and described in several posts, where the CPU model was improperly detected (in particular Broadwell vs. Icelake).
Thanks in advance for any hint/suggestion. Best regards,
Michel