Summary
-------

RHEL-9 / CentOS-9 (but not Fedora) has switched[1] to a new baseline
microarchitecture called "x86-64-v2".  This is to bring in support for
additional low-level CPU instructions, among other reasons.

Now, if you've explicitly configured "cpu_mode=none" in `nova.conf` on
your compute nodes (which results in the guest getting the extremely
undesirable "qemu64" CPU model), the guest will refuse to boot RHEL-9
or CentOS-9.

To fix this, please update the CPU model to "Nehalem".  It is the
oldest CPU model that is compatible with the CentOS-9/RHEL-9
"x86-64-v2" baseline.  Further, Nehalem also works with
`virt_type=kvm|qemu`, _and_ on both Intel and AMD hardware.  So this is
a good alternative.

Details
-------

Nova has three config attributes to set up various aspects of a guest
CPU: `cpu_mode`, `cpu_model[s]`, and `cpu_model_extra_flags`.  Examples
of how to use these are in the documentation[2].

If you're using `cpu_mode = none` (e.g. upstream DevStack defaults to
it for understandable reasons, mainly live-migration compatibility):

    [libvirt]
    cpu_mode = none

... and want to boot CentOS-9, replace the above with the custom model,
"Nehalem", which is the oldest CPU model that's compatible with the new
x86-64-v2 baseline:

    [libvirt]
    cpu_mode = custom
    cpu_model = Nehalem

The same applies if you're using "qemu64" or "kvm64", with or without
any custom CPU flags: use Nehalem instead.

(Also, please refer to[3] for more fine-grained recommendations of
guest CPU configuration.  It's a long document, but a patient reader
will be rewarded.)

Why is "qemu64" model undesirable for production?
-------------------------------------------------

For those wondering about it, a few reasons why the `qemu64` CPU model
is not at all desirable:

(1) It is vulnerable to many of the Spectre and other side-channel
    security flaws.  To see this in "action", you can launch a guest
    with the 'qemu64' CPU model, and then run the below inside it:

        $ cd /sys/devices/system/cpu/vulnerabilities/
        $ grep . *
        l1tf:Mitigation: PTE Inversion
        mds:Vulnerable: ... no microcode; SMT Host state unknown
        meltdown:Mitigation: PTI
        spec_store_bypass:Vulnerable
        spectre_v1:Mitigation: usercopy/swapgs barriers ...
        spectre_v2:Mitigation: Full generic retpoline ...

    Notice the "Vulnerable" entries.

(2) "qemu64" does not support several critical CPU features:

    (a) AES (Advanced Encryption Standard) instruction, which is
        important for improved TLS performance and encryption.

    (b) RDRAND instruction: without this, guests can get starved for
        entropy.

    (c) PCID flag: an obscure-but-important flag that'll lower the
        performance degradation that you incur from the "Meltdown"
        security fixes.

Probably there are more reasons that I don't know of.

An understandable reason why CI systems running in a cloud environment
go with 'qemu64' is convenience: with 'qemu64', you can live-migrate a
guest regardless of its underlying hardware (whether it's Intel or
AMD).  That's one main reason why upstream DevStack defaults to it.

            * * *

Overall, the rule of thumb here is to either always explicitly specify
a "sane" CPU model, based on the recommendations here[3], or to use
Nova/libvirt's default ("host-model").

[1] https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-li...
[2] https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.cpu...
[3] https://www.qemu.org/docs/master/system/i386/cpu.html#recommendations-for-kv...
[4] https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuu...

--
/kashyap
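
P.S. If you'd like to check whether a guest's CPU covers the x86-64-v2
baseline, here is a minimal sketch.  The flag list is my assumption,
based on the x86-64 psABI level definitions as spelled in
/proc/cpuinfo ("pni" is SSE3); it does not come from Nova or libvirt.
The helper name `missing_v2_flags` is made up for illustration.

```shell
#!/bin/sh
# Assumed x86-64-v2 additions over the original x86-64 baseline,
# using /proc/cpuinfo flag names.
V2_FLAGS="cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3"

# missing_v2_flags "<space-separated CPU flags>"
# Prints the assumed v2 flags that are absent from the given list;
# prints an empty line when all of them are present.
missing_v2_flags() {
    missing=""
    for f in $V2_FLAGS; do
        case " $1 " in
            *" $f "*) ;;                    # flag present, nothing to do
            *) missing="$missing $f" ;;     # flag absent, record it
        esac
    done
    echo "${missing# }"
}

# Usage, inside a Linux guest:
#   missing_v2_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

An empty result means every flag in the assumed list is present; any
names printed are the ones the guest CPU model lacks.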