CentOS-9 guests & 'qemu64' CPU model are incompatible; and reasons to avoid 'qemu64' in general
Kashyap Chamarthy
kchamart at redhat.com
Thu Oct 21 17:49:34 UTC 2021
Summary
-------
RHEL-9 / CentOS-9 (but not Fedora) has switched[1] to a new baseline
microarchitecture called "x86-64-v2". This is to bring in support for
additional low-level CPU instructions, among other reasons. Now, if
you've explicitly configured "cpu_mode=none" in `nova.conf` on your
compute nodes (which results in the guest getting the extremely
undesirable "qemu64" CPU model), RHEL-9 or CentOS-9 guests will refuse
to boot.
To fix this, please update the CPU model to "Nehalem". It is the oldest
CPU model that is compatible with CentOS-9/RHEL-9 "x86-64-v2". Further,
Nehalem also works with `virt_type=kvm|qemu`, _and_ on both Intel and
AMD hardware. So this is a good alternative.
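To check whether a given guest CPU actually meets the "x86-64-v2"
baseline, something like the following sketch may help. It assumes the
flag names that Linux reports in /proc/cpuinfo ("pni" is how the kernel
spells SSE3); the function name `missing_v2_flags` is made up for
illustration and is not part of any tool.

```shell
#!/bin/sh
# CPU flags (as spelled in /proc/cpuinfo) that x86-64-v2 requires on top
# of the original x86-64 baseline.
V2_FLAGS="cx16 lahf_lm popcnt pni sse4_1 sse4_2 ssse3"

# missing_v2_flags "<space-separated CPU flags>"
# Prints each required flag that is absent, one per line.
# Prints nothing if the flag list satisfies x86-64-v2.
missing_v2_flags() {
    for flag in $V2_FLAGS; do
        case " $1 " in
            *" $flag "*) ;;                  # present, nothing to report
            *) printf '%s\n' "$flag" ;;      # absent
        esac
    done
}

# Typical use inside a Linux guest:
#   missing_v2_flags "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

Empty output means the guest CPU should be able to run binaries built
for x86-64-v2; any flag printed is one the configured CPU model lacks.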
Details
-------
Nova has three config attributes to set up various aspects of a guest CPU:
`cpu_mode`, `cpu_model[s]`, and `cpu_model_extra_flags`. Examples of
how to use these are in the documentation[2]. If you're using
`cpu_mode = none` (e.g. upstream DevStack defaults to it for
understandable reasons, mainly live-migration compatibility):
    [libvirt]
    cpu_mode = none
... and want to boot CentOS-9, replace the above with the custom model,
"Nehalem", which is the oldest CPU model that's compatible with the new
x86-64-v2 baseline:
    [libvirt]
    cpu_mode = custom
    cpu_model = Nehalem
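If you want to confirm what a compute node is currently configured
with, a rough sketch like the one below reads an option from the
[libvirt] section. The helper name `libvirt_opt` is hypothetical, and
the /etc/nova/nova.conf path in the usage comment is an assumption
about your deployment; adjust as needed. It is a simple line-based
reader, not a full INI parser.

```shell
#!/bin/sh
# libvirt_opt FILE KEY
# Prints the value of KEY from the [libvirt] section of FILE.
# Commented-out lines and other sections are ignored.
libvirt_opt() {
    awk -v key="$2" '
        /^\[/ { in_sec = ($0 == "[libvirt]"); next }   # track current section
        in_sec {
            split($0, kv, "=")
            k = kv[1]; gsub(/[ \t]/, "", k)             # option name, no blanks
            if (k == key) {
                v = substr($0, index($0, "=") + 1)      # everything after "="
                gsub(/^[ \t]+|[ \t]+$/, "", v)          # trim surrounding blanks
                print v
            }
        }
    ' "$1"
}

# Example (path is an assumption):
#   libvirt_opt /etc/nova/nova.conf cpu_mode
#   libvirt_opt /etc/nova/nova.conf cpu_model
```

No output for `cpu_mode` means the option is not set explicitly in that
file.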
The same applies if you're using "qemu64" or "kvm64", with or without
any custom CPU flags: use Nehalem instead. (Also, please refer to [3]
for more fine-grained recommendations of guest CPU configuration. It's
a long document, but a patient reader will be rewarded.)
Why is "qemu64" model undesirable for production?
-------------------------------------------------
For those wondering, here are a few reasons why the `qemu64` CPU model
is not at all desirable:
(1) It is vulnerable to many of the Spectre and other side-channel
security flaws. To see this in "action", you can launch a guest
with the 'qemu64' CPU model, and then run the following in it:
    $ cd /sys/devices/system/cpu/vulnerabilities/
    $ grep . *
    l1tf:Mitigation: PTE Inversion
    mds:Vulnerable: ... no microcode; SMT Host state unknown
    meltdown:Mitigation: PTI
    spec_store_bypass:Vulnerable
    spectre_v1:Mitigation: usercopy/swapgs barriers ...
    spectre_v2:Mitigation: Full generic retpoline ...
Notice the "Vulnerable" entries.
(2) "qemu64" does not support several critical CPU features:
(a) AES (Advanced Encryption Standard) instructions,
which are important for improved TLS performance and encryption.
(b) RDRAND instruction: without this, guests can get starved for
entropy.
(c) PCID flag: an obscure-but-important flag that'll lower the
performance degradation that you incur from the "Meltdown"
security fixes.
Probably there are more reasons that I don't know of.
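Both points can be checked from inside a guest with a small script. The
sketch below is only an illustration: the function names are made up,
and the flag spellings ("aes", "rdrand", "pcid") are the ones Linux
uses in /proc/cpuinfo.

```shell
#!/bin/sh
# vulnerable_entries [DIR]
# Prints the name of every file in DIR whose status starts with
# "Vulnerable". DIR defaults to the kernel's sysfs directory.
vulnerable_entries() {
    dir=${1:-/sys/devices/system/cpu/vulnerabilities}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if grep -q '^Vulnerable' "$f"; then
            basename "$f"
        fi
    done
}

# missing_features "<space-separated CPU flags>"
# Prints which of the three features discussed above are absent.
missing_features() {
    for flag in aes rdrand pcid; do
        case " $1 " in
            *" $flag "*) ;;                  # present
            *) printf '%s\n' "$flag" ;;      # absent
        esac
    done
}

# Typical use inside a Linux guest:
#   vulnerable_entries
#   missing_features "$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)"
```

Run against the sample `grep` output shown in (1), the first function
would report `mds` and `spec_store_bypass`.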
An understandable reason why CI systems running in a cloud environment
go with 'qemu64' is convenience: with 'qemu64', you can live-migrate a
guest regardless of its underlying hardware (whether it's Intel or AMD).
That's one main reason why upstream DevStack defaults to it.
* * *
Overall, the rule of thumb here is to either always explicitly specify a
"sane" CPU model, based on the recommendations here[3], or to use
Nova/libvirt's default ("host-model").
[1] https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-linux-9-for-the-x86-64-v2-microarchitecture-level
[2] https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.cpu_mode
[3] https://www.qemu.org/docs/master/system/i386/cpu.html#recommendations-for-kvm-cpu-model-configuration-on-x86-hosts
[4] https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuul.yaml#L54
--
/kashyap