CentOS-9 guests & 'qemu64' CPU model are incompatible; and reasons to avoid 'qemu64' in general
Summary
-------

RHEL-9 / CentOS-9 (but not Fedora) has switched[1] to a new baseline microarchitecture called "x86-64-v2". This is to bring in support for additional low-level CPU instructions, among other reasons. Now, if you've explicitly configured "cpu_mode=none" in `nova.conf` on your compute nodes (which results in the guest getting the extremely undesirable "qemu64" CPU model), RHEL-9 and CentOS-9 guests will refuse to boot.

To fix this, please update the CPU model to "Nehalem". It is the oldest CPU model that is compatible with CentOS-9/RHEL-9 "x86-64-v2". Further, Nehalem also works with `virt_type=kvm|qemu`, _and_ on both Intel and AMD hardware. So this is a good alternative.

Details
-------

Nova has three config attributes to set up various aspects of a guest CPU: `cpu_mode`, `cpu_model[s]`, and `cpu_model_extra_flags`. Examples of how to use these are in the documentation[2]. If you're using `cpu_mode = none` (e.g. upstream DevStack defaults to it for understandable reasons, mainly live-migration compatibility):

    [libvirt]
    cpu_mode = none

... and want to boot CentOS-9, replace the above with the custom model, "Nehalem", which is the oldest CPU model that's compatible with the new x86-64-v2 baseline:

    [libvirt]
    cpu_mode = custom
    cpu_model = Nehalem

The same applies if you're using "qemu64" or "kvm64", with or without any custom CPU flags: use Nehalem. (Also, please refer to[3] for more fine-grained recommendations of guest CPU configuration. It's a long document, but a patient reader will be rewarded.)

Why is "qemu64" model undesirable for production?
-------------------------------------------------

For those wondering about it, a few reasons why the `qemu64` CPU model is not at all desirable:

(1) It is vulnerable to many of the Spectre and other side-channel security flaws. To see this in "action", you can launch a guest with the 'qemu64' CPU model, and then run the below:

    $ cd /sys/devices/system/cpu/vulnerabilities/
    $ grep . *
    l1tf:Mitigation: PTE Inversion
    mds:Vulnerable: ... no microcode; SMT Host state unknown
    meltdown:Mitigation: PTI
    spec_store_bypass:Vulnerable
    spectre_v1:Mitigation: usercopy/swapgs barriers ...
    spectre_v2:Mitigation: Full generic retpoline ...

    Notice the "Vulnerable" entries.

(2) "qemu64" does not support several critical CPU features:

    (a) AES (Advanced Encryption Standard) instructions, which are important for improved TLS performance and encryption.

    (b) RDRAND instruction: without this, guests can get starved for entropy.

    (c) PCID flag: an obscure-but-important flag that'll lower the performance degradation that you incur from the "Meltdown" security fixes.

Probably there are more reasons that I don't know of.

An understandable reason why CI systems running in a cloud environment go with 'qemu64' is convenience: with 'qemu64', you can live-migrate a guest regardless of its underlying hardware (whether it's Intel or AMD). That's one main reason why upstream DevStack defaults to it.

* * *

Overall, the rule of thumb here is either to always explicitly specify a "sane" CPU model, based on the recommendations here[3], or to use Nova/libvirt's default ("host-model").

[1] https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-li...
[2] https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.cpu...
[3] https://www.qemu.org/docs/master/system/i386/cpu.html#recommendations-for-kv...
[4] https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuu...

-- 
/kashyap
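The `grep . *` check from point (1) can also be scripted, which is handy when comparing several guests. What follows is a minimal Python sketch, not part of the original message: it reads the same sysfs directory and keeps only the entries whose status starts with "Vulnerable". The function names are made up for illustration, and the directory only exists on Linux kernels that expose it.

```python
from pathlib import Path

# Same directory the message inspects with `grep . *`.
VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def read_vulnerabilities(directory=VULN_DIR):
    """Return {file name: status line} for every file in the directory."""
    return {
        entry.name: entry.read_text().strip()
        for entry in sorted(directory.iterdir())
        if entry.is_file()
    }


def vulnerable_entries(statuses):
    """Keep only the entries whose status starts with 'Vulnerable'."""
    return {
        name: status
        for name, status in statuses.items()
        if status.startswith("Vulnerable")
    }


if __name__ == "__main__":
    if VULN_DIR.is_dir():
        for name, status in vulnerable_entries(read_vulnerabilities()).items():
            print(f"{name}: {status}")
    else:
        print(f"{VULN_DIR} not found (not Linux, or a kernel without it)")
```

Run inside the guest, this prints one line per unmitigated issue; an empty result means no entry reported "Vulnerable".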
On Thu, Oct 21, 2021, at 10:49 AM, Kashyap Chamarthy wrote:
> To fix this, please update the CPU model to "Nehalem". It is the oldest CPU model that is compatible with CentOS-9/RHEL-9 "x86-64-v2". Further, Nehalem also works with `virt_type=kvm|qemu`, _and_ on both Intel and AMD hardware. So this is a good alternative.
Thank you for looking into this and providing such detailed information. It has been really helpful.
> An understandable reason why CI systems running in a cloud environment go with 'qemu64' is convenience: with 'qemu64', you can live-migrate a guest regardless of its underlying hardware (whether it's Intel or AMD). That's one main reason why upstream DevStack defaults to it.
I've got a change up to Devstack to convert it over to Nehalem by default [5]. So far it looks good, but we will want to recheck it a few times and make sure we have good test coverage across the clouds we run testing on just to be sure that the CPUs we get from those clouds are able to support this CPU type. Good news is that we successfully built a centos-9-stream image and booted it with the Nehalem change in place [6].
> * * *
>
> Overall, the rule of thumb here is either to always explicitly specify a "sane" CPU model, based on the recommendations here[3], or to use Nova/libvirt's default ("host-model").
Devstack is currently setting cpu_mode to none. Should Nova be updated to make this result in a better behavior? Is this literally not passing a cpu mode to libvirt/qemu and allowing them to choose a default? If so maybe libvirt/qemu need to update their defaults?
> [1] https://developers.redhat.com/blog/2021/01/05/building-red-hat-enterprise-li...
> [2] https://docs.openstack.org/nova/latest/configuration/config.html#libvirt.cpu...
> [3] https://www.qemu.org/docs/master/system/i386/cpu.html#recommendations-for-kv...
> [4] https://opendev.org/openstack/whitebox-tempest-plugin/src/branch/master/.zuu...

[5] https://review.opendev.org/c/openstack/devstack/+/815020
[6] https://zuul.opendev.org/t/openstack/build/b5841d4d264c4c8f93d2368500d6221d

> -- 
> /kashyap
On Thu, Oct 21, 2021 at 10:56:42AM -0700, Clark Boylan wrote:
> On Thu, Oct 21, 2021, at 10:49 AM, Kashyap Chamarthy wrote:

[...]

> > To fix this, please update the CPU model to "Nehalem". It is the oldest CPU model that is compatible with CentOS-9/RHEL-9 "x86-64-v2". Further, Nehalem also works with `virt_type=kvm|qemu`, _and_ on both Intel and AMD hardware. So this is a good alternative.
>
> Thank you for looking into this and providing such detailed information. It has been really helpful.
No problem at all. I should've written this a bit sooner.

[...]
> > Why is "qemu64" model undesirable for production?
> > -------------------------------------------------

[...]

> > An understandable reason why CI systems running in a cloud environment go with 'qemu64' is convenience: with 'qemu64', you can live-migrate a guest regardless of its underlying hardware (whether it's Intel or AMD). That's one main reason why upstream DevStack defaults to it.
>
> I've got a change up to Devstack to convert it over to Nehalem by default [5]. So far it looks good, but we will want to recheck it a few times and make sure we have good test coverage across the clouds we run testing on just to be sure that the CPUs we get from those clouds are able to support this CPU type. Good news is that we successfully built a centos-9-stream image and booted it with the Nehalem change in place [6].
I see that the DevStack default change has now merged. Very cool.

If anyone is wondering: "How come the 'Nehalem' QEMU CPU model works on both Intel and AMD hardware?" The answer is that the CPU feature flags in Nehalem happen to be supported by both Intel and AMD. Joy to us!
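The "flags supported by both vendors" point boils down to a subset check: a required set of CPU flags is usable on a host when none of those flags is missing from the host's flag list. Here is a rough Python sketch of that idea, added for illustration. It uses the x86-64-v2 baseline flags as Linux spells them in /proc/cpuinfo, not QEMU's actual Nehalem definition (which is not reproduced here), and the host flag sets in the demo are made up.

```python
# x86-64-v2 baseline features, spelled the way Linux lists them in
# /proc/cpuinfo ("pni" is the kernel's name for SSE3).
X86_64_V2_FLAGS = frozenset(
    {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}
)


def missing_flags(required, available):
    """Return the required flags that the host does not offer, sorted."""
    return sorted(set(required) - set(available))


def usable_on_all(required, hosts):
    """True when every host in {name: flags} offers all required flags."""
    return all(not missing_flags(required, flags) for flags in hosts.values())


def read_host_flags(path="/proc/cpuinfo"):
    """Parse the first 'flags' line of a cpuinfo-style file into a set."""
    with open(path) as cpuinfo:
        for line in cpuinfo:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()


if __name__ == "__main__":
    # Hypothetical hosts: the flag sets below are illustrative only.
    hosts = {
        "intel-host": X86_64_V2_FLAGS | {"vmx"},
        "amd-host": X86_64_V2_FLAGS | {"svm"},
        "minimal-host": {"cx16", "pni"},
    }
    for name, flags in hosts.items():
        gaps = missing_flags(X86_64_V2_FLAGS, flags)
        print(name, "is missing:", ", ".join(gaps) or "nothing")
    print("usable on every host:", usable_on_all(X86_64_V2_FLAGS, hosts))
```

On a real x86 Linux machine, `missing_flags(X86_64_V2_FLAGS, read_host_flags())` gives the same answer for the local CPU, whether that is a host or a guest.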
> > Overall, the rule of thumb here is either to always explicitly specify a "sane" CPU model, based on the recommendations here[3], or to use Nova/libvirt's default ("host-model").
>
> Devstack is currently setting cpu_mode to none. Should Nova be updated to make this result in a better behavior? Is this literally not passing a cpu mode to libvirt/qemu and allowing them to choose a default? If so maybe libvirt/qemu need to update their defaults?
Yes, if you explicitly set `cpu_mode=none`, that does mean "use whatever is the default of QEMU". And no, we cannot update Nova to "result in better behaviour" for `cpu_mode=none`: doing so would essentially mean changing the hypervisor-reported default from "qemu64" to something else in Nova.

Changing the default in libvirt/QEMU is also very difficult, because of "hysterical raisins" :-(. The reason is the following (thanks, Daniel Berrangé):

Historically, QEMU never reported[1] what the default CPU model was. So libvirt assumed it was "qemu64". But unfortunately, until very recently[2][3] libvirt didn't expand this into the XML configs. So if upstream QEMU ever changes its default, it would impact guests without an explicit XML config for CPU.

* * *

In any case, FWIW, Daniel also echoes what I noted in my previous email: in practise, both the upstream QEMU and libvirt defaults are "reasonably irrelevant", since essentially any serious management tool will be setting a CPU model explicitly (Nova sets it to `host-model` for the KVM/QEMU driver).

[1] [QEMU] https://gitlab.com/qemu-project/qemu/-/commit/04109957d4 -- qapi: report the default CPU type for each machine
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1598151 -- [RFE] Add 'qemu64' as the CPU model if user doesn't supply a <cpu/> element
[3] [libvirt] https://gitlab.com/libvirt/libvirt/-/commit/5e939cea89 -- qemu: Store default CPU in domain XML

-- 
/kashyap
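Since the defaults cannot easily change, the practical step is to check what each compute node's `nova.conf` actually sets. The following is a small Python sketch of such a check, offered as an illustration and not as anything Nova ships. It uses the standard-library `configparser` rather than oslo.config, so it only covers simple INI content; the function name and warning texts are invented here, and the option names are the ones discussed in this thread.

```python
import configparser

# CPU models this thread advises replacing with Nehalem.
DISCOURAGED_MODELS = {"qemu64", "kvm64"}


def audit_libvirt_cpu(conf_text):
    """Return a list of warnings about the [libvirt] CPU settings."""
    # strict=False tolerates repeated options; interpolation=None keeps
    # '%' characters in values from being treated specially.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if not parser.has_section("libvirt"):
        return []  # nothing set explicitly, so Nova's default applies

    section = parser["libvirt"]
    mode = section.get("cpu_mode", fallback="").strip().lower()
    raw = section.get("cpu_models") or section.get("cpu_model") or ""
    models = [m.strip() for m in raw.split(",") if m.strip()]

    warnings = []
    if mode == "none":
        warnings.append("cpu_mode = none: guests get QEMU's default model")
    discouraged = [m for m in models if m.lower() in DISCOURAGED_MODELS]
    if mode == "custom" and discouraged:
        warnings.append("discouraged model(s): " + ", ".join(discouraged))
    return warnings


if __name__ == "__main__":
    sample = "[libvirt]\ncpu_mode = none\n"
    for warning in audit_libvirt_cpu(sample):
        print("WARNING:", warning)
```

A file that sets `cpu_mode = custom` with `cpu_model = Nehalem`, or that leaves the `[libvirt]` CPU options unset, produces no warnings.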
participants (2)
- Clark Boylan
- Kashyap Chamarthy