On reporting CPU flags that provide mitigation (to CVE flaws) as Nova 'traits'
[When replying, please keep us Cced, I'm not subscribed to the list.]

Grab a cup of tea, slightly long e-mail.

Some of us in the upstream Nova channel are splitting hairs over this
seemingly small issue about a potential "security concern" in
representing some CPU flags as "traits"[0].

Context
-------

The 'os-traits' project lets you report CPU flags as "traits", e.g. to
configure a flavor to require the "AVX2" CPU flag:

    $> openstack flavor set 1 --property trait:HW_CPU_X86_AVX2=required

And here's the list of current CPU traits:

    https://github.com/openstack/os-traits/blob/master/os_traits/hw/cpu/x86.py

The other day I casually noticed that the above file is missing some
important CPU flags, and proposed a couple of small drive-by changes.
And that set off this massive bike-shedding exercise the size of
Jupiter.

Some of the following flags provide mitigation from Meltdown/Spectre,
others reduce performance degradation, yet others are 'benign' features
or are in a weird zone like the just-off-the-press (for "ZombieLoad")
CPU flag: 'md-clear' -- which doesn't fix the vulnerability, but is a
way to report to the operating system that the kernel is
opportunistically issuing instructions (which, thankfully, already
_exist_ in the CPU) needed for the fix. (See what all these acronyms
mean in the rendered QEMU documentation here[1]).

  - Intel : PCID, STIBP, SPEC-CTRL, SSBD, PDPE1GB, MD-CLEAR
  - AMD   : IBPB, STIBP, VIRT-SSBD, AMD-SSBD, AMD-NO-SSB, PDPE1GB

Two things to distinguish
-------------------------

It's worth remembering that there are two things here:

  (a) allowing CPU flags via Nova's config attribute:
      `[libvirt]/cpu_model_extra_flags`; and

  (b) allowing scheduling of instances based on CPU traits.

We're talking about case (b) here.

Possible "security issue"
-------------------------

As mentioned above, Nova has this ability to say: "DON'T land this
instance on a host if it has $TRAIT".
So, hypothetically, you can say: "don't land on the host that has the
AMD-SSBD fix", which can be done (not yet, as the patch[2] is being
discussed) in one of two ways:

  - Via placement request: `required=!HW_CPU_X86_AMD_SSBD`
  - From a flavor Extra Spec: `trait:HW_CPU_X86_AMD_SSBD=forbidden`

So, theoretically, there is (non-trivial) scope for "exploiting" the
above; however, that is possible only once Nova exposes these flags
via Compute (which it doesn't yet).

Contention / unsolved question
------------------------------

Should we expose CPU flags (e.g. "SSBD", or "STIBP") that provide
mitigation from CPU flaws as traits or not? It is a "policy" decision,
and the 'traits' are "forever" (well, you can soft-deprecate them with
a comment) once they're added, hence all the belaboring.

There's no consensus here. Some think that we should _not_ allow those
CPU flags as traits which can 'allow' you to target vulnerable hosts.
Some think it is okay to add these as granular CPU traits. (Have a
gander at the discussion on this[2] change.)

Does the Security Team have any strong opinions?

On "generic" traits
-------------------

We also discussed whether it makes sense to add "generic roll-up
traits" such as 'HW_CPU_HAS_SPECTRE_CURE' and
'HW_CPU_HAS_MELTDOWN_CURE'. However intuitively appealing that seems,
the messy real world doesn't quite allow it (as there are multiple
different Spectre flaws). So, for now we won't do these "generic"
traits; but they can be added later, if we change our minds.

Next steps
----------

If there is consensus on dropping those CPU-flags-as-traits that let
you target vulnerable hosts, drop them. And add only those CPU flags
as traits that provide either 'features' (what's the definition?) or
those that reduce performance degradation.

Otherwise, add all the required CPU flags consistently to 'os-traits',
and move on.
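
To make the two forms listed under 'Possible "security issue"' concrete, here is a minimal Python sketch that builds both strings from lists of required and forbidden traits. The helper names (`placement_required_param`, `flavor_trait_specs`) are made up for illustration and are not part of Nova or os-traits, and `HW_CPU_X86_AMD_SSBD` is only the trait proposed in [2], not one that exists yet.

```python
def placement_required_param(required=(), forbidden=()):
    """Build a placement-style `required=` query fragment.

    Forbidden traits are written with a leading '!', as in the
    `required=!HW_CPU_X86_AMD_SSBD` example above.
    (Hypothetical helper, for illustration only.)
    """
    parts = list(required) + ["!" + trait for trait in forbidden]
    return "required=" + ",".join(parts)


def flavor_trait_specs(required=(), forbidden=()):
    """Build flavor extra specs of the form `trait:<NAME>=<value>`.

    (Hypothetical helper, for illustration only.)
    """
    specs = {"trait:" + trait: "required" for trait in required}
    specs.update({"trait:" + trait: "forbidden" for trait in forbidden})
    return specs


if __name__ == "__main__":
    # The trait name below is the one proposed in [2]; it is an assumption
    # that it would be spelled exactly like this if merged.
    print(placement_required_param(forbidden=["HW_CPU_X86_AMD_SSBD"]))
    print(flavor_trait_specs(required=["HW_CPU_X86_AVX2"],
                             forbidden=["HW_CPU_X86_AMD_SSBD"]))
```

The point of the sketch is only that "forbidden" is a one-character change from "required", which is why exposing mitigation flags as traits would make vulnerable hosts targetable.
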

Another idea
------------

For "Meltdown" (and for other vulnerabilities; it should be
case-by-case, based on fix availability), we can potentially make Nova
check the 'sysfs' directory for vulnerabilities, and see whether it
reports "Vulnerable" (instead of "Mitigation", as shown below):

    $> cat /sys/devices/system/cpu/vulnerabilities/meltdown
    Mitigation: PTI

If it does, we can print a log warning for the current release that
the host is vulnerable, warn that a future Nova will refuse to run VMs
on it, and then, in the next release, make it mandatory.

Likewise for "Spectre":

    $> grep . /sys/devices/system/cpu/vulnerabilities/spectre_*
    /sys/devices/system/cpu/vulnerabilities/spectre_v1:Mitigation: __user pointer sanitization
    /sys/devices/system/cpu/vulnerabilities/spectre_v2:Mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling

Some think this is not "Nova's business", because: "just like how you
don't want to stop based on CPU fan speed or temperature or firmware
patch levels ...". But that argument doesn't quite apply, as CPU fan
speed and temperature are very different, and are not seen by the
guest.

If you take security seriously, it _is_ fair game, IMHO, to make Nova
warn about (and later stop) launching instances on Compute hosts with
vulnerable hypervisors. But my umbilical cord isn't tied to this idea,
just wanted to mention it for completeness' sake.

[0] https://github.com/openstack/os-traits/
[1] https://qemu.weilnetz.de/doc/qemu-doc.html#important_005fcpu_005ffeatures_00...
[2] https://review.opendev.org/#/c/655193/
    Add CPU traits for Meltdown/Spectre mitigation

-- 
/kashyap
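
As a rough illustration of the sysfs check floated under "Another idea" above, here is a minimal Python sketch. It is not Nova code: the function names are hypothetical, and the classification assumes only the status prefixes seen in the kernel's output ("Mitigation", "Vulnerable", "Not affected"); anything else is treated as unknown rather than guessed at.

```python
from pathlib import Path

# Directory shown in the e-mail; entries are one file per vulnerability.
SYSFS_VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")


def classify(status):
    """Map one sysfs status line to a coarse state (hypothetical helper)."""
    text = status.strip()
    if text.startswith("Mitigation"):
        return "mitigated"
    if text.startswith("Vulnerable"):
        return "vulnerable"
    if text.startswith("Not affected"):
        return "not_affected"
    return "unknown"


def vulnerable_entries(vuln_dir=SYSFS_VULN_DIR):
    """Return {entry_name: status_line} for entries reported as vulnerable.

    Returns an empty dict when the directory is missing (e.g. an old
    kernel or a non-Linux host), so callers can decide what that means.
    """
    found = {}
    if not vuln_dir.is_dir():
        return found
    for entry in sorted(vuln_dir.iterdir()):
        try:
            status = entry.read_text().strip()
        except OSError:
            continue  # unreadable entry: skip rather than fail the check
        if classify(status) == "vulnerable":
            found[entry.name] = status
    return found
```

A caller could log a warning for each item returned by `vulnerable_entries()` in one release and refuse to launch instances in a later one, which is the two-step rollout described above.
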
On 2019-05-15 11:24:56 +0200 (+0200), Kashyap Chamarthy wrote:
[When replying, please keep us Cced, I'm not subscribed to the list.] [...]
At Kashyap's request I have bounced this message and his other
follow-up to the openstack-discuss ML. Please reply there rather than
on the openstack-security ML, which is only used for automated
notifications these days.

-- 
Jeremy Stanley