Re: On reporting CPU flags that provide mitigation (to CVE flaws) as Nova 'traits'

15 May 2019

On Wed, May 15, 2019 at 11:49:03AM +0100, Sean Mooney wrote:
...
On Wed, 2019-05-15 at 11:24 +0200, Kashyap Chamarthy wrote:
[...]
...
...
Contention / unsolved question
------------------------------
Whether we should expose CPU flags (e.g. "SSBD", or "STIBP") that
provide mitigation from CPU flaws as traits or not?  It is a "policy"
decision, and the 'traits' are "forever" (well, you can soft-deprecate
them with a comment) once they're added, hence all the belaboring.
There's no consensus here.  Some think that we should _not_ allow those
CPU flags as traits which can 'allow' you to target vulnerable hosts.
For what it's worth, I'm in this camp and have said so in other places
where we have been discussing it.
Yep, noted.
...
...
Some think it is okay to add these as granular CPU traits.  (Have
a gander at the discussion on this[2] change.)
Does the Security Team have any strong opinions?
[...]
...
...
Next steps
----------
If there is consensus on dropping those CPU-flags-as-traits that let you
target vulnerable hosts, drop them.  And add only those CPU flags as
traits that provide either 'features' (what's the definition?) or those
that reduce performance degradation.
My vote is for only adding traits for CPU features.
Noted; I'd like to hear other opinions.  (And note that the word
"feature" can get fuzzy in this context, I'll assume we're using it
somewhat loosely to include things that help with reducing perf
degradation, etc.)
...
PCID is a CPU feature that was designed as a performance optimisation
... except that "feature" was a 'no-op' and it wasn't even _used_, until
Linux 4.14 enabled it (in November 2017) for Meltdown mitigation.  So
the presence of PCID in the hardware didn't matter one whit all these
decades.  (Source: http://archive.is/ma8Iw.)
...
and several generations later it was also found to be useful in reducing
the performance impact of the Spectre mitigation
Nit: Not Spectre, but Meltdown.

[...]
...
...
Some think this is not "Nova's business", because: "just like how you
don't want to stop based on CPU fan speed or temperature or firmware
patch levels ...".
i think it applies perfectly.
It's a matter of scope.  To be clear — I'm not "insisting" that it be
done in Nova.  Just thinking out loud.

[...]
...
From a product perspective, vendors should ensure that they
provide tooling and software updates that are secure by default.
"Product perspective" is irrelevant here.  Of course, it's obvious that
vendors "should" provide the relevant tooling and software updates.
...
...
But that argument doesn't quite apply, as CPU
fan/speed are very different, and are not seen by the guest.  If you
take security seriously, it _is_ fair game, IMHO, to make Nova warn
(then stop) launching instances on Compute hosts with vulnerable
Correcting myself: Okay, "stopping" / "refusing to launch" is too strict
and unreasonable; scratch that.  (Because, as discussed before, there
_are_ valid cases to be made that certain admins/operators intentionally
will run on vulnerable hypervisors — e.g. because their CPUs are too old
to receive microcode updates.  Or may deliberately tolerate this risk,
as they know their risk policy.  Or they're running staging envs, or any
number of other reasons.)
...
...
hypervisors.
The same argument could be applied to QEMU or libvirt.
No, that argument does not apply to QEMU or libvirt.  Why?  QEMU and
libvirt are low-level primitives.  They explicitly state that they
don't, and will not, make such "policy" decisions.

But Nova, as a management tool, _does_ make some policy decisions (e.g.
how we generate a libvirt guest XML based on certain criteria, and
others).  And in this case, Nova _can_ take a stance that "orchestration
tools" should do that — that's perfectly acceptable.
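
To make the "warn, but don't refuse" stance concrete, here is a rough
sketch in Python.  It is not Nova code: the function names, the
MITIGATION_FLAGS set, and the log message are all hypothetical, made up
for illustration.  It also assumes that a missing flag means "host is
not mitigated", which is a simplification (a CPU may simply not be
affected by the flaw in question).

```python
import logging

LOG = logging.getLogger(__name__)

# Hypothetical grouping, for illustration only.  This is NOT a list
# taken from Nova or os-traits; it just reuses the two flags named
# earlier in this thread.
MITIGATION_FLAGS = frozenset({"ssbd", "stibp"})


def missing_mitigations(host_flags):
    """Return the mitigation flags the host does not report, sorted."""
    present = {flag.lower() for flag in host_flags}
    return sorted(MITIGATION_FLAGS - present)


def check_host(host, host_flags):
    """Warn (never refuse) when a host lacks mitigation flags.

    The caller always gets the list back and decides what to do with
    it; this function itself only logs.
    """
    missing = missing_mitigations(host_flags)
    if missing:
        LOG.warning("Host %s lacks CPU mitigation flags: %s",
                    host, ", ".join(missing))
    return missing
```

The point of returning the list instead of raising is that the policy
("warn", "refuse", "ignore") stays with the operator, in line with the
correction above about not refusing to launch.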

[...]

-- 
/kashyap