Re: On reporting CPU flags that provide mitigation (to CVE flaws) as Nova 'traits'

15 May 2019

      ...
IMO this (cpu flags/features/attributes, and even possibly firmware
patch levels, though probably not fan speed or temperature) is a
perfectly suitable use of traits. Not all traits have to feed into Nova
scheduling decisions; they could also be used by e.g. external
orchestrators. os-traits needs to have that more global not-just-Nova
perspective.
Clearly not everything has to feed into a Nova scheduling decision, by
virtue of placement hoping to cater to things other than nova. That
said, I do think that placement should try to avoid being "tags as a
service" which this use-case is dangerously close to becoming, IMHO.
...
...
Okay, "stopping" / "refusing to launch" is too strict
and unreasonable; scratch that.
I agree with this, for all the reasons stated.
Me too, and that'd be a Nova decision to do anything with the security
flag or not.
...
...
we can potentially make Nova
check the 'sysfs' directory for vulnerabilities.
IMO this is still a good idea, but rather than warning / refusing to
boot, we could expose a roll-up trait, subject to the strawman design
below.
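To make the sysfs idea concrete, here is a minimal Python sketch of what such a check might look like. It assumes a Linux host whose kernel reports per-issue status under /sys/devices/system/cpu/vulnerabilities (one file per vulnerability, containing strings such as "Not affected", "Mitigation: ..." or "Vulnerable"). The function names and the CUSTOM_CPU_VULNERABLE roll-up trait are hypothetical, not an existing Nova or os-traits API.

```python
import os

# Hypothetical roll-up trait; non-standard placement traits must use the CUSTOM_ prefix.
ROLLUP_TRAIT = "CUSTOM_CPU_VULNERABLE"

# Directory where Linux kernels that support it report CPU vulnerability status.
DEFAULT_VULN_DIR = "/sys/devices/system/cpu/vulnerabilities"


def read_vulnerability_status(vuln_dir=DEFAULT_VULN_DIR):
    """Return {vulnerability_name: status_line} for each file in vuln_dir."""
    status = {}
    if not os.path.isdir(vuln_dir):
        # Older kernels and non-Linux hosts do not provide this directory.
        return status
    for name in sorted(os.listdir(vuln_dir)):
        path = os.path.join(vuln_dir, name)
        if os.path.isfile(path):
            with open(path) as f:
                status[name] = f.read().strip()
    return status


def rollup_traits(status):
    """Return the set of roll-up traits this host would report.

    The host is flagged if any status line starts with "Vulnerable";
    "Not affected" and "Mitigation: ..." lines do not trigger it.
    """
    if any(line.startswith("Vulnerable") for line in status.values()):
        return {ROLLUP_TRAIT}
    return set()
```

On a host, rollup_traits(read_vulnerability_status()) would yield either an empty set or the single roll-up trait; a virt driver would then have to merge that into the traits it already reports.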
And I think it's a bad idea. Honestly, if we're going to do this, why
not query yum/apt and set a trait for has-updates-pending? Or
has-major-update-available? Or
dell-tells-us-there-is-a-bios-update-for-this-machine? Where does it
end?

Obviously I think it's up to the placement team to decide if they're
going to put has-updates-pending in the set of standard traits. I'd vote
for no, and Jay will be turning over in his grave shortly. However, I
strenuously object to Nova becoming the agent for everything on the
compute node, software, hardware, etc. If we're going to peek into
kernel updatey things, I don't see how we explain to the next person
that it's not okay to check to see if firefox is up to date.

Further, if we do get into this business, who is to say that in the
future, Nova doesn't get a CVE for failing to notice and report
something? Like, do we need to put nova in the embargo box since it
claims to be able to tell you if your stuff is vulnerable or not?
...
To summarize my position on the os-traits side of things:
- We can merge the feature-ish traits (assuming folks can agree on which
ones those are).
- We can merge the vulnerability traits as long as they come with nice
comments explaining the potential security pitfalls around using them.
- Or for all I care we can merge nothing, since we don't actually seem
to have a demand for it.
Every vendor has a tool dedicated to monitoring for updates, applicable
vulnerabilities, and for orchestrating that work. A deployment of any
appreciable size monitors hardware inventory and can answer the
questions of which hosts need a patch without having to ask Nova about
it. There are plenty of reasons why you might not apply a given update at
all, or might apply it only on a specific schedule. This is well outside
of Nova's scope.
...
The below would need a blueprint and a spec. And an owner. And it would
be nice if it also had demand.
If we want to make scheduling decisions based on vulnerabilities, it
needs to be under the exclusive control of the admin. As mentioned
above, exposing the traits and allowing untrusted/untrustworthy users to
target vulnerable hosts is only marginally worse than having those
vulnerable hosts available to said untrusted users at all. So if we are
going to have virt drivers expose a VULNERABLE trait in any form, it
should come with:
Further, if placement is ever exposed to middle admins (i.e. domain
admins, site admins in a larger deployment, etc) even read-only,
presumably you'll need to be able to expose (or hide) the presence of a
trait based on their security clearance.
...
1) a config option in the spirit of:
[scheduler]
allow_scheduling_to_vulnerable_hosts = $bool (default: False)
which, when False, causes the scheduler to add
trait:VULNERABLE=forbidden to *all* GET /a_c requests.
But we should generalize this to:
  (a) Maintain a hardcoded list of traits that represent vulnerabilities
      or other undesirables
  (b) Have the conf option be [scheduler]evil_trait_whitelist
  (c) Add [trait:$X=forbidden for $X in {(a) - (b)}]
2) a hard check to disallow trait:$X=required from *anywhere* (flavor,
image, etc.) regardless of the conf option. Either reject the boot
request or explicitly strip that out.
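The two safeguards above could be sketched roughly as follows. This is a hypothetical illustration, not Nova code: EVIL_TRAITS stands in for the hardcoded list, the evil_trait_whitelist argument for the [scheduler]evil_trait_whitelist option, and the trait names are the placeholders used in this thread. It forbids every listed trait the operator has not whitelisted, and for item 2) it chooses to reject (rather than strip) any trait:$X=required for a listed trait.

```python
# Hypothetical hardcoded list of traits representing vulnerabilities or
# other undesirables; the names are the placeholders used in this thread.
EVIL_TRAITS = frozenset({"VULNERABLE", "INTEL_VULNERABLE", "AMD_VULNERABLE"})


def forbidden_traits(evil_trait_whitelist):
    """Return the listed traits the operator has NOT whitelisted."""
    return EVIL_TRAITS - set(evil_trait_whitelist)


def check_extra_specs(extra_specs):
    """Reject any flavor/image spec that *requires* a listed trait.

    This applies regardless of the whitelist. extra_specs is a mapping
    such as {"trait:HW_CPU_X86_AVX2": "required"}.
    """
    for key, value in extra_specs.items():
        if key.startswith("trait:") and value == "required":
            trait = key[len("trait:"):]
            if trait in EVIL_TRAITS:
                raise ValueError(f"requiring trait {trait} is not allowed")


def required_param(required_traits, evil_trait_whitelist):
    """Build the `required` query value for GET /allocation_candidates.

    Requested traits are kept as-is; each non-whitelisted listed trait is
    appended with a "!" prefix so hosts that have it are excluded.
    """
    forbidden = ["!" + t for t in sorted(forbidden_traits(evil_trait_whitelist))]
    return ",".join(sorted(required_traits) + forbidden)
```

With an empty whitelist this behaves like allow_scheduling_to_vulnerable_hosts = False. Whitelisting a trait lets requests land on hosts that report it, while explicitly requiring it remains impossible.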
For completeness, note that these traits need to be "negative" (i.e.
"has vulnerability") so that we can forbid them in a list in the GET
/a_c request. That's because required=!INTEL_VULNERABLE,!AMD_VULNERABLE
will correctly avoid vulnerable hosts from either vendor, but
required=INTEL_FIXED,AMD_FIXED won't land anywhere (no host has both),
and we don't have required=in:INTEL_FIXED,AMD_FIXED yet.
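The asymmetry between negative and positive traits can be checked with a small simulation. This is a simplified model of how the required query parameter is evaluated (every entry is ANDed, and a leading "!" means the trait must be absent), not placement's actual code; the host names and the *_FIXED / *_VULNERABLE traits are hypothetical.

```python
def host_matches(host_traits, required):
    """Return True if a host satisfies a `required` value such as "A,!B".

    Every comma-separated entry must hold: a plain name must be present on
    the host, and a name prefixed with "!" must be absent.
    """
    for entry in required.split(","):
        if entry.startswith("!"):
            if entry[1:] in host_traits:
                return False
        elif entry not in host_traits:
            return False
    return True


# Hypothetical hosts, each reporting only its own vendor's trait.
HOSTS = {
    "intel-patched": {"INTEL_FIXED"},
    "intel-unpatched": {"INTEL_VULNERABLE"},
    "amd-patched": {"AMD_FIXED"},
    "amd-unpatched": {"AMD_VULNERABLE"},
}


def candidates(required):
    """Return the sorted names of hosts that satisfy `required`."""
    return sorted(name for name, traits in HOSTS.items() if host_matches(traits, required))
```

Here candidates("!INTEL_VULNERABLE,!AMD_VULNERABLE") returns ["amd-patched", "intel-patched"], while candidates("INTEL_FIXED,AMD_FIXED") returns an empty list, because no single host reports both vendors' traits.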
I'm strong -3 on exposing VULNERABLE or NOT_VULNERABLE and +2 on
SUPPORTS_SOMEACTUALCPUFLAG. It's trivial today for an operator to disable
all nova-compute services and start enabling them as they are patched
(automatically, with their patching tool).

--Dan

Dan Smith