IMO this (cpu flags/features/attributes, and even possibly firmware patch levels, though probably not fan speed or temperature) is a perfectly suitable use of traits. Not all traits have to feed into Nova scheduling decisions; they could also be used by e.g. external orchestrators. os-traits needs to have that more global not-just-Nova perspective.
Clearly not everything has to feed into a Nova scheduling decision, by virtue of placement hoping to cater to things other than nova. That said, I do think that placement should try to avoid being "tags as a service" which this use-case is dangerously close to becoming, IMHO.
Okay, "stopping" / "refusing to launch" is too strict and unreasonable; scratch that.
I agree with this, for all the reasons stated.
Me too, and that'd be a Nova decision to do anything with the security flag or not.
We can potentially make Nova check the 'sysfs' directory for vulnerabilities.
IMO this is still a good idea, but rather than warning / refusing to boot, we could expose a roll-up trait, subject to the strawman design below.
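For the sake of discussion, here is a minimal sketch of what "check sysfs and expose a roll-up trait" could look like. It assumes the Linux /sys/devices/system/cpu/vulnerabilities directory, where each file holds a one-line status such as "Not affected", "Vulnerable" or "Mitigation: ...". The trait name CUSTOM_CPU_VULNERABLE and both helper functions are hypothetical; nothing here is existing Nova or os-traits code.

```python
from pathlib import Path

# Linux exposes one file per known CPU vulnerability in this directory.
SYSFS_VULN_DIR = Path("/sys/devices/system/cpu/vulnerabilities")

# Hypothetical roll-up trait name; not an existing os-traits constant.
ROLLUP_TRAIT = "CUSTOM_CPU_VULNERABLE"


def read_vulnerability_status(vuln_dir=SYSFS_VULN_DIR):
    """Return {vulnerability_name: status_line} for each readable file."""
    statuses = {}
    if not vuln_dir.is_dir():
        # Older kernels or non-Linux hosts: report nothing rather than guess.
        return statuses
    for entry in sorted(vuln_dir.iterdir()):
        if entry.is_file():
            try:
                statuses[entry.name] = entry.read_text().strip()
            except OSError:
                continue
    return statuses


def rollup_traits(statuses):
    """Return {ROLLUP_TRAIT} if any status starts with 'Vulnerable'."""
    if any(status.startswith("Vulnerable") for status in statuses.values()):
        return {ROLLUP_TRAIT}
    return set()


if __name__ == "__main__":
    print(rollup_traits(read_vulnerability_status()))
```

Note that this only rolls up what the kernel itself reports; it says nothing about pending package or firmware updates.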
And I think it's a bad idea. Honestly, if we're going to do this, why not query yum/apt and set a trait for has-updates-pending? Or has-major-update-available? Or dell-tells-us-there-is-a-bios-update-for-this-machine? Where does it end? Obviously I think it's up to the placement team to decide if they're going to put has-updates-pending in the set of standard traits. I'd vote for no, and Jay will be turning over in his grave shortly. However, I strenuously object to Nova becoming the agent for everything on the compute node, software, hardware, etc. If we're going to peek into kernel updatey things, I don't see how we explain to the next person that it's not okay to check to see if firefox is up to date. Further, if we do get into this business, who is to say that in the future, Nova doesn't get a CVE for failing to notice and report something? Like, do we need to put nova in the embargo box since it claims to be able to tell you if your stuff is vulnerable or not?
To summarize my position on the os-traits side of things:
- We can merge the feature-ish traits (assuming folks can agree on which ones those are).
- We can merge the vulnerability traits as long as they come with nice comments explaining the potential security pitfalls around using them.
- Or for all I care we can merge nothing, since we don't actually seem to have a demand for it.
Every vendor has a tool dedicated to monitoring for updates, applicable vulnerabilities, and for orchestrating that work. A deployment of any appreciable size monitors hardware inventory and can answer the questions of which hosts need a patch without having to ask Nova about it. There are plenty of reasons why you might not apply one update at all or on a specific schedule. This is well outside of Nova's scope.
The below would need a blueprint and a spec. And an owner. And it would be nice if it also had demand.
If we want to make scheduling decisions based on vulnerabilities, it needs to be under the exclusive control of the admin. As mentioned above, exposing the traits and allowing untrusted/untrustworthy users to target vulnerable hosts is only marginally worse than having those vulnerable hosts available to said untrusted users at all. So if we are going to have virt drivers expose a VULNERABLE trait in any form, it should come with:
Further, if placement is ever exposed to middle admins (i.e. domain admins, site admins in a larger deployment, etc.), even read-only, presumably you'll need to be able to expose (or hide) the presence of a trait based on their security clearance.
1) a config option in the spirit of:
[scheduler]
allow_scheduling_to_vulnerable_hosts = $bool (default: False)
which, when False, causes the scheduler to add trait:VULNERABLE=forbidden to *all* GET /a_c requests.
But we should generalize this to:
(a) Maintain a hardcoded list of traits that represent vulnerabilities or other undesirables
(b) Have the conf option be [scheduler]evil_trait_whitelist
(c) Add [trait:$X=forbidden for $X in {(a) - (b)}], i.e. forbid every hardcoded trait the admin has not whitelisted
2) a hard check to disallow trait:$X=required from *anywhere* (flavor, image, etc.) regardless of the conf option. Either reject the boot request or explicitly strip that out.
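To make the strawman concrete, a rough sketch of rules 1) and 2) as plain set operations follows. It reads (c) as "forbid every hardcoded trait that the admin has not whitelisted". The trait names in EVIL_TRAITS and all three function names are made up for illustration; this is not scheduler code, and it takes the "explicitly strip" branch of rule 2) rather than rejecting the request.

```python
# (a) Hypothetical hardcoded list; these are not existing os-traits names.
EVIL_TRAITS = frozenset({"INTEL_VULNERABLE", "AMD_VULNERABLE"})


def forbidden_traits(evil_trait_whitelist):
    """(c): hardcoded list (a) minus the admin's whitelist (b)."""
    return EVIL_TRAITS - set(evil_trait_whitelist)


def sanitize_request_traits(requested, evil_trait_whitelist=()):
    """Apply both rules to a {trait: "required" | "forbidden"} mapping.

    Rule 2: strip trait:$X=required for any evil trait, whitelisted or not.
    Rule 1: add trait:$X=forbidden for every non-whitelisted evil trait.
    """
    result = {
        trait: value
        for trait, value in requested.items()
        if not (trait in EVIL_TRAITS and value == "required")
    }
    for trait in forbidden_traits(evil_trait_whitelist):
        result[trait] = "forbidden"
    return result


def to_required_param(traits):
    """Render the mapping in the required=A,!B style used for GET /a_c."""
    parts = []
    for trait in sorted(traits):
        prefix = "!" if traits[trait] == "forbidden" else ""
        parts.append(prefix + trait)
    return ",".join(parts)


if __name__ == "__main__":
    flavor = {"HW_CPU_X86_AVX2": "required", "INTEL_VULNERABLE": "required"}
    print(to_required_param(sanitize_request_traits(flavor)))
```

With an empty whitelist, the flavor's attempt to require INTEL_VULNERABLE is dropped and both vulnerability traits end up forbidden. Whitelisting a trait only stops it from being forbidden automatically; it still cannot be required.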
For completeness, note that these traits need to be "negative" (i.e. "has vulnerability") so that we can forbid them in a list in the GET /a_c request. Because required=!INTEL_VULNERABLE,!AMD_VULNERABLE will correctly avoid vulnerable hosts from either vendor, but required=INTEL_FIXED,AMD_FIXED won't land anywhere, and we don't have required=in:INTEL_FIXED,AMD_FIXED yet.
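The difference is easy to see with a toy model of the filter. This is only an illustration of the AND semantics described above, not placement's implementation; the host names and the matches/candidates helpers are invented.

```python
def matches(provider_traits, required):
    """True if every plain trait is present and every !trait is absent."""
    for term in required.split(","):
        if term.startswith("!"):
            if term[1:] in provider_traits:
                return False
        elif term not in provider_traits:
            return False
    return True


# Four made-up hosts, one per vendor/patch-state combination.
HOSTS = {
    "intel-patched": {"INTEL_FIXED"},
    "intel-unpatched": {"INTEL_VULNERABLE"},
    "amd-patched": {"AMD_FIXED"},
    "amd-unpatched": {"AMD_VULNERABLE"},
}


def candidates(required):
    """Names of the hosts whose traits satisfy the required expression."""
    return sorted(name for name, traits in HOSTS.items()
                  if matches(traits, required))


if __name__ == "__main__":
    # Negative traits: both patched hosts qualify.
    print(candidates("!INTEL_VULNERABLE,!AMD_VULNERABLE"))
    # Positive traits: no host carries both, so nothing qualifies.
    print(candidates("INTEL_FIXED,AMD_FIXED"))
```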
I'm strong -3 on exposing VULNERABLE or NOT_VULNERABLE and +2 on SUPPORTS_SOMEACTUALCPUFLAG. It's trivial today for an operator to nova-disable all computes, and start enabling them as they are patched (automatically, with their patching tool). --Dan