Re: On reporting CPU flags that provide mitigation (to CVE flaws) as Nova 'traits'

16 May 2019

      On Wed, 15 May 2019, Eric Fried wrote:
...
(NB: I'm explicitly rendering "no opinion" on several items below so you
know I didn't miss/ignore them.)
I'm responding in this thread so that it's clear I'm not ignoring
it. I don't have a strong opinion. I agree that availability of a
trait in os-traits is not the same as nova reporting that trait when
creating resource providers representing compute nodes.

However, having something in os-traits that nobody is going to use
is not without cost: Once something is in os-traits it must stay
there forever.

So if there's no pressing use case for these additions, maybe we
just wait.

Bit more within...
...
However, I'll state again for the record that vendor-specific "positive"
traits (indicating "has mitigation", "not vulnerable", etc.) are nigh
worthless for the Nova scheduling use case of "land me on a
non-vulnerable host" because, until you can say
required=in:HW_CPU_X86_INTEL_FIX,HW_CPU_X86_AMD_FIX, you would have to
pick your CPU vendor ahead of time.
There's a spec for this, but it is currently on hold as there are
neither immediate use cases demanding to be satisfied nor anyone to
do the work. https://review.opendev.org/649992
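
To make the gap concrete, here is a rough sketch of the difference
between "all of these traits" and the any-of form written as
required=in:... above. It is plain Python sets, not placement code;
the host names and the mapping of traits to hosts are made up, and
the two trait names are the placeholder ones from the quoted text.

```python
# Illustrative model only: this is not how placement implements filtering.
# Host names and the trait-to-host mapping are invented for the example.

def matches_all(host_traits, required):
    """All-of semantics: every listed trait must be on the host."""
    return set(required) <= set(host_traits)


def matches_any(host_traits, any_of):
    """Any-of ('in:') semantics: at least one listed trait is on the host."""
    return bool(set(any_of) & set(host_traits))


hosts = {
    "intel-host": {"HW_CPU_X86_INTEL_FIX"},
    "amd-host": {"HW_CPU_X86_AMD_FIX"},
    "old-host": set(),
}
wanted = ["HW_CPU_X86_INTEL_FIX", "HW_CPU_X86_AMD_FIX"]

# Hosts that satisfy the request under each interpretation.
all_of = sorted(h for h, t in hosts.items() if matches_all(t, wanted))
any_of = sorted(h for h, t in hosts.items() if matches_any(t, wanted))
```

With only all-of available, listing both vendor traits matches no
host, which is why the requester ends up picking a vendor up front.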
...
(Disclaimer: I'm a card-carrying "trait libertarian": freedom to do what
makes sense with traits, as long as you're not hurting anyone and it's
not costing the taxpayers.)
...
I guess that makes me a "trait anarcho communitarian". People should
have the freedom to do what they like with traits as long as they
aren't hurting anybody, but blessing a trait as official (by putting
it in os-traits) is a strong signifier and has system-wide impacts that
should be debated in ad-hoc committees endlessly until a consensus
emerges which avoids anyone facepalming or rage quitting.

From a placement-the-service standpoint, it cares naught. It doesn't
know what traits mean and cannot distinguish between official and
custom traits when filtering candidates. It's important that
placement be able to work easily with thousands or hundreds of
thousands of traits. We very definitely do not want to be making
authorization decisions based on the value of a trait and the status
of the requestor.

As said elsewhere by several folk: It's how the other services use
them that matters.

I'm agnostic on nova reporting all the cpu
flags/features/capabilities as traits. If it is going to do that,
then having _those_ traits as members of os-traits is the right
thing to do.

I'm less agnostic on users ever needing or wanting to be aware of
specific cpu features in order to get a satisfactory workload
placement. I want to be able to request high performance without
knowing the required underlying features. Flavors + traits (which I
don't have to understand) gets us that, so ... cool.
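
As a rough illustration of what "flavors + traits" means here: the
admin puts trait:<NAME>=required (or =forbidden) extra specs on a
flavor and the user just asks for the flavor. The sketch below shows
one way those extra specs could be turned into the value of
placement's "required" query parameter. The function names and the
flavor contents are hypothetical, and nova's real translation code
handles more cases than this.

```python
# Sketch only: function names and the flavor are hypothetical, and
# nova's actual handling of extra specs covers more than this.

def traits_from_extra_specs(extra_specs):
    """Split 'trait:<NAME>=required|forbidden' extra specs into two sets."""
    required, forbidden = set(), set()
    for key, value in extra_specs.items():
        if not key.startswith("trait:"):
            continue  # not a trait extra spec, e.g. hw:cpu_policy
        name = key[len("trait:"):]
        if value == "required":
            required.add(name)
        elif value == "forbidden":
            forbidden.add(name)
    return required, forbidden


def placement_required_param(required, forbidden):
    """Build a 'required' value; forbidden traits get a '!' prefix."""
    return ",".join(sorted(required) + ["!" + t for t in sorted(forbidden)])


# A flavor the admin might define; the user never sees the trait names.
high_perf_flavor = {
    "hw:cpu_policy": "dedicated",
    "trait:HW_CPU_X86_AVX2": "required",
    "trait:HW_CPU_X86_AVX512F": "required",
}

req, forb = traits_from_extra_specs(high_perf_flavor)
query_value = placement_required_param(req, forb)
```

The point is that the knowledge of which CPU features add up to "high
performance" lives with whoever defines the flavor, not with the
person booting the server.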
...
If we want to make scheduling decisions based on vulnerabilities, it
needs to be under the exclusive control of the admin.
Others have said this (at least Dan): This seems like something
where something other than nova ought to handle it. A host which
shouldn't be scheduled to should be disabled (as a service).

-=-=-

This thread and several other conversations about traits and resource
classes have made it pretty clear that the knowledge and experience
required to make good decisions about what names should be in os-traits
and os-resource-classes (and the form the names should take) is not
exactly overlapping with what's required to be a core on the
placement service.

How do people feel about the idea of forming a core group for those
two repos that includes placement cores but has additions from nova
(Dan, Kashyap and Sean would make good candidates) and other projects
that consume them?

Having that group wouldn't remove the need for these extended
conversations but would help make sure the right people were aware
of changes and participating.

-- 
Chris Dent                       ٩◔̯◔۶           https://anticdent.org/
freenode: cdent