[openstack-dev] [ironic] ironic and traits
jaypipes at gmail.com
Sun Oct 22 12:25:16 UTC 2017
Sorry for the delay, I took a week off before starting a new job. Comments inline.
On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
> Hi all,
> I promised John to dump my thoughts on traits to the ML, so here we go :)
> I see two roles of traits (or kinds of traits) for bare metal:
> 1. traits that say what the node can do already (e.g. "the node is
> doing UEFI boot")
> 2. traits that say what the node can be *configured* to do (e.g. "the node can
> boot in UEFI mode")
There's only one role for traits. #2 above. #1 is state information.
Traits are not for state information. Traits are only for communicating
capabilities of a resource provider (baremetal node).
For example, let's say we add a set of RAID-related traits to the
os-traits library.
The Ironic administrator would add all RAID-related traits to the
baremetal nodes that had the *capability* of supporting that particular
RAID setup.
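As a rough illustration of how an administrator's tooling might decide
which RAID traits a node is capable of, here is a minimal Python sketch.
It is not Ironic or os-traits code: the raid_traits helper, its
parameters, and the CUSTOM_RAID_<level> naming are assumptions made for
the example. Only the minimum disk counts are the standard ones for each
RAID level.

```python
# Minimum number of disks each standard RAID level needs.
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def raid_traits(disk_count, controller_levels):
    """Return hypothetical RAID capability traits for one node.

    disk_count: number of disks attached to the node.
    controller_levels: RAID levels the node's controller (hardware or
        software) is able to build.
    """
    return {
        f"CUSTOM_RAID_{level}"
        for level, needed in MIN_DISKS.items()
        if level in controller_levels and disk_count >= needed
    }


# Three disks and a controller that supports levels 0, 1, 5 and 6:
# RAID 6 is left out because it needs at least four disks.
print(sorted(raid_traits(3, {0, 1, 5, 6})))
```

Nothing here looks at how the node is currently configured. The result
depends only on what the hardware could be set up to do.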
When provisioned, the baremetal node would either have RAID configured
in a certain level or not configured at all.
A very important note: the Placement API and Nova scheduler (or future
Ironic scheduler) don't care about this. At all. I know it sounds like
I'm being callous, but I'm not. Placement and scheduling don't care
about the state of things. They only care about the capabilities of
target destinations. That's it.
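To make the capability/state split concrete, here is a minimal Python
sketch. It is not Placement or Nova code: the Node class, the matches
function, and the node names are hypothetical, and the trait strings are
only illustrative custom traits.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a baremetal resource provider."""
    name: str
    # Capabilities: what the node *can* be configured to do.
    traits: frozenset = frozenset()
    # State: how the node happens to be configured right now.
    state: dict = field(default_factory=dict)


def matches(node, required_traits):
    """Return True if the node has every required capability.

    Only node.traits is consulted; node.state is deliberately ignored.
    """
    return set(required_traits) <= node.traits


nodes = [
    Node("node-a", frozenset({"CUSTOM_RAID_1", "CUSTOM_RAID_5"}), {"raid_level": None}),
    Node("node-b", frozenset({"CUSTOM_RAID_1", "CUSTOM_RAID_5"}), {"raid_level": 5}),
    Node("node-c", frozenset({"CUSTOM_RAID_1"}), {"raid_level": 1}),
]

# node-a has no RAID configured yet, but it is still a valid candidate.
candidates = [n.name for n in nodes if matches(n, {"CUSTOM_RAID_5"})]
print(candidates)
```

In this sketch node-a and node-b are both selected even though only
node-b currently has RAID 5 built, because the filter never reads the
state field.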
> This seems confusing, but it's actually very useful. Say, I have a flavor that
> requests UEFI boot via a trait. It will match both the nodes that are already in
> UEFI mode, as well as nodes that can be put in UEFI mode.
No :) It will only match nodes that have the UEFI capability. The set of
providers that have the ability to be booted via UEFI is *always* a
superset of the set of providers that *have been booted via UEFI*.
Placement and scheduling decisions only care about that superset -- the
providers with a particular capability.
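The superset relationship can be checked with plain Python sets. The
inventory below is made up for the example; it is only a sketch of the
reasoning, not how Placement stores anything.

```python
# Hypothetical inventory: node name -> (can boot via UEFI?, current boot mode)
inventory = {
    "node-a": (True, "uefi"),
    "node-b": (True, "bios"),
    "node-c": (False, "bios"),
}

uefi_capable = {name for name, (capable, _) in inventory.items() if capable}
uefi_booted = {name for name, (_, mode) in inventory.items() if mode == "uefi"}

# A node cannot be in UEFI mode without being able to boot via UEFI,
# so the booted set is always contained in the capable set.
assert uefi_booted <= uefi_capable

# A flavor requesting UEFI is matched against the capable set only.
print(sorted(uefi_capable))
```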
> This idea goes further with deploy templates (new concept we've been thinking
> about). A flavor can request something like CUSTOM_RAID_5, and it will match the
> nodes that already have RAID 5, or, more interestingly, the nodes on which we
> can build RAID 5 before deployment. The UEFI example above can be treated in a
> similar way.
> This ends up with two sources of knowledge about traits in ironic:
> 1. Operators setting something they know about hardware ("this node is in UEFI
> mode")
> 2. Ironic drivers reporting something they
> 2.1. know about hardware ("this node is in UEFI mode" - again)
> 2.2. can do about hardware ("I can put this node in UEFI mode")
You're correct that both pieces of information are important. However,
only the "can do about hardware" part is relevant to Placement and Nova.
> For case #1 we are planning on a new CRUD API to set/unset traits for a node.
I would *strongly* advise against this. Traits are not for state
information.
Instead, consider having a DB (or JSON) schema that lists state
information in fields that are explicitly for that state information.
For example, a schema that looks like this:
"mode": <one of 'bios' or 'uefi'>,
"controller": <one of 'sw' or 'hw'>,
Don't use trait strings to represent state information.
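One way to picture explicit state fields is the Python sketch below. The
"mode" and "controller" fields and their allowed values come from the
schema above; the class names, the grouping into boot and RAID sections,
and the "level" field are assumptions added for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Literal, Optional


@dataclass
class BootState:
    mode: Literal["bios", "uefi"]


@dataclass
class RaidState:
    controller: Literal["sw", "hw"]
    level: Optional[int] = None  # hypothetical field, not in the schema above


@dataclass
class NodeState:
    boot: BootState
    raid: Optional[RaidState] = None  # None means RAID is not configured


# State lives in typed fields, separate from any list of trait strings.
state = NodeState(
    boot=BootState(mode="uefi"),
    raid=RaidState(controller="hw", level=5),
)
print(asdict(state))
```

Keeping state in its own structure means a node's traits can stay a
stable list of capabilities while these fields change with every deploy.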
> Case #2 is more interesting. We have two options, I think:
> a) Operators still set traits on nodes, drivers are simply validating them. E.g.
> an operator sets CUSTOM_RAID_5, and the node's RAID interface checks if it is
> possible to do. The downside is obvious - with a lot of deploy templates
> available it can be a lot of manual work.
> b) Drivers report the traits, and they get somehow added to the traits provided
> by an operator. Technically, there are sub-cases again:
> b.1) The new traits API returns a union of operator-provided and
> driver-provided traits
> b.2) The new traits API returns only operator-provided traits; driver-provided
> traits are returned e.g. via a new field (node.driver_traits). Then nova will
> have to merge the lists itself.
> My personal favorite is the last option: I'd like a clear distinction between
> different "sources" of traits, but I'd also like to reduce manual work for
> operators.
> A valid counter-argument is: what if an operator wants to override a
> driver-provided trait? E.g. a node can do RAID 5, but I don't want this
> particular node to do it for any reason. I'm not sure if it's a valid case, and
> what to do about it.
> Let me know what you think.
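For reference, the difference between options b.1 and b.2 quoted above
can be sketched in a few lines of Python. This is not the Ironic API:
the function names and response shapes are hypothetical, with only the
driver_traits field name taken from the quoted proposal, and
CUSTOM_GOLD is a made-up operator trait.

```python
operator_traits = {"CUSTOM_GOLD"}   # set by hand by an operator
driver_traits = {"CUSTOM_RAID_5"}   # reported by the node's driver


def node_b1():
    """Option b.1: the API returns the union of both sources."""
    return {"traits": sorted(operator_traits | driver_traits)}


def node_b2():
    """Option b.2: the API keeps the two sources in separate fields."""
    return {
        "traits": sorted(operator_traits),
        "driver_traits": sorted(driver_traits),
    }


def merged_traits(node):
    """What the consumer would have to do to get one list under b.2."""
    return set(node["traits"]) | set(node.get("driver_traits", []))


print(node_b1())
print(node_b2())
```

Both options end up with the same merged set; they differ only in
whether the merge happens in Ironic or in the consumer.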
Note on the RAID-related traits: which ones a node gets would be based
on how many attached disks the node had, the presence and abilities of a
hardware RAID controller, etc.