[openstack-dev] [ironic] ironic and traits

Eric Fried openstack at fried.cc
Mon Oct 23 12:54:13 UTC 2017


I agree with Sean.  In general terms:

* A resource provider should be marked with a trait if that feature
  * Can be turned on or off (whether it's currently on or not); or
  * Is always on and can't ever be turned off.
* A consumer wanting that feature present (doesn't matter whether it's
on or off) should specify it as a required *trait*.
* A consumer wanting that feature present and turned on should
  * Specify it as a required trait; AND
  * Indicate that it be turned on via some other mechanism (e.g. a
separate extra_spec).
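
As a minimal sketch, assuming the trait-in-flavor syntax under
discussion (the trait name and the capabilities key here are
illustrative, not settled):

    # flavor extra_specs for "feature present AND turned on"
    extra_specs = {
        # placement: only match providers *capable* of UEFI boot
        "trait:CUSTOM_UEFI_CAPABLE": "required",
        # separate mechanism: ask the driver to actually enable it
        "capabilities:boot_mode": "uefi",
    }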

I believe this satisfies Dmitry's (Ironic's) needs as well as Jay's
drive for placement purity.

Please invite me to the hangout or whatever.

Thanks,
Eric

On 10/23/2017 07:22 AM, Mooney, Sean K wrote:
> 
> From: Jay Pipes [mailto:jaypipes at gmail.com]
> Sent: Monday, October 23, 2017 12:20 PM
> To: OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
> Subject: Re: [openstack-dev] [ironic] ironic and traits
> 
> Writing from my phone... May I ask that before you proceed with any plan
> that uses traits for state information that we have a hangout or
> videoconference to discuss this? Unfortunately today and tomorrow I'm
> not able to do a hangout but I can do one on Wednesday any time of the day.
> 
> [Mooney, Sean K] On the UEFI boot topic: I did bring up at the PTG that
> we wanted to standardize traits for "verified boot". That included a
> trait for UEFI secure boot being enabled, and one to indicate a
> hardware root of trust, e.g. Intel Boot Guard or similar. We distinctly
> wanted to be able to tag nova compute hosts with those new traits so we
> could require that VMs that request a host with UEFI secure boot
> enabled and a hardware root of trust are scheduled only to those nodes.
> 
> There are many other examples that affect both VMs and bare metal, such
> as ECC/interleaved memory, cluster-on-die, L3 cache code and data
> prioritization, VT-d/VT-c, HPET, hyper-threading, power states... All
> of these features may be present on the platform, but I also need to
> know if they are turned on. Ruling out state in traits means all of
> this logic will eventually get pushed to scheduler filters, which will
> be suboptimal long term as more state is tracked. Software-defined
> infrastructure may be the future, but hardware-defined software is
> sadly the present...
> 
> I do, however, think there should be a separation between asking for a
> host that provides X via a trait, and asking for X to be configured via
> a trait. The trait secure_boot_enabled should never result in the
> feature being enabled; it should just find a host that already has it
> on. If you want it turned on, you would request a host with
> secure_boot_capable as a trait and have a flavor extra spec or image
> property ask ironic to enable it. These are two very different requests
> and should not be treated the same.
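> 
> A minimal sketch of the two requests as flavor extra specs (trait and
> spec names here are illustrative, not agreed):
> 
>     # request 1: match only hosts where secure boot is ALREADY on;
>     # this never triggers any configuration
>     want_enabled = {"trait:CUSTOM_SECURE_BOOT_ENABLED": "required"}
> 
>     # request 2: match *capable* hosts, and separately ask ironic to
>     # turn the feature on (hypothetical enable knob)
>     want_configured = {
>         "trait:CUSTOM_SECURE_BOOT_CAPABLE": "required",
>         "capabilities:secure_boot": "true",
>     }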
> 
> Lemme know!
> 
> -jay
> 
> On Oct 23, 2017 5:01 AM, "Dmitry Tantsur" <dtantsur at redhat.com
> <mailto:dtantsur at redhat.com>> wrote:
> 
>     Hi Jay!
> 
>     I appreciate your comments, but I think you're approaching the
>     problem from a purely VM point of view. Things simply don't work
>     the same way in bare metal, at least not if we want to provide the
>     same user experience.
> 
>     On Sun, Oct 22, 2017 at 2:25 PM, Jay Pipes <jaypipes at gmail.com
>     <mailto:jaypipes at gmail.com>> wrote:
> 
>         Sorry for delay, took a week off before starting a new job.
>         Comments inline.
> 
>         On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
> 
>             Hi all,
> 
>             I promised John to dump my thoughts on traits to the ML, so
>             here we go :)
> 
>             I see two roles of traits (or kinds of traits) for bare
>             metal:
>             1. traits that say what the node can do already (e.g. "the
>             node is doing UEFI boot")
>             2. traits that say what the node can be *configured* to do
>             (e.g. "the node can boot in UEFI mode")
> 
> 
>         There's only one role for traits. #2 above. #1 is state
>         information. Traits are not for state information. Traits are
>         only for communicating capabilities of a resource provider
>         (baremetal node).
> 
>     These are not different; that's what I'm talking about here. No
>     user cares about the difference between "this node was put in UEFI
>     mode by an operator in advance", "this node was put in UEFI mode by
>     an ironic driver on demand" and "this node is always in UEFI mode,
>     because it's AARCH64 and it does not have BIOS". These situations
>     produce the same result (the node is booted in UEFI mode), and thus
>     it's up to ironic to hide this difference.
> 
>     My suggestion with traits is one way to do it, I'm not sure what you
>     suggest though.
> 
>         For example, let's say we add the following to the os-traits
>         library [1]
> 
>         * STORAGE_RAID_0
>         * STORAGE_RAID_1
>         * STORAGE_RAID_5
>         * STORAGE_RAID_6
>         * STORAGE_RAID_10
> 
>         The Ironic administrator would add all RAID-related traits to
>         the baremetal nodes that had the *capability* of supporting that
>         particular RAID setup [2]
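> 
>         For instance, those traits could be set on the node's resource
>         provider via the placement REST API (microversion 1.6 added the
>         traits endpoints; auth setup, UUIDs, and the provider
>         generation are elided or illustrative here):
> 
>             import requests
> 
>             requests.put(
>                 PLACEMENT_URL + "/resource_providers/" + rp_uuid + "/traits",
>                 headers={"X-Auth-Token": token,
>                          "OpenStack-API-Version": "placement 1.6"},
>                 json={"resource_provider_generation": generation,
>                       "traits": ["STORAGE_RAID_0", "STORAGE_RAID_1",
>                                  "STORAGE_RAID_5"]},
>             )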
> 
>         When provisioned, the baremetal node would either have RAID
>         configured in a certain level or not configured at all.
> 
> 
>         A very important note: the Placement API and Nova scheduler (or
>         a future Ironic scheduler) don't care about this. At all. I
>         know it sounds like I'm being callous, but I'm not. Placement
>         and scheduling don't care about the state of things. They only
>         care about the capabilities of target destinations. That's it.
> 
>     Yes, because VMs always start with a clean state, and the
>     hypervisor is there to ensure that. We don't have this luxury in
>     ironic :) E.g. our SNMP driver is not even aware of boot modes (or
>     RAID, or BIOS configuration), which does not mean that a node using
>     it cannot be in UEFI mode (or have RAID or BIOS pre-configured,
>     etc).
> 
>             This seems confusing, but it's actually very useful. Say I
>             have a flavor that requests UEFI boot via a trait. It will
>             match both the nodes that are already in UEFI mode and the
>             nodes that can be put in UEFI mode.
> 
> 
>         No :) It will only match nodes that have the UEFI capability.
>         The set of providers that have the ability to be booted via UEFI
>         is *always* a superset of the set of providers that *have been
>         booted via UEFI*. Placement and scheduling decisions only care
>         about that superset -- the providers with a particular capability.
> 
>     Well, no, it will. Again, you're basing this purely on the VM
>     idea, where a VM is always *put* in UEFI mode, no matter what the
>     hypervisor looks like. That is simply not the case for us. You
>     have to care what state the node is in, because many drivers
>     cannot change this state.
> 
>             This idea goes further with deploy templates (a new
>             concept we've been thinking about). A flavor can request
>             something like CUSTOM_RAID_5, and it will match the nodes
>             that already have RAID 5 or, more interestingly, the nodes
>             on which we can build RAID 5 before deployment. The UEFI
>             example above can be treated in a similar way.
> 
>             This ends up with two sources of knowledge about traits in
>             ironic:
>             1. Operators setting something they know about hardware
>             ("this node is in UEFI mode"),
>             2. Ironic drivers reporting something they
>                2.1. know about hardware ("this node is in UEFI mode" -
>                again)
>                2.2. can do about hardware ("I can put this node in
>                UEFI mode")
> 
> 
>         You're correct that both pieces of information are important.
>         However, only the "can do about hardware" part is relevant to
>         Placement and Nova.
> 
>             For case #1 we are planning on a new CRUD API to set/unset
>             traits for a node.
> 
> 
>         I would *strongly* advise against this. Traits are not for state
>         information.
> 
>         Instead, consider having a DB (or JSON) schema that lists state
>         information in fields that are explicitly for that state
>         information.
> 
>         For example, a schema that looks like this:
> 
>         {
>           "boot": {
>             "mode": <one of 'bios' or 'uefi'>,
>             "params": <dict>
>           },
>           "disk": {
>             "raid": {
>               "level": <int>,
>               "controller": <one of 'sw' or 'hw'>,
>               "driver": <string>,
>               "params": <dict>
>             },  ...
>           },
>           "network": {
>             ...
>           }
>         }
> 
>         etc, etc.
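> 
>         A concrete node record under such a schema might look like
>         this (all values purely illustrative):
> 
>             node_state = {
>                 "boot": {"mode": "uefi", "params": {}},
>                 "disk": {"raid": {"level": 5, "controller": "hw",
>                                   "driver": "megaraid", "params": {}}},
>             }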
> 
>         Don't use trait strings to represent state information.
> 
>     I don't see an alternative proposal that will satisfy what we have
>     to solve.
> 
>         Best,
>         -jay
> 
>             Case #2 is more interesting. We have two options, I think:
> 
>             a) Operators still set traits on nodes; drivers simply
>             validate them. E.g. an operator sets CUSTOM_RAID_5, and
>             the node's RAID interface checks whether it is possible
>             to do. The downside is obvious - with a lot of deploy
>             templates available, it can be a lot of manual work.
> 
>             b) Drivers report the traits, and they get somehow added
>             to the traits provided by an operator. Technically, there
>             are sub-cases again:
>                b.1) The new traits API returns a union of
>             operator-provided and driver-provided traits
>                b.2) The new traits API returns only operator-provided
>             traits; driver-provided traits are returned e.g. via a new
>             field (node.driver_traits). Then nova will have to merge
>             the lists itself.
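> 
>             As a minimal sketch of b.2 from nova's side (the
>             driver_traits field name is hypothetical, not an agreed
>             API):
> 
>                 # merge operator-provided and driver-provided traits
>                 # before passing them on to placement
>                 all_traits = set(node.traits) | set(node.driver_traits)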
> 
>             My personal favorite is the last option: I'd like a clear
>             distinction between different "sources" of traits, but I'd
>             also like to reduce manual work for operators.
> 
>             A valid counter-argument is: what if an operator wants to
>             override a driver-provided trait? E.g. a node can do RAID
>             5, but I don't want this particular node to do it, for
>             whatever reason. I'm not sure whether this is a valid
>             case, or what to do about it.
> 
>             Let me know what you think.
> 
>             Dmitry
> 
> 
>         [1] http://git.openstack.org/cgit/openstack/os-traits/tree/
>         [2] Based on how many attached disks the node had, the presence
>         and abilities of a hardware RAID controller, etc
> 