[openstack-dev] [ironic] [nova] traits discussion call

Dmitry Tantsur dtantsur at redhat.com
Tue Oct 24 16:07:56 UTC 2017


Sigh, sorry. I forgot that we're moving back to winter time this weekend. I 
*think* the time is 3pm UTC then. It seems to be 11am eastern US: 
https://www.timeanddate.com/worldclock/converter.html?iso=20171030T150000&p1=37&p2=tz_et.

On 10/24/2017 06:00 PM, Dmitry Tantsur wrote:
> And the winner is Mon, 30 Oct, 2pm UTC!
> 
> The bluejeans ID is https://bluejeans.com/757528759
> (works without plugins in recent FF and Chrome; if it asks to install an app, 
> ignore it and look for a link saying "join with browser")
> 
> On 10/23/2017 05:02 PM, Dmitry Tantsur wrote:
>> Hi all!
>>
>> I'd like to invite you to the discussion of the way to implement traits in
>> ironic and the ironic virt driver. Please vote for the time at
>> https://doodle.com/poll/ts43k98kkvniv8uz. Please vote by EOD tomorrow.
>>
>> Note that it's going to be a technical discussion - please make sure you
>> understand what traits are and why ironic cares about them. See below for more
>> context.
>>
>> We'll probably use my bluejeans account, as it works without plugins in modern
>> browsers. I'll post a meeting ID when we pick the date.
>>
>>
>> On 10/23/2017 04:09 PM, Eric Fried wrote:
>>> We discussed this a little bit further in IRC [1].  We're all in
>>> agreement, but it's worth being precise on a couple of points:
>>>
>>> * We're distinguishing between a "feature" and the "trait" that
>>> represents it in placement.  For the sake of this discussion, a
>>> "feature" can (maybe) be switched on or off, but a "trait" can either be
>>> present or absent on a RP.
>>> * It matters *who* can turn a feature on/off.
>>>     * If it can be done by virt at spawn time, then it makes sense to have
>>> the trait on the RP, and you can switch the feature on/off via a
>>> separate extra_spec.
>>>     * But if it's e.g. an admin action, and spawn has no control, then the
>>> trait needs to be *added* whenever the feature is *on*, and *removed*
>>> whenever the feature is *off*.
>>>
>>> [1]
>>> http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2017-10-23.log.html#t2017-10-23T13:12:13 
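A minimal Python sketch of the rule above, assuming a hypothetical `Feature` record with `supported`, `enabled` and `controlled_by` fields. None of these names come from nova, ironic or placement, and the trait names are made up. A feature that virt can toggle at spawn time keeps its trait whenever the hardware supports it, while an admin-controlled feature's trait follows its current on/off state.

```python
from dataclasses import dataclass
from enum import Enum


class Controller(Enum):
    """Who can switch a feature on or off (hypothetical model)."""
    SPAWN = "spawn"  # the virt driver can toggle it while spawning an instance
    ADMIN = "admin"  # only an out-of-band admin action can toggle it


@dataclass
class Feature:
    trait: str                 # trait that represents the feature in placement
    supported: bool            # the hardware has the feature at all
    enabled: bool              # the feature is currently switched on
    controlled_by: Controller


def traits_for_provider(features: list[Feature]) -> set[str]:
    """Return the traits a resource provider (RP) should currently expose."""
    traits: set[str] = set()
    for f in features:
        if not f.supported:
            continue  # no capability at all, so no trait
        if f.controlled_by is Controller.SPAWN:
            # Spawn can switch it on when asked (e.g. via a separate
            # extra_spec), so the trait stays on the RP in either state.
            traits.add(f.trait)
        elif f.enabled:
            # Spawn has no control: expose the trait only while the feature
            # is on; when it is switched off, the trait is left out (removed).
            traits.add(f.trait)
    return traits


# Hypothetical trait names, for illustration only.
features = [
    Feature("CUSTOM_SECURE_BOOT", supported=True, enabled=False,
            controlled_by=Controller.SPAWN),
    Feature("CUSTOM_HYPERTHREADING", supported=True, enabled=False,
            controlled_by=Controller.ADMIN),
]
print(sorted(traits_for_provider(features)))  # ['CUSTOM_SECURE_BOOT']
```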
>>>
>>>
>>> On 10/23/2017 08:15 AM, Sylvain Bauza wrote:
>>>>
>>>>
>>>> On Mon, Oct 23, 2017 at 2:54 PM, Eric Fried <openstack at fried.cc> wrote:
>>>>
>>>>       I agree with Sean.  In general terms:
>>>>
>>>>       * A resource provider should be marked with a trait if that feature
>>>>         * Can be turned on or off (whether it's currently on or not); or
>>>>         * Is always on and can't ever be turned off.
>>>>
>>>>
>>>> No, traits are not boolean. If a resource provider stops providing a
>>>> capability, then the existing related trait should just be removed,
>>>> that's it.
>>>> If you see a trait, that just means that the related capability for
>>>> the Resource Provider is supported, that's it too.
>>>>
>>>> MHO.
>>>>
>>>> -Sylvain
>>>>
>>>>
>>>>
>>>>       * A consumer wanting that feature present (doesn't matter whether it's
>>>>       on or off) should specify it as a required *trait*.
>>>>       * A consumer wanting that feature present and turned on should
>>>>         * Specify it as a required trait; AND
>>>>         * Indicate that it be turned on via some other mechanism (e.g. a
>>>>       separate extra_spec).
>>>>
>>>>       I believe this satisfies Dmitry's (Ironic's) needs, but also Jay's drive
>>>>       for placement purity.
>>>>
>>>>       Please invite me to the hangout or whatever.
>>>>
>>>>       Thanks,
>>>>       Eric
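The consumer-side rules above, as a minimal sketch assuming a flavor's extra_specs is a plain `dict` of strings. Required traits are written in the `trait:<NAME>=required` form. The `hypothetical:enable_secure_boot` key is invented here to stand in for "some other mechanism" that asks for the feature to be turned on, and the node and trait names are illustrative.

```python
def required_traits(extra_specs: dict[str, str]) -> set[str]:
    """Collect trait names marked as required in a flavor's extra_specs."""
    prefix = "trait:"
    return {
        key[len(prefix):]
        for key, value in extra_specs.items()
        if key.startswith(prefix) and value == "required"
    }


def candidate_providers(providers: dict[str, set[str]],
                        extra_specs: dict[str, str]) -> list[str]:
    """Return providers whose traits include every required trait.

    This only checks that the feature is present; whether it should be
    switched on is a separate request, handled below.
    """
    wanted = required_traits(extra_specs)
    return sorted(name for name, traits in providers.items() if wanted <= traits)


def wants_feature_on(extra_specs: dict[str, str], key: str) -> bool:
    """Check the separate, non-trait flag asking for a feature to be turned on."""
    return extra_specs.get(key, "false").lower() == "true"


providers = {
    "node-1": {"CUSTOM_SECURE_BOOT"},
    "node-2": set(),
}
flavor = {
    "trait:CUSTOM_SECURE_BOOT": "required",
    "hypothetical:enable_secure_boot": "true",  # made-up key for illustration
}
print(candidate_providers(providers, flavor))                       # ['node-1']
print(wants_feature_on(flavor, "hypothetical:enable_secure_boot"))  # True
```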
>>>>
>>>>       On 10/23/2017 07:22 AM, Mooney, Sean K wrote:
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       > *From:* Jay Pipes [mailto:jaypipes at gmail.com]
>>>>       > *Sent:* Monday, October 23, 2017 12:20 PM
>>>>       > *To:* OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
>>>>       > *Subject:* Re: [openstack-dev] [ironic] ironic and traits
>>>>       >
>>>>       >
>>>>       >
>>>>       > Writing from my phone... May I ask that, before you proceed with any plan
>>>>       > that uses traits for state information, we have a hangout or
>>>>       > videoconference to discuss this? Unfortunately today and tomorrow I'm
>>>>       > not able to do a hangout, but I can do one on Wednesday at any time of the day.
>>>>       >
>>>>       >
>>>>       >
>>>>       > [Mooney, Sean K] On the UEFI boot topic, I did bring up at the PTG that
>>>>       > we wanted to standardize traits for “verified boot”.
>>>>       >
>>>>       > That included a trait for UEFI secure boot enabled and one to indicate a
>>>>       > hardware root of trust, e.g. Intel Boot Guard or similar.
>>>>       >
>>>>       > We distinctly wanted to be able to tag nova compute hosts with those
>>>>       > new traits so we could require that VMs that request a host with UEFI
>>>>       > secure boot enabled and a hardware root of trust are scheduled only to
>>>>       > those nodes.
>>>>       >
>>>>       > There are many other examples that affect both VMs and bare metal, such
>>>>       > as ECC/interleaved memory, cluster on die, L3 cache code and data
>>>>       > prioritization, VT-d/VT-c, HPET, hyper-threading, power states… All of
>>>>       > these features may be present on the platform, but I also need to know
>>>>       > if they are turned on. Ruling out state in traits means all of this logic
>>>>       > will eventually get pushed to scheduler filters, which will be suboptimal
>>>>       > long term as more state is tracked. Software-defined infrastructure may
>>>>       > be the future, but hardware-defined software is sadly the present…
>>>>       >
>>>>       > I do, however, think there should be a separation between asking for a
>>>>       > host that provides X with a trait and asking for X to be configured via
>>>>       > a trait. The trait secure_boot_enabled should never result in the
>>>>       > feature being enabled; it should just find a host with it on. If you
>>>>       > want to request it to be turned on, you would request a host with
>>>>       > secure_boot_capable as a trait and have a flavor extra spec or image
>>>>       > property to request that ironic enable it. These are two very different
>>>>       > requests and should not be treated the same.
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       > Lemme know!
>>>>       >
>>>>       > -jay
>>>>       >
>>>>       >
>>>>       >
>>>>       > On Oct 23, 2017 5:01 AM, "Dmitry Tantsur" <dtantsur at redhat.com> wrote:
>>>>       >
>>>>       >     Hi Jay!
>>>>       >
>>>>       >     I appreciate your comments, but I think you're approaching the
>>>>       >     problem from a purely VM point of view. Things simply don't work the
>>>>       >     same way in bare metal, at least not if we want to provide the same
>>>>       >     user experience.
>>>>       >
>>>>       >
>>>>       >
>>>>       >     On Sun, Oct 22, 2017 at 2:25 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>>>>       >
>>>>       >         Sorry for the delay, I took a week off before starting a new job.
>>>>       >         Comments inline.
>>>>       >
>>>>       >         On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
>>>>       >
>>>>       >             Hi all,
>>>>       >
>>>>       >             I promised John to dump my thoughts on traits to the ML, so
>>>>       >             here we go :)
>>>>       >
>>>>       >             I see two roles of traits (or kinds of traits) for bare metal:
>>>>       >             1. traits that say what the node can do already (e.g. "the
>>>>       >                node is doing UEFI boot")
>>>>       >             2. traits that say what the node can be *configured* to do
>>>>       >                (e.g. "the node can boot in UEFI mode")
>>>>       >
>>>>       >
>>>>       >         There's only one role for traits. #2 above. #1 is state
>>>>       >         information. Traits are not for state information. Traits are
>>>>       >         only for communicating capabilities of a resource provider
>>>>       >         (baremetal node).
>>>>       >
>>>>       >
>>>>       >
>>>>       >     These are not different; that's what I'm talking about here. No
>>>>       >     users care about the difference between "this node was put in UEFI
>>>>       >     mode by an operator in advance", "this node was put in UEFI mode by
>>>>       >     an ironic driver on demand" and "this node is always in UEFI mode,
>>>>       >     because it's AARCH64 and it does not have BIOS". These situations
>>>>       >     produce the same result (the node is booted in UEFI mode), and thus
>>>>       >     it's up to ironic to hide this difference.
>>>>       >
>>>>       >
>>>>       >
>>>>       >     My suggestion with traits is one way to do it; I'm not sure what you
>>>>       >     suggest, though.
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >         For example, let's say we add the following to the os-traits
>>>>       >         library [1]
>>>>       >
>>>>       >         * STORAGE_RAID_0
>>>>       >         * STORAGE_RAID_1
>>>>       >         * STORAGE_RAID_5
>>>>       >         * STORAGE_RAID_6
>>>>       >         * STORAGE_RAID_10
>>>>       >
>>>>       >         The Ironic administrator would add all RAID-related traits to
>>>>       >         the baremetal nodes that had the *capability* of supporting that
>>>>       >         particular RAID setup [2].
>>>>       >
>>>>       >         When provisioned, the baremetal node would either have RAID
>>>>       >         configured in a certain level or not configured at all.
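A minimal sketch of how such capability-based RAID traits might be derived, along the lines of footnote [2]. It assumes the conventional minimum member-disk counts per RAID level and a hypothetical `supported_levels` input describing what the node's RAID controller or software stack can build. The `STORAGE_RAID_*` names are the ones proposed in the list above.

```python
# Conventional minimum member-disk counts per RAID level (an assumption;
# specific controllers may impose different limits).
MIN_DISKS = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def raid_traits(disk_count: int, supported_levels: set[int]) -> set[str]:
    """Return STORAGE_RAID_<level> traits for levels this node *could* use.

    `supported_levels` (hypothetical input) is whatever the node's RAID
    controller or software RAID tooling can build. The result describes
    capability only, not how the node is currently configured.
    """
    return {
        f"STORAGE_RAID_{level}"
        for level, min_disks in MIN_DISKS.items()
        if level in supported_levels and disk_count >= min_disks
    }


# Three disks: RAID 10 is supported by the controller but needs four disks.
print(sorted(raid_traits(3, {0, 1, 5, 10})))
# ['STORAGE_RAID_0', 'STORAGE_RAID_1', 'STORAGE_RAID_5']
```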
>>>>       >
>>>>       >
>>>>       >         A very important note: the Placement API and Nova scheduler (or
>>>>       >         future Ironic scheduler) don't care about this. At all. I know
>>>>       >         it sounds like I'm being callous, but I'm not. Placement and
>>>>       >         scheduling don't care about the state of things. They only care
>>>>       >         about the capabilities of target destinations. That's it.
>>>>       >
>>>>       >
>>>>       >
>>>>       >     Yes, because VMs always start with a clean state, and the hypervisor is
>>>>       >     there to ensure that. We don't have this luxury in ironic :) E.g.
>>>>       >     our SNMP driver is not even aware of boot modes (or RAID, or BIOS
>>>>       >     configuration), which does not mean that a node using it cannot be
>>>>       >     in UEFI mode (have a RAID or BIOS pre-configured, etc, etc).
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >             This seems confusing, but it's actually very useful. Say, I
>>>>       >             have a flavor that requests UEFI boot via a trait. It will
>>>>       >             match both the nodes that are already in UEFI mode, as well
>>>>       >             as nodes that can be put in UEFI mode.
>>>>       >
>>>>       >
>>>>       >         No :) It will only match nodes that have the UEFI capability.
>>>>       >         The set of providers that have the ability to be booted via UEFI
>>>>       >         is *always* a superset of the set of providers that *have been
>>>>       >         booted via UEFI*. Placement and scheduling decisions only care
>>>>       >         about that superset -- the providers with a particular capability.
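The superset argument, as a minimal sketch assuming provider traits are plain Python sets; `CUSTOM_BOOT_UEFI` and the node names are made up. The match looks at traits only, never at current state, and the `assert` holds only as long as every node already booted via UEFI also carries the capability trait.

```python
def matching_providers(provider_traits: dict[str, set[str]], required: str) -> set[str]:
    """Placement-style match: consult traits only, never current state."""
    return {rp for rp, traits in provider_traits.items() if required in traits}


# Illustrative data; trait and node names are hypothetical.
provider_traits = {
    "node-1": {"CUSTOM_BOOT_UEFI"},
    "node-2": {"CUSTOM_BOOT_UEFI"},
    "node-3": set(),
}
booted_uefi = {"node-2"}  # current state, invisible to the match above

capable = matching_providers(provider_traits, "CUSTOM_BOOT_UEFI")
assert booted_uefi <= capable  # capable set is a superset of "booted via UEFI"
print(sorted(capable))  # ['node-1', 'node-2']
```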
>>>>       >
>>>>       >
>>>>       >
>>>>       >     Well, no, it will. Again, you're basing this purely on the VM idea,
>>>>       >     where a VM is always *put* in UEFI mode, no matter what the hypervisor
>>>>       >     looks like. It is simply not the case for us. You have to care what
>>>>       >     state the node is in, because many drivers cannot change this state.
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >             This idea goes further with deploy templates (a new concept
>>>>       >             we've been thinking about). A flavor can request something
>>>>       >             like CUSTOM_RAID_5, and it will match the nodes that already
>>>>       >             have RAID 5, or, more interestingly, the nodes on which we
>>>>       >             can build RAID 5 before deployment. The UEFI example above
>>>>       >             can be treated in a similar way.
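A minimal sketch of the deploy-template matching described above, assuming each node knows its current RAID level (if any) and the levels its RAID interface could build before deployment. The `Node` fields, the `needs_build` helper and the `CUSTOM_RAID_<n>` naming scheme are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    current_raid_level: Optional[int] = None  # already configured, if known
    buildable_raid_levels: set[int] = field(default_factory=set)  # driver can build


def node_traits(node: Node) -> set[str]:
    """Advertise CUSTOM_RAID_<n> if the node already has it or could build it."""
    levels = set(node.buildable_raid_levels)
    if node.current_raid_level is not None:
        levels.add(node.current_raid_level)
    return {f"CUSTOM_RAID_{level}" for level in levels}


def needs_build(node: Node, level: int) -> bool:
    """True when a deploy step would have to (re)build the requested RAID level."""
    return node.current_raid_level != level


nodes = [
    Node("already-raid5", current_raid_level=5),
    Node("can-build", buildable_raid_levels={1, 5}),
    Node("no-raid"),
]
matches = [n.name for n in nodes if "CUSTOM_RAID_5" in node_traits(n)]
print(matches)  # ['already-raid5', 'can-build']
```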
>>>>       >
>>>>       >             This ends up with two sources of knowledge about traits in
>>>>       >             ironic:
>>>>       >             1. Operators setting something they know about hardware
>>>>       >                ("this node is in UEFI mode"),
>>>>       >             2. Ironic drivers reporting something they
>>>>       >                2.1. know about hardware ("this node is in UEFI mode" -
>>>>       >                     again)
>>>>       >                2.2. can do about hardware ("I can put this node in UEFI
>>>>       >                     mode")
>>>>       >
>>>>       >
>>>>       >         You're correct that both pieces of information are important.
>>>>       >         However, only the "can do about hardware" part is relevant to
>>>>       >         Placement and Nova.
>>>>       >
>>>>       >             For case #1 we are planning on a new CRUD API to set/unset
>>>>       >             traits for a node.
>>>>       >
>>>>       >
>>>>       >         I would *strongly* advise against this. Traits are not for state
>>>>       >         information.
>>>>       >
>>>>       >         Instead, consider having a DB (or JSON) schema that lists state
>>>>       >         information in fields that are explicitly for that state
>>>>       >         information.
>>>>       >
>>>>       >         For example, a schema that looks like this:
>>>>       >
>>>>       >         {
>>>>       >           "boot": {
>>>>       >             "mode": <one of 'bios' or 'uefi'>,
>>>>       >             "params": <dict>
>>>>       >           },
>>>>       >           "disk": {
>>>>       >             "raid": {
>>>>       >               "level": <int>,
>>>>       >               "controller": <one of 'sw' or 'hw'>,
>>>>       >               "driver": <string>,
>>>>       >               "params": <dict>
>>>>       >             },  ...
>>>>       >           },
>>>>       >           "network": {
>>>>       >             ...
>>>>       >           }
>>>>       >         }
>>>>       >
>>>>       >         etc, etc.
>>>>       >
>>>>       >         Don't use trait strings to represent state information.
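One way to make the proposed state schema concrete, as a minimal sketch: `TypedDict`s mirroring the fields above plus a small validator. The `...` parts of the original (the rest of `disk`, and `network`) are left out rather than guessed, the `"example"` driver value is a placeholder, and the checks only restate the placeholder constraints ('bios'/'uefi', 'sw'/'hw', integer level).

```python
from typing import Any, Literal, TypedDict


class BootState(TypedDict):
    mode: Literal["bios", "uefi"]
    params: dict[str, Any]


class RaidState(TypedDict):
    level: int
    controller: Literal["sw", "hw"]
    driver: str
    params: dict[str, Any]


class DiskState(TypedDict):
    raid: RaidState


class NodeState(TypedDict):
    boot: BootState
    disk: DiskState


def validate(state: dict) -> list[str]:
    """Return a list of problems; an empty list means the state looks valid."""
    errors = []
    boot = state.get("boot", {})
    if boot.get("mode") not in ("bios", "uefi"):
        errors.append("boot.mode must be 'bios' or 'uefi'")
    raid = state.get("disk", {}).get("raid")
    if raid is not None:
        if not isinstance(raid.get("level"), int):
            errors.append("disk.raid.level must be an int")
        if raid.get("controller") not in ("sw", "hw"):
            errors.append("disk.raid.controller must be 'sw' or 'hw'")
    return errors


example: NodeState = {
    "boot": {"mode": "uefi", "params": {}},
    "disk": {"raid": {"level": 5, "controller": "hw",
                      "driver": "example", "params": {}}},  # placeholder driver
}
print(validate(example))  # []
```

State kept in typed fields like these stays separate from the trait strings used for scheduling, which is the separation argued for here.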
>>>>       >
>>>>       >
>>>>       >
>>>>       >     I don't see an alternative proposal that will satisfy what we have
>>>>       >     to solve.
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >         Best,
>>>>       >         -jay
>>>>       >
>>>>       >             Case #2 is more interesting. We have two options, I think:
>>>>       >
>>>>       >             a) Operators still set traits on nodes, drivers are simply
>>>>       >             validating them. E.g. an operator sets CUSTOM_RAID_5, and
>>>>       >             the node's RAID interface checks if it is possible to do.
>>>>       >             The downside is obvious - with a lot of deploy templates
>>>>       >             available it can be a lot of manual work.
>>>>       >
>>>>       >             b) Drivers report the traits, and they somehow get added to
>>>>       >             the traits provided by an operator. Technically, there are
>>>>       >             sub-cases again:
>>>>       >                b.1) The new traits API returns a union of
>>>>       >                     operator-provided and driver-provided traits
>>>>       >                b.2) The new traits API returns only operator-provided
>>>>       >                     traits; driver-provided traits are returned e.g. via
>>>>       >                     a new field (node.driver_traits). Then nova will
>>>>       >                     have to merge the lists itself.
>>>>       >
>>>>       >             My personal favorite is the last option: I'd like a clear
>>>>       >             distinction between different "sources" of traits, but I'd
>>>>       >             also like to reduce manual work for operators.
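Options b.1 and b.2 side by side, as a minimal sketch: a hypothetical node record with operator-provided `traits` and driver-reported `driver_traits`. Only the `node.driver_traits` field name comes from the proposal above; none of these functions is an existing ironic or nova API, and the trait names are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    traits: set[str] = field(default_factory=set)         # set by the operator
    driver_traits: set[str] = field(default_factory=set)  # reported by the driver


def api_traits_b1(node: Node) -> set[str]:
    """Option b.1: the traits API itself returns the union."""
    return node.traits | node.driver_traits


def api_traits_b2(node: Node) -> dict[str, set[str]]:
    """Option b.2: the API keeps the two sources separate..."""
    return {"traits": set(node.traits), "driver_traits": set(node.driver_traits)}


def nova_merge(api_response: dict[str, set[str]]) -> set[str]:
    """...and the consumer (nova) merges them before reporting to placement."""
    return api_response["traits"] | api_response["driver_traits"]


node = Node(traits={"CUSTOM_GOLD"}, driver_traits={"CUSTOM_RAID_5", "CUSTOM_RAID_1"})
assert api_traits_b1(node) == nova_merge(api_traits_b2(node))
print(sorted(api_traits_b1(node)))  # ['CUSTOM_GOLD', 'CUSTOM_RAID_1', 'CUSTOM_RAID_5']
```

Both options hand placement the same final set; b.2 additionally keeps the source of each trait visible to the consumer, at the cost of nova doing the merge.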
>>>>       >
>>>>       >             A valid counter-argument is: what if an operator wants to
>>>>       >             override a driver-provided trait? E.g. a node can do RAID 5,
>>>>       >             but I don't want this particular node to do it for any
>>>>       >             reason. I'm not sure if it's a valid case, and what to do
>>>>       >             about it.
>>>>       >
>>>>       >             Let me know what you think.
>>>>       >
>>>>       >             Dmitry
>>>>       >
>>>>       >
>>>>       >         [1] http://git.openstack.org/cgit/openstack/os-traits/tree/
>>>>       >         [2] Based on how many attached disks the node had, the presence
>>>>       >             and abilities of a hardware RAID controller, etc.
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >         __________________________________________________________________________
>>>>       >         OpenStack Development Mailing List (not for usage questions)
>>>>       >         Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>>>       >         http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>       >
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
> 



