[openstack-dev] [ironic] [nova] traits discussion call

Dmitry Tantsur divius.inside at gmail.com
Mon Oct 30 14:11:41 UTC 2017


Aaaand sorry again, but due to sudden errands I won't be able to attend.
Please feel free to use my bluejeans room anyway. I think my position on
traits is more or less clear from previous discussions with John, Sam and
Eric.

2017-10-24 18:07 GMT+02:00 Dmitry Tantsur <dtantsur at redhat.com>:

> Sigh, sorry. I forgot that we're moving back to winter time this weekend.
> I *think* the time is 3pm UTC then. It seems to be 11am eastern US:
> https://www.timeanddate.com/worldclock/converter.html?iso=20171030T150000&p1=37&p2=tz_et
>
>
> On 10/24/2017 06:00 PM, Dmitry Tantsur wrote:
>
>> And the winner is Mon, 30 Oct, 2pm UTC!
>>
>> The bluejeans ID is https://bluejeans.com/757528759
>> (works without plugins in recent FF and Chrome; if it asks to install an
>> app, ignore it and look for a link saying "join with browser")
>>
>> On 10/23/2017 05:02 PM, Dmitry Tantsur wrote:
>>
>>> Hi all!
>>>
>>> I'd like to invite you to a discussion of how to implement traits in
>>> ironic and the ironic virt driver. Please vote for a time slot at
>>> https://doodle.com/poll/ts43k98kkvniv8uz by EOD tomorrow.
>>>
>>> Note that it's going to be a technical discussion - please make sure you
>>> understand what traits are and why ironic cares about them. See below for
>>> more context.
>>>
>>> We'll probably use my bluejeans account, as it works without plugins in
>>> modern browsers. I'll post a meeting ID when we pick the date.
>>>
>>>
>>> On 10/23/2017 04:09 PM, Eric Fried wrote:
>>>
>>>> We discussed this a little bit further in IRC [1].  We're all in
>>>> agreement, but it's worth being precise on a couple of points:
>>>>
>>>> * We're distinguishing between a "feature" and the "trait" that
>>>>   represents it in placement.  For the sake of this discussion, a
>>>>   "feature" can (maybe) be switched on or off, but a "trait" can either
>>>>   be present or absent on an RP.
>>>> * It matters *who* can turn a feature on/off.
>>>>     * If it can be done by virt at spawn time, then it makes sense to
>>>>       have the trait on the RP, and you can switch the feature on/off
>>>>       via a separate extra_spec (see the sketch below).
>>>>     * But if it's e.g. an admin action, and spawn has no control, then
>>>>       the trait needs to be *added* whenever the feature is *on*, and
>>>>       *removed* whenever the feature is *off*.
>>>>
>>>> [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2017-10-23.log.html#t2017-10-23T13:12:13
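>>>>
>>>> For concreteness, a minimal sketch of what that split could look like on
>>>> the flavor side. This assumes the trait:<NAME>=required extra spec syntax
>>>> being discussed for nova; the boot-mode toggle spec is only illustrative,
>>>> not an agreed interface:
>>>>
>>>>     # Illustrative flavor extra specs, shown as the dict nova stores.
>>>>     # The trait only constrains scheduling ("give me an RP that has the
>>>>     # UEFI capability"); it does not switch anything on by itself.
>>>>     extra_specs = {
>>>>         "trait:CUSTOM_BOOT_MODE_UEFI": "required",
>>>>         # Hypothetical separate knob asking virt to actually enable the
>>>>         # feature at spawn time.
>>>>         "capabilities:boot_mode": "uefi",
>>>>     }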
>>>>
>>>> On 10/23/2017 08:15 AM, Sylvain Bauza wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Oct 23, 2017 at 2:54 PM, Eric Fried <openstack at fried.cc> wrote:
>>>>>
>>>>>       I agree with Sean.  In general terms:
>>>>>
>>>>>       * A resource provider should be marked with a trait if that feature
>>>>>         * Can be turned on or off (whether it's currently on or not); or
>>>>>         * Is always on and can't ever be turned off.
>>>>>
>>>>>
>>>>> No, traits are not boolean. If a resource provider stops providing a
>>>>> capability, then the existing related trait should just be removed,
>>>>> that's it.
>>>>> If you see a trait, that just means that the related capability is
>>>>> supported by the Resource Provider, that's it too.
>>>>>
>>>>> MHO.
>>>>>
>>>>> -Sylvain
>>>>>
>>>>>
>>>>>
>>>>>       * A consumer wanting that feature present (doesn't matter whether
>>>>>         it's on or off) should specify it as a required *trait*.
>>>>>       * A consumer wanting that feature present and turned on should
>>>>>         * Specify it as a required trait; AND
>>>>>         * Indicate that it be turned on via some other mechanism (e.g. a
>>>>>           separate extra_spec).
>>>>>
>>>>>       I believe this satisfies Dmitry's (Ironic's) needs, but also Jay's
>>>>>       drive for placement purity.
>>>>>
>>>>>       Please invite me to the hangout or whatever.
>>>>>
>>>>>       Thanks,
>>>>>       Eric
>>>>>
>>>>>       On 10/23/2017 07:22 AM, Mooney, Sean K wrote:
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       > *From:* Jay Pipes [mailto:jaypipes at gmail.com]
>>>>>       > *Sent:* Monday, October 23, 2017 12:20 PM
>>>>>       > *To:* OpenStack Development Mailing List <openstack-dev at lists.openstack.org>
>>>>>       > *Subject:* Re: [openstack-dev] [ironic] ironic and traits
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       > Writing from my phone... May I ask that, before you proceed
>>>>>       > with any plan that uses traits for state information, we have a
>>>>>       > hangout or videoconference to discuss this? Unfortunately I'm
>>>>>       > not able to do a hangout today or tomorrow, but I can do one on
>>>>>       > Wednesday at any time of the day.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       > [Mooney, Sean K] On the UEFI boot topic, I did bring up at the
>>>>>       > PTG that we wanted to standardize traits for "verified boot".
>>>>>       > That included a trait for UEFI secure boot enabled and one to
>>>>>       > indicate a hardware root of trust, e.g. Intel Boot Guard or
>>>>>       > similar.
>>>>>       >
>>>>>       > We distinctly wanted to be able to tag nova compute hosts with
>>>>>       > those new traits so we could require that VMs that request a
>>>>>       > host with UEFI secure boot enabled and a hardware root of trust
>>>>>       > are scheduled only to those nodes.
>>>>>       >
>>>>>       > There are many other examples that affect both VMs and bare
>>>>>       > metal, such as ECC/interleaved memory, cluster-on-die, L3 cache
>>>>>       > code and data prioritization, VT-d/VT-c, HPET, hyper-threading,
>>>>>       > power states... All of these features may be present on the
>>>>>       > platform, but I also need to know if they are turned on. Ruling
>>>>>       > out state in traits means all of this logic will eventually get
>>>>>       > pushed to scheduler filters, which will be suboptimal long term
>>>>>       > as more state is tracked. Software-defined infrastructure may be
>>>>>       > the future, but hardware-defined software is sadly the present...
>>>>>       >
>>>>>       > I do however think there should be a separation between asking
>>>>>       > for a host that provides x via a trait, and asking for x to be
>>>>>       > configured via a trait. The trait secure_boot_enabled should
>>>>>       > never result in the feature being enabled; it should just find a
>>>>>       > host with it on. If you want to request it to be turned on, you
>>>>>       > would request a host with secure_boot_capable as a trait and
>>>>>       > have a flavor extra spec or image property to request ironic to
>>>>>       > enable it. These are two very different requests and should not
>>>>>       > be treated the same.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       > Lemme know!
>>>>>       >
>>>>>       > -jay
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       > On Oct 23, 2017 5:01 AM, "Dmitry Tantsur" <dtantsur at redhat.com> wrote:
>>>>>       >
>>>>>       >     Hi Jay!
>>>>>       >
>>>>>       >     I appreciate your comments, but I think you're approaching
>>>>>       >     the problem from a purely VM point of view. Things simply
>>>>>       >     don't work the same way in bare metal, at least not if we
>>>>>       >     want to provide the same user experience.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >     On Sun, Oct 22, 2017 at 2:25 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>>>>>       >
>>>>>       >         Sorry for the delay, I took a week off before starting a
>>>>>       >         new job. Comments inline.
>>>>>       >
>>>>>       >         On 10/16/2017 12:24 PM, Dmitry Tantsur wrote:
>>>>>       >
>>>>>       >             Hi all,
>>>>>       >
>>>>>       >             I promised John to dump my thoughts on traits to
>>>>>       >             the ML, so here we go :)
>>>>>       >
>>>>>       >             I see two roles of traits (or kinds of traits) for
>>>>>       >             bare metal:
>>>>>       >             1. traits that say what the node can do already
>>>>>       >             (e.g. "the node is doing UEFI boot")
>>>>>       >             2. traits that say what the node can be *configured*
>>>>>       >             to do (e.g. "the node can boot in UEFI mode")
>>>>>       >
>>>>>       >
>>>>>       >         There's only one role for traits: #2 above. #1 is state
>>>>>       >         information. Traits are not for state information.
>>>>>       >         Traits are only for communicating capabilities of a
>>>>>       >         resource provider (baremetal node).
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >     These are not different, that's what I'm talking about
>>>>>       >     here. No users care about the difference between "this node
>>>>>       >     was put in UEFI mode by an operator in advance", "this node
>>>>>       >     was put in UEFI mode by an ironic driver on demand" and
>>>>>       >     "this node is always in UEFI mode, because it's AARCH64 and
>>>>>       >     it does not have BIOS". These situations produce the same
>>>>>       >     result (the node is booted in UEFI mode), and thus it's up
>>>>>       >     to ironic to hide this difference.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >     My suggestion with traits is one way to do it; I'm not sure
>>>>>       >     what you suggest, though.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >         For example, let's say we add the following to the
>>>>>       >         os-traits library [1]:
>>>>>       >
>>>>>       >         * STORAGE_RAID_0
>>>>>       >         * STORAGE_RAID_1
>>>>>       >         * STORAGE_RAID_5
>>>>>       >         * STORAGE_RAID_6
>>>>>       >         * STORAGE_RAID_10
>>>>>       >
>>>>>       >         The Ironic administrator would add all RAID-related
>>>>>       >         traits to the baremetal nodes that had the *capability*
>>>>>       >         of supporting that particular RAID setup [2].
>>>>>       >
>>>>>       >         When provisioned, the baremetal node would either have
>>>>>       >         RAID configured at a certain level or not configured at
>>>>>       >         all.
>>>>>       >
>>>>>       >
>>>>>       >         A very important note: the Placement API and the Nova
>>>>>       >         scheduler (or a future Ironic scheduler) don't care
>>>>>       >         about this. At all. I know it sounds like I'm being
>>>>>       >         callous, but I'm not. Placement and scheduling don't
>>>>>       >         care about the state of things. They only care about
>>>>>       >         the capabilities of target destinations. That's it.
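>>>>>       >
>>>>>       >         As an aside, this capability tagging maps directly onto
>>>>>       >         the placement traits API (traits support arrived around
>>>>>       >         placement microversion 1.6). A minimal Python sketch;
>>>>>       >         the endpoint, token and UUID below are placeholders, not
>>>>>       >         values from this thread:
>>>>>       >
>>>>>       >         import requests
>>>>>       >
>>>>>       >         PLACEMENT = "http://placement.example.com/placement"  # assumed endpoint
>>>>>       >         HEADERS = {
>>>>>       >             "X-Auth-Token": "<admin-token>",             # assumed auth
>>>>>       >             "OpenStack-API-Version": "placement 1.6",
>>>>>       >         }
>>>>>       >         rp_uuid = "<resource-provider-uuid-of-the-baremetal-node>"
>>>>>       >
>>>>>       >         # Fetch the provider first: the PUT must echo the current
>>>>>       >         # generation so concurrent updates are not overwritten.
>>>>>       >         rp = requests.get("%s/resource_providers/%s" % (PLACEMENT, rp_uuid),
>>>>>       >                           headers=HEADERS).json()
>>>>>       >
>>>>>       >         # Replace the provider's traits with the capabilities it
>>>>>       >         # supports - capabilities only, no state.
>>>>>       >         requests.put(
>>>>>       >             "%s/resource_providers/%s/traits" % (PLACEMENT, rp_uuid),
>>>>>       >             headers=HEADERS,
>>>>>       >             json={
>>>>>       >                 "resource_provider_generation": rp["generation"],
>>>>>       >                 "traits": ["STORAGE_RAID_5", "STORAGE_RAID_10"],
>>>>>       >             },
>>>>>       >         )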
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >     Yes, because VMs always start with a clean state, and the
>>>>>       >     hypervisor is there to ensure that. We don't have this
>>>>>       >     luxury in ironic :) E.g. our SNMP driver is not even aware
>>>>>       >     of boot modes (or RAID, or BIOS configuration), which does
>>>>>       >     not mean that a node using it cannot be in UEFI mode (or
>>>>>       >     have RAID or BIOS pre-configured, etc).
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >             This seems confusing, but it's actually very
>>>>>       >             useful. Say I have a flavor that requests UEFI boot
>>>>>       >             via a trait. It will match both the nodes that are
>>>>>       >             already in UEFI mode and the nodes that can be put
>>>>>       >             in UEFI mode.
>>>>>       >
>>>>>       >
>>>>>       >         No :) It will only match nodes that have the UEFI
>>>>>       >         capability. The set of providers that have the ability
>>>>>       >         to be booted via UEFI is *always* a superset of the set
>>>>>       >         of providers that *have been booted via UEFI*.
>>>>>       >         Placement and scheduling decisions only care about that
>>>>>       >         superset -- the providers with a particular capability.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >     Well, no, it will. Again, you're basing this purely on the
>>>>>       >     VM idea, where a VM is always *put* in UEFI mode, no matter
>>>>>       >     what the hypervisor looks like. That is simply not the case
>>>>>       >     for us. You have to care what state the node is in, because
>>>>>       >     many drivers cannot change this state.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >             This idea goes further with deploy templates (a new
>>>>>       >             concept we've been thinking about). A flavor can
>>>>>       >             request something like CUSTOM_RAID_5, and it will
>>>>>       >             match the nodes that already have RAID 5 or, more
>>>>>       >             interestingly, the nodes on which we can build
>>>>>       >             RAID 5 before deployment. The UEFI example above
>>>>>       >             can be treated in a similar way.
>>>>>       >
>>>>>       >             This ends up with two sources of knowledge about
>>>>>       >             traits in ironic:
>>>>>       >             1. Operators setting something they know about
>>>>>       >             hardware ("this node is in UEFI mode"),
>>>>>       >             2. Ironic drivers reporting something they
>>>>>       >                2.1. know about hardware ("this node is in UEFI
>>>>>       >                mode" - again)
>>>>>       >                2.2. can do about hardware ("I can put this node
>>>>>       >                in UEFI mode")
>>>>>       >
>>>>>       >
>>>>>       >         You're correct that both pieces of information are
>>>>>       >         important. However, only the "can do about hardware"
>>>>>       >         part is relevant to Placement and Nova.
>>>>>       >
>>>>>       >             For case #1 we are planning a new CRUD API to
>>>>>       >             set/unset traits for a node.
>>>>>       >
>>>>>       >
>>>>>       >         I would *strongly* advise against this. Traits are not
>>>>>       >         for state information.
>>>>>       >
>>>>>       >         Instead, consider having a DB (or JSON) schema that
>>>>>       >         lists state information in fields that are explicitly
>>>>>       >         for that state information.
>>>>>       >
>>>>>       >         For example, a schema that looks like this:
>>>>>       >
>>>>>       >         {
>>>>>       >           "boot": {
>>>>>       >             "mode": <one of 'bios' or 'uefi'>,
>>>>>       >             "params": <dict>
>>>>>       >           },
>>>>>       >           "disk": {
>>>>>       >             "raid": {
>>>>>       >               "level": <int>,
>>>>>       >               "controller": <one of 'sw' or 'hw'>,
>>>>>       >               "driver": <string>,
>>>>>       >               "params": <dict>
>>>>>       >             },  ...
>>>>>       >           },
>>>>>       >           "network": {
>>>>>       >             ...
>>>>>       >           }
>>>>>       >         }
>>>>>       >
>>>>>       >         etc, etc.
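>>>>>       >
>>>>>       >         For illustration only, a concrete state document
>>>>>       >         following that schema could look like this (shown as a
>>>>>       >         Python dict; the values are invented for the example,
>>>>>       >         not taken from a real node):
>>>>>       >
>>>>>       >         node_state = {
>>>>>       >             "boot": {
>>>>>       >                 "mode": "uefi",
>>>>>       >                 "params": {"secure_boot": True},
>>>>>       >             },
>>>>>       >             "disk": {
>>>>>       >                 "raid": {
>>>>>       >                     "level": 5,
>>>>>       >                     "controller": "hw",
>>>>>       >                     "driver": "<vendor raid driver>",
>>>>>       >                     "params": {"disks": 4},
>>>>>       >                 },
>>>>>       >             },
>>>>>       >         }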
>>>>>       >
>>>>>       >         Don't use trait strings to represent state information.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >     I don't see an alternative proposal that will satisfy what
>>>>>       >     we have to solve.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >         Best,
>>>>>       >         -jay
>>>>>       >
>>>>>       >             Case #2 is more interesting. We have two options, I
>>>>>       >             think:
>>>>>       >
>>>>>       >             a) Operators still set traits on nodes, and drivers
>>>>>       >             simply validate them. E.g. an operator sets
>>>>>       >             CUSTOM_RAID_5, and the node's RAID interface checks
>>>>>       >             whether it is possible to do. The downside is
>>>>>       >             obvious - with a lot of deploy templates available
>>>>>       >             it can be a lot of manual work.
>>>>>       >
>>>>>       >             b) Drivers report the traits, and they somehow get
>>>>>       >             added to the traits provided by an operator.
>>>>>       >             Technically, there are sub-cases again:
>>>>>       >                b.1) The new traits API returns a union of
>>>>>       >                operator-provided and driver-provided traits
>>>>>       >                b.2) The new traits API returns only
>>>>>       >                operator-provided traits; driver-provided traits
>>>>>       >                are returned e.g. via a new field
>>>>>       >                (node.driver_traits). Then nova will have to
>>>>>       >                merge the lists itself.
>>>>>       >
>>>>>       >             My personal favorite is the last option: I'd like a
>>>>>       >             clear distinction between different "sources" of
>>>>>       >             traits, but I'd also like to reduce manual work for
>>>>>       >             operators.
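>>>>>       >
>>>>>       >             To be clear about what b.2 would mean on the nova
>>>>>       >             side, a rough illustration (node.driver_traits does
>>>>>       >             not exist today; the field name and helper are
>>>>>       >             hypothetical):
>>>>>       >
>>>>>       >             def effective_traits(node):
>>>>>       >                 """Union of operator-set and driver-reported traits."""
>>>>>       >                 operator_traits = set(node.traits or [])
>>>>>       >                 # driver_traits is the proposed new field from
>>>>>       >                 # option b.2, not an existing ironic attribute.
>>>>>       >                 driver_traits = set(getattr(node, "driver_traits", None) or [])
>>>>>       >                 return sorted(operator_traits | driver_traits)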
>>>>>       >
>>>>>       >             A valid counter-argument is: what if an operator
>>>>>       >             wants to override a driver-provided trait? E.g. a
>>>>>       >             node can do RAID 5, but I don't want this particular
>>>>>       >             node to do it for some reason. I'm not sure if it's
>>>>>       >             a valid case, and what to do about it.
>>>>>       >
>>>>>       >             Let me know what you think.
>>>>>       >
>>>>>       >             Dmitry
>>>>>       >
>>>>>       >
>>>>>       >         [1] http://git.openstack.org/cgit/openstack/os-traits/tree/
>>>>>       >         [2] Based on how many attached disks the node had, the
>>>>>       >         presence and abilities of a hardware RAID controller,
>>>>>       >         etc.
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>       >
>>>>>
>>>>>
>>>>>
>>
>
>



-- 
Dmitry Tantsur

