[openstack-dev] [ironic] Pike PTG report

Loo, Ruby ruby.loo at intel.com
Thu Mar 9 14:51:25 UTC 2017

Thanks Dmitry for the great report! If you want your comment(s) to be available to the ironic community, please respond here, not in Dmitry's blog. (Or, to rephrase it a different way, if you have comments that you don't want me to know about, please respond in Dmitry's blog :))

Many thanks to Michael Turek for taking great notes during the PTG. Speaking of which, we did have some informal discussions Friday morning, but Michael wasn't there :)


On 2017-03-08, 10:06 AM, "Dmitry Tantsur" <dtantsur at redhat.com> wrote:

    Hi all!
    I've finished my Pike PTG report. It is spread over four blog posts:
    It was a lot of typing, so please pardon any mistakes. The whole text (in RST
    format) is copy-pasted at the end of this message for archiving purposes.
    Please feel free to respond here or in the blog comments.
    Ongoing work and status updates
    Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-ongoing-work.
    We spent the first half of Wednesday discussing this. There was a lot of
    incomplete work left from Ocata, and some major ongoing work that we did not
    even plan to finish in Ocata.
    Some progress was made: most of the Ironic patches are up. They desperately need
    review and testing, though. The Nova part is also lagging behind and should be
    brought to the Nova team's attention.
         **mgoddard** and **dtantsur** volunteered to help with testing, while
         **mjturek**, **hsiina** and **crushil** volunteered to do some coding.
    **Goals for Pike**
         finish the first (iSCSI using iPXE) case and the Nova part.
    A lot of progress here during Ocata, completed bonding and attach/detach API.
    VLAN-aware instances should work. However, this requires an expensive ToR switch
    supporting VLAN/VLAN and VLAN/VXLAN rewriting and, of course, ML2 plugin
    support. Also, reusing an existing segmentation ID requires more work: we have
    no current way to put the right ID in the configdrive.
         **vsaienko**, **armando** and **kevinbenton** are looking into the Neutron
         part of the configdrive problem.
    Routed networks support requires Ironic to be aware of which physical network(s)
    each node is connected to.
    **Goals for Pike**
         * model physical networks on Ironic ports,
         * update VIF attach logic to no longer attach things to wrong physnets.
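    As a rough illustration of the second goal, here is a minimal sketch of
    physnet-aware port selection. The ``Port`` class, its ``physical_network``
    field and ``pick_port`` are hypothetical names for this example, not Ironic's
    actual object model or API. Treating a port without a recorded physnet as
    usable anywhere is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Port:
    """Hypothetical stand-in for an Ironic port; not the real object model."""
    address: str
    physical_network: Optional[str] = None  # None: physnet not recorded
    vif_id: Optional[str] = None            # set once a VIF is attached


def pick_port(ports: List[Port], network_physnet: Optional[str]) -> Optional[Port]:
    """Return the first free port that may be attached to the given physnet."""
    for port in ports:
        if port.vif_id is not None:
            continue  # this port already has a VIF attached
        # Assumption: a network without a physnet, or a port without one,
        # imposes no restriction on the match.
        if network_physnet is None or port.physical_network in (None, network_physnet):
            return port
    return None
```

    With ports on ``physnet1`` and ``physnet2``, a request for ``physnet2`` picks
    the second port, and a request for an unknown physnet returns nothing instead
    of attaching to the wrong one.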
    We discussed introducing notifications from Neutron to Ironic about events
    of interest for us. We are going to use the same model as between Neutron and
    Nova: create a Neutron plugin that filters out interesting events and posts
    to a new Ironic API endpoint.
    **Goals for Pike**
         have this notification system in place.
    Finally, we agreed that we need to work on a reference architecture document,
    describing the best practices of deploying Ironic, especially around
    multi-tenant networking setup.
         **jroll** to kickstart this document, **JayF** and **mariojv** to help.
    Rolling upgrades
    Missed Ocata by a small margin. The code is up and needs reviewing. The CI
    is waiting for the multinode job to start working (should be close as well).
    **Goals for Pike**
         rolling upgrade Ocata -> Pike.
    Driver composition reform
    Most of the code landed in Ocata already. Some client changes landed in Pike,
    some are still on review. As we released Ocata with the driver composition
    changes being experimental, we are not ready to deprecate old-style drivers in
    Pike. Documentation is also still lacking.
    **Goals for Pike**
         * make new-style dynamic drivers the recommended way of writing and using
         * fill in missing documentation,
         * *recommend* vendors to have hardware types for their hardware, as well
           as 3rdparty CI support for it.
    **Important decisions**
         * no new classic drivers are accepted in-tree (please check when accepting
         * no new interface additions for classic drivers (``volume_interface`` is
           the last accepted from them),
         * remove the SSH drivers by Pike final (probably around M3).
    Ironic Inspector HA
    Preliminary work (switch to a real state machine) done in Ocata. Splitting the
    service into API and conductor/engine parts correlates with the WSGI
    cross-project goal.
    We also had a deeper discussion about ironic-inspector architecture earlier
    that week, where we were `looking
    <https://etherpad.openstack.org/p/ironic-pike-ptg-inspector-arch>`_ into
    potential future work to make ironic-inspector both HA and multi-tenancy
    friendly. It was suggested to split *discovery* process (simple process to
    detect MACs and/or power credentials) and *inspection* process (full process
    when a MAC is known).
    **Goals for Pike**
         * switch locking to ``tooz`` (with Redis probably being the default
           backend for now),
         * split away API process with WSGI support,
         * leader election using ``tooz`` for periodic tasks,
         * stop messing with ``iptables`` and start directly managing ``dnsmasq``
           instead (similarly to how Neutron does it),
         * try using ``dnsmasq`` in active/active configuration with
           non-intersecting IP addresses pools from the same subnet.
         also **sambetts** will write a spec on a potential workflow split.
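    To illustrate the last goal, here is a minimal sketch of carving one subnet
    into non-intersecting address pools, one per ``dnsmasq`` instance. The function
    name and the equal-sized split are assumptions made for this example; the
    report does not say how the pools would be chosen.

```python
import ipaddress
from typing import List, Tuple


def split_pools(cidr: str, instances: int) -> List[Tuple[str, str]]:
    """Split the usable hosts of a subnet into equal, disjoint (start, end) ranges."""
    hosts = list(ipaddress.ip_network(cidr).hosts())
    size = len(hosts) // instances
    if size == 0:
        raise ValueError("subnet too small for %d instances" % instances)
    pools = []
    for i in range(instances):
        # Consecutive slices never overlap; leftover addresses stay unused.
        chunk = hosts[i * size:(i + 1) * size]
        pools.append((str(chunk[0]), str(chunk[-1])))
    return pools
```

    Each resulting pair could then be fed to one instance, for example through
    its ``dhcp-range`` option.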
    Ironic UI
    The project got some important features implemented, and an RDO package
    emerged during Ocata. Still, it desperately needs volunteers for coding and
    testing. A spreadsheet captures the current (as of the beginning of Pike)
    status of features.
         **dtantsur**, **davidlenwell**, **bradjones** and **crushil** agreed to
         dedicate some time to the UI.
    Most of the patches are up, the feature is tested with the CoreOS-based
    ramdisk for now. Still, the ramdisk side poses a problem: while using DHCP is
    easy, static network configuration seems not. It's especially problematic in
    CoreOS. Might be much easier in the DIB-based ramdisk, but we don't support it
    officially in the Ironic community.
    RedFish driver
    We want to get a driver supporting RedFish soon. Some criticism was raised
    about the currently proposed python-redfish library. As an alternative,
    `a new library <https://github.com/openstack/sushy>`_ was written. It is
    lightweight, covered by unit tests and only contains what Ironic needs.
    We agreed to start our driver implementation with it, and switch to the
    python-redfish library when/if it is ready to be consumed by us.
    We postponed discussing advanced features like node composition until after
    we get the basic driver in.
    Small status updates
    * Of the API evolution initiative, only E-Tag work got some progress. The spec
       needs reviewing now.
    * Node tags work needs review and is close to landing. We decided to discuss
       port tags as part of a separate RFE, if anybody is interested.
    * IPA API versioning also needs reviews; there are several moderately
       contentious points about it. It was suggested that we only support one
       direction of IPA/ironic upgrades to simplify testing. We'll probably only
       support old IPA with new ironic, which is already tested by our grenade job.
    CI and testing
    Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-ci-testing
    Missing CI coverage
         Cirros finally released a stable version with UEFI support built in.
         A non-voting job is running with partition images, should be made voting
         soon. A test with whole disk images will be introduced as part of
         `standalone tests <https://review.openstack.org/#/c/423556/>`_.
    Local bootloader
         Requires small enough instance images with Grub2 present (Cirros does not
         have it). We agreed to create a new repository with scripts to build
         suitable images. Potentially can be shared with other teams (e.g. Neutron).
         Actions: **lucasagomes** and/or **vsaienko** to look into it.
    Adopt state
         Tests have been up for some time, but have ordering issues with nova-based
         tests. Suggesting **TheJulia** to move them to `standalone tests`_.
    Root device hints
         Not covered by any CI. Will need modifying how we create virtual machines.
         The first step is to get size-based hints to work. Check two cases: with
         the size strictly equal to and greater than the requested one.
         Actions: **dtantsur** to look into it.
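    The two cases to test can be sketched as follows. This is only an
    illustration: ``match_by_size``, the ``at_least`` flag and the ``size_gb`` key
    are hypothetical and do not reflect how Ironic or IPA actually evaluate root
    device hints.

```python
from typing import Dict, List, Optional


def match_by_size(disks: List[Dict], size_gb: int, at_least: bool = False) -> Optional[Dict]:
    """Return the first disk whose size equals size_gb, or exceeds it if at_least is set."""
    for disk in disks:
        if disk["size_gb"] == size_gb:
            return disk  # case 1: strictly equal
        if at_least and disk["size_gb"] > size_gb:
            return disk  # case 2: greater than requested
    return None
```

    A CI job would then need virtual machines with at least two disks of
    different sizes, so that the wrong choice is detectable.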
    Capabilities-based scheduling
         This may actually go to Nova gate, not ours. Still, it relies on some code
         in our driver, so we'd better cover it to ensure that the placement API
         changes don't break it.
         Actions: **vsaienko** to look into it.
    Port groups
         The same image problem as with local boot - the same action item to create
         a repository with build scripts to build our images.
    VLAN-aware instances
         The same image problem + requires reworking our network simulation code.
    Conductor take over and hash ring
         Requires a separate multi-node job.
         Action: **vsaienko** to investigate.
    DIB-based IPA image
    Currently the ``ironic-agent`` element to build such image is in the DIB
    repository outside of our control. If we want to properly support it, we need
    to gate on its changes, and to gate IPA changes on its job. Some time ago we
    had a tentative agreement to move the element to our tree.
    It was blocked by the fact that DIB rarely or never removes elements, and does
    not have a way to properly de-duplicate elements with the same name.
    An obvious solution we are going to propose is to take this element in IPA
    tree under a different name (``ironic-python-agent``?). The old element will
    get deprecated and only critical fixes will be accepted for it.
         **dtantsur** to (re)start this discussion with the TripleO and DIB teams.
    API microversions testing
    We are not sure we have tests covering all microversions. We seem to have API
    tests using ``fake`` driver that cover at least some of them. We should start
    paying more attention to this part of our testing.
         **dtantsur** to check if these tests are up-to-date and split them to a
         separate CI job.
         **pas-ha** to write API tests for internal API (i.e. lookup/heartbeat).
    Global OpenStack goals
    Splitting away tempest plugins
    It did not end up a goal for Pike, and there are still some concerns in the
    community. Still, as we already apply ugly hacks in our jobs to use the
    tempest plugin from master, we agreed to proceed with the split.
    To simplify both maintenance and consuming our tests, we agreed to merge
    ironic and ironic-inspector plugins. The introspection tests will or will
    not run based on ironic-inspector presence.
    We propose having a merged core team (i.e. ironic-inspector-core which
    already includes ironic-core) for this repository. We trust people who
    only have core rights on ironic-inspector to not approve things they're
    not authorized to approve.
    Python 3 support
    We've been running Python 3 unit tests for quite some time. Additionally,
    ironic-inspector runs a non-voting Python 3 functional test. Ironic has an
    experimental job which fails, apparently, because of swift. We can start with
    switching this job to the ``pxe_ipmitool`` driver (not requiring swift).
    Inspector does not have a Python 3 integration tests job proposed yet.
         **JayF** and **hurricanerix** will drive this work in both ironic and
         **lucasagomes** to check pyghmi and virtualbmc compatibility.
         **krtaylor** and/or **mjturek** to check MoltenIron.
    We agreed that Bifrost is out of scope for this task. Its Python 3
    compatibility mostly depends on that of Ansible anyway. Similarly, for the UI
    we need horizon to be fully Python 3 compatible first.
    Important decisions
         We recommend vendors to make their libraries compatible with Python 3.
         It may become a strict requirement in one of the coming releases.
    API behind WSGI container
    This seems quite straightforward. The work has started to switch ironic CI to
    WSGI already. For ironic-inspector it's going to be done as part of the HA
    Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-operations
    OSC plugin and API versioning
    Currently we default the OSC plugin (and old client too) to a really old API
    version. We agreed that this situation is not desired, and that we should take
    the same approach as Nova and default to the latest version. We are planning
    to announce the change this cycle, both via the ML and via a warning issued
    when no version is specified.
    Next, in the Queens cycle, we will have to make the change, bearing in mind
    that OSC does not support values like ``latest`` for API versions. So the plan
    is as follows:
    * make the default ``--os-baremetal-api-version=1`` in
    * when instantiating the ironic client in the OSC plugin, replace '1' with
    * when handling ``--os-baremetal-api-version=latest``, replace it with ``1``,
       so that it's later replaced with ``latest`` again:
    As a side effect, that will make ``1`` equivalent to ``latest`` as well.
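    The round trip described above can be sketched like this. Parts of the plan
    are cut off in this copy, so the sketch assumes, based on the last item, that
    ``1`` acts as a placeholder which is turned into ``latest`` when the client is
    created. Both function names are hypothetical and are not part of the OSC
    plugin.

```python
LATEST = "latest"
PLACEHOLDER = "1"  # assumed default value that OSC itself can handle


def parse_cli_version(value: str = PLACEHOLDER) -> str:
    """Normalize the command-line value so that OSC never sees 'latest'."""
    return PLACEHOLDER if value == LATEST else value


def client_version(parsed: str) -> str:
    """Map the placeholder back to 'latest' when creating the ironic client."""
    return LATEST if parsed == PLACEHOLDER else parsed
```

    An explicit version such as ``1.9`` passes through both steps unchanged, while
    the default, ``1`` and ``latest`` all end up as ``latest``.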
    It was also suggested to have a new command, displaying both server supported
    and client supported API versions.
    Deprecating the standalone ironic CLI in favor of the OSC plugin
    We do not want to maintain two CLIs in the long run. We agreed to start
    thinking about deprecating the old ``ironic`` command. Main concerns:
    * lack of feature parity,
    * ugly way to work without authentication, for example::
         openstack baremetal --os-url http://ironic --os-token fake <COMMAND>
    Plan for Pike
         * Ensure complete feature parity between two clients.
         * Only use ``openstack baremetal`` commands in the documentation.
    The actual deprecation is planned for Queens.
    RAID configuration enhancements
    A few suggestions were made:
    * Support an ordered list of logical disk definitions. The first possible
       configuration is applied to the node. For example:
       * Top of list - RAID 10 but we don't have enough drives
       * Fallback to next preference in list - RAID 1 on a pair of available drives
       * Finally, JBOD or RAID 0 on the only available drive
    * Specify the number of instances for a logical disk definition to create.
    * Specify backing physical disks by stating preference for the smallest, e.g.
       smallest like-sized pair or two smallest disks.
    * Specify location of physical disks, e.g. first two or last two as perceived
       by the hardware, front/rear/internal location.
         **rpioso** will write RFE(s)
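    The first suggestion, an ordered list with fallback, can be sketched as
    follows. The ``raid_level`` key is loosely modeled on a logical disk
    definition, while ``first_feasible`` and the table of minimum disk counts are
    assumptions of this example and not part of any agreed design.

```python
from typing import List, Optional

# Minimum number of physical disks commonly required per RAID level.
MIN_DISKS = {"0": 1, "1": 2, "5": 3, "6": 4, "1+0": 4}


def first_feasible(preferences: List[dict], available_disks: int) -> Optional[dict]:
    """Return the first logical disk definition the node has enough disks for."""
    for logical_disk in preferences:
        if available_disks >= MIN_DISKS[logical_disk["raid_level"]]:
            return logical_disk
    return None  # nothing in the list can be built on this node
```

    For the preference list RAID 10, RAID 1, RAID 0, a node with two free disks
    would get RAID 1 and a node with a single disk would get RAID 0.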
    Smaller topics
    Non-abortable clean steps stuck in ``clean wait`` state
         We discussed a potential ``force-abort`` functionality, but the only thing
         we agreed on is to check that all current clean steps are marked as
         ``abortable`` if they really are.
    Status of long-running cleaning operations
         There is a request to be able to get status of e.g. disk shredding (which
         may take hours). We found out that the current IPA API design essentially
         prevents running several commands in parallel. We agreed that we need IPA
         API versioning first, and that this work is not a huge priority right now.
    OSC command for listing driver and RAID properties
         We could not agree on the exact form of these two commands. The primary
         candidates discussed at the PTG were::
             openstack baremetal driver property list <DRIVER>
             openstack baremetal driver property show <DRIVER>
         We agreed to move this to the spec: https://review.openstack.org/439907.
    Abandoning an active node
         I.e. the opposite of adopt. It's unclear how such an operation would play
         with nova; maybe it's only useful for a standalone case.
    Future Work
    Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-future-work.
    Neutron event processing
    RFE: https://bugs.launchpad.net/ironic/+bug/1304673, spec:
    We need to wait for certain events from neutron (like port bindings).
    Currently we just wait some time, and hope it went well. We agreed to follow
    the same pattern that nova does for neutron to nova notifications.
    The neutron part is
    We agreed with the Neutron team that notifier and the other ironic-specific
    stuff for neutron would live in a separate repo under Baremetal governance.
    Draft code is https://review.openstack.org/#/c/357780.
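    The difference between sleeping and waiting for a notification can be
    sketched as below. ``EventWaiter`` is a hypothetical helper written for this
    example; it is not taken from the draft code or from nova.

```python
import threading
from typing import Dict


class EventWaiter:
    """Hypothetical helper: wait for a named event instead of sleeping blindly."""

    def __init__(self) -> None:
        self._events: Dict[str, threading.Event] = {}
        self._lock = threading.Lock()

    def _get(self, key: str) -> threading.Event:
        # Create the event on first use, whichever side arrives first.
        with self._lock:
            return self._events.setdefault(key, threading.Event())

    def notify(self, key: str) -> None:
        """Called when an external notification arrives, e.g. via an API endpoint."""
        self._get(key).set()

    def wait(self, key: str, timeout: float) -> bool:
        """Block until the event arrives; return False if the timeout expires."""
        return self._get(key).wait(timeout)
```

    The caller proceeds as soon as the event is reported and gets an explicit
    failure on timeout, rather than hoping that a fixed delay was long enough.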
    Splitting node.properties[capabilities] into a separate table
    This is something we've planned on for a long time. Currently, it's not possible
    to update capabilities atomically, and the format is quite hard to work with:
    ``k1:v1,k2:v2``. We discussed moving away from using the word ``capability``.
    It's already overused in the OpenStack world, and nova is switching to the
    notion of "traits". It also looks like traits will be qualitative-only, while
    we have proposals for quantitative capabilities (like ``gpu_count``).
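    To show why the ``k1:v1,k2:v2`` format is awkward, here is a minimal sketch
    of what changing a single capability involves. The helper names are
    hypothetical and this is not Ironic's own parsing code.

```python
from typing import Dict


def parse_capabilities(raw: str) -> Dict[str, str]:
    """Parse 'k1:v1,k2:v2' into a dict, skipping malformed items."""
    result = {}
    for item in raw.split(","):
        key, sep, value = item.partition(":")
        if sep and key.strip():
            result[key.strip()] = value.strip()
    return result


def format_capabilities(caps: Dict[str, str]) -> str:
    """Serialize a dict back into the 'k1:v1,k2:v2' form."""
    return ",".join("%s:%s" % (k, v) for k, v in caps.items())


def set_capability(raw: str, key: str, value: str) -> str:
    """Read, modify and rewrite the whole string to change one key."""
    caps = parse_capabilities(raw)
    caps[key] = value
    return format_capabilities(caps)
```

    Because the whole string is read and written back, two concurrent updates of
    different keys can overwrite each other, which a separate table with one row
    per entry would avoid.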
    It was proposed to model a typical CRUD API for traits in Ironic::
         GET /v1/nodes/<NODE>/traits
         POST  /v1/nodes/<NODE>/traits
         GET /v1/nodes/<NODE>/traits/<trait>
         DELETE /v1/nodes/<NODE>/traits/<trait>
    In API versions before this addition, we would make
    ``properties/capabilities`` a transparent proxy to new tables.
    It was noted that the database change can be done first, with API change
    following it.
         **rloo** to propose two separate RFEs for database and API parts.
    Avoid changing behavior based on properties[capabilities]
    Currently our capabilities have a dual role. They serve both for scheduling
    (to inform nova of what nodes can do) and for making decisions based on flavor
    (e.g. request UEFI boot). It is complicated by the fact that sometimes the
    same capability (e.g. UEFI) can be of both types depending on a driver.
    This is quite confusing for users, and may be incompatible with future changes
    both in ironic and nova.
    For things like boot option and (potentially) BIOS setting, we need to be able
    to get requests from flavors and/or nova boot arguments without abusing
    capabilities for it. Maybe similar to how NUMA support does it:
    For example::
    (tells the scheduler to find a node with SSD disk; does not change
    behavior/config of node)
    (configures the node to boot UEFI; has no impact on scheduling)
    (tells the scheduler to find a node supporting UEFI; if this support is
    dynamic, configures the node to enable UEFI boot).
         **jroll** to start conversation with nova folks about how/if to have a
         replacement for this elsewhere.
         Stop accepting driver features relying on ``properties[capabilities]`` (as
         opposed to ``instance_info[capabilities]``).
    Potential actions
         * Rename ``instance_info[capabilities]`` to
           ``instance_info[configuration]`` for clarity.
    Deploy-time RAID
    This was discussed on the last design summit. Since then we've got a `nova
    spec <https://review.openstack.org/408151>`_, which, however, hasn't got many
    reviews so far. The spec continues using ``block_device_mapping_v2``, other
    options apparently were not considered.
    We discussed how to inform Nova whether or not RAID can be built for
    a particular node. Ideally, we need to tell the scheduler about many things:
    RAID support, disk number, disk sizes. We decided that it's overkill, at
    least for the beginning. We'll only rely on a "supports RAID" trait for now.
    It's still unclear what to do about ``local_gb`` property, but with planned
    Nova changes it may not be required any more.
    Advanced partitioning
    There is a desire for flexible partitioning in ironic, both in case of
    partition and whole disk images (in the latter case - partition other disks).
    Generally, there was no consensus at the PTG. Some people were very much in
    favor of this feature, some quite against. It's unclear how to pass
    partitioning information from Nova. There is a concern that such a feature
    will drag us too far into OS-specific details. We agreed that someone
    interested will collect the requirements and create a more detailed proposal,
    and we'll discuss it at the next PTG.
    Splitting nodes into separate pools
    This feature is about dedicating some nodes to a tenant, essentially adding a
    tenant_id field to nodes. This can be helpful e.g. for a hardware provider to
    reserve hardware for a tenant, so that it's always available.
    This seems relatively easy to implement in Ironic. We need a new field on
    nodes, then only show non-admin users their hardware. A bit trickier to make
    it work with Nova. We agreed to investigate passing a token from Nova to
    Ironic, as opposed to always using a service user admin token.
         **vdrok** to work out the details and propose a spec.
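    The Ironic side of this idea can be sketched in a few lines. The
    ``tenant_id`` key follows the description above; ``visible_nodes`` and the
    rule that unowned nodes are hidden from non-admin users are assumptions of
    this example, pending the actual spec.

```python
from typing import List, Optional


def visible_nodes(nodes: List[dict], tenant_id: Optional[str],
                  is_admin: bool = False) -> List[dict]:
    """Return all nodes for admins, otherwise only the caller's own nodes."""
    if is_admin:
        return list(nodes)
    if tenant_id is None:
        return []  # an anonymous caller owns nothing
    return [node for node in nodes if node.get("tenant_id") == tenant_id]
```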
    Requirements for routed networks
    We discussed requirements for achieving routed architecture like
    spine-and-leaf. It seems that most of the requirements are already in our
    plans. The outstanding items are:
    * Multiple subnets support for ironic-inspector. Can be solved at the
       ``dnsmasq.conf`` level, an appropriate change was merged into
    * Per-node provision and cleaning networks. There is an RFE, somebody just
       has to do the work.
    This does not seem to be a Pike goal for us, but many of the dependencies
    are planned for Pike.
    Configuring BIOS setting for nodes
    Preparing a node to serve a certain role by tweaking its
    settings. Currently, it is implemented by the Drac driver in a vendor pass-thru.
    We agreed that such a feature would better fit cleaning, rather than
    pre-deployment. Thus, it does not depend on deploy steps. It was suggested to
    extend the management interface to support passing it an arbitrary JSON with
    configuration. Then a clean step would pick it (similar to RAID).
         **rpioso** to write a spec for this feature.
    Deploy steps
    We discussed `the deploy steps proposal <https://review.openstack.org/412523>`_
    in depth. We agreed on partially splitting the deployment procedure into
    pluggable bits. We will leave the very core of the deployment - flashing the
    image onto a target disk - hardcoded, at least for now. The drivers will be
    able to define steps to run before and after this core deployment. Pre- and
    post-deployment steps will have different priorities ranges, something like::
         0 < pre-max/deploy-min < deploy-max/post-min < infinity
    We plan on making partitioning a pre-deploy step, and installing a bootloader
    a post-deploy step. We will not allow IPA hardware managers to define deploy
    steps, at least for now.
         **yolanda** is planning to work on this feature, **rloo** and **TheJulia**
         to help.
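    The priority ranges can be illustrated with the sketch below. The numeric
    boundaries, the handling of the shared boundary values and the rule that
    lower priorities run first are all assumptions of this example; the report
    only fixes the relative order of the three ranges.

```python
from typing import List, Tuple

# Hypothetical boundaries; only their relative order comes from the report.
PRE_MAX = 100      # stands for "pre-max/deploy-min"
DEPLOY_MAX = 200   # stands for "deploy-max/post-min"


def phase(priority: int) -> str:
    """Classify a step by the range its priority falls into."""
    if priority <= 0:
        raise ValueError("priority must be positive")
    if priority < PRE_MAX:
        return "pre-deploy"
    if priority < DEPLOY_MAX:
        return "deploy"
    return "post-deploy"


def plan(steps: List[Tuple[str, int]]) -> List[str]:
    """Order step names so that lower priorities run first (an assumption)."""
    return [name for name, _priority in sorted(steps, key=lambda step: step[1])]
```

    With these values, a partitioning step at priority 10 runs before the core
    image write, and a bootloader step at priority 250 runs after it.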
    Authenticating IPA
    IPA HTTP endpoints, and the endpoints Ironic provides for ramdisk callbacks
    are completely insecure right now. We hesitated to add any authentication to
    them, as any secrets published for the ramdisk to use (be it part of kernel
    command line or image itself) are readily available to anyone on the network.
    We agreed on several things to look into:
    * A random CSRF-like token to use for each node. This will somewhat limit the
       attack surface by requiring an attacker to intercept a token for the
       specific node, rather than just access the endpoints.
    * Document splitting out public and private Ironic API as part of our future
       reference architecture guide.
    * Make sure we support TLS between Ironic and IPA, which is particularly
       helpful when virtual media is used (and secrets are not leaked).
         **jroll** and **joanna** to look into the random token idea.
         **jroll** to write an RFE for TLS between IPA and Ironic.
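    The random token idea can be sketched with the Python standard library. The
    ``NodeTokens`` class and its in-memory storage are hypothetical; how a token
    would reach the ramdisk and where Ironic would keep it were left open.

```python
import hmac
import secrets
from typing import Dict


class NodeTokens:
    """Hypothetical per-node token registry kept in memory."""

    def __init__(self) -> None:
        self._tokens: Dict[str, str] = {}

    def issue(self, node_uuid: str) -> str:
        """Generate and remember a fresh random token for one node."""
        token = secrets.token_urlsafe(32)
        self._tokens[node_uuid] = token
        return token

    def verify(self, node_uuid: str, presented: str) -> bool:
        """Check a presented token against the one issued for this node."""
        expected = self._tokens.get(node_uuid)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode(), presented.encode())
```

    A token issued for one node is useless against another node's endpoints,
    which is the limited protection the first item above aims for.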
    Smaller things
    Using ansible-networking as a ML2 driver for ironic-neutron integration work
         It was suggested to make it one of the backends for
         ``networking-generic-switch`` in addition to ``netmiko``. Potential
         concurrency issues when using SSH were raised, and still require a solution.
    Extending and standardizing the list of capabilities the drivers may discover
         It was proposed to use `os-traits <https://github.com/jaypipes/os-traits>`_
         for standardizing qualitative capabilities. **jroll** will look into
         quantitative capabilities.
    Pluggable interface for long-running processes
         This was proposed as an optional way to mitigate certain problems with
         local long-running services, like console. E.g. if a conductor crashes,
         its console services keep running. It was noted that this is a bug to be
         fixed (**TheJulia** volunteered to triage it).
         The proposed solution involved optionally running processes on a remote
         cluster, e.g. k8s. Concerns were voiced at the PTG around complicating
         the support matrix and adding more decisions to make for operators.
         There was no apparent consensus on implementing this feature due to that.
    Setting specific boot device for PXE booting
         It was found to be already solved by setting ``pxe_enabled`` on ports.
         We just need to update ironic-inspector to set this flag.
    Priorities and planning
    The suggested priorities list is now finalized in
    We also agreed on the following priorities for ironic-inspector subteam:
    * Inspector HA (**milan**)
    * Community goal - python 3.5 (**JayF**, **hurricanerix**)
    * Community goal - devstack+apache+wsgi (**aarefiev**, **ovoshchana**)
    * Inspector needs to update ``pxe_enabled`` flag on ports (**dtantsur**)