[openstack-dev] [ironic] Pike PTG report

Dmitry Tantsur dtantsur at redhat.com
Wed Mar 8 15:06:59 UTC 2017


Hi all!

I've finished my Pike PTG report. It is spread over four blog posts:

http://dtantsur.github.io/posts/ironic-ptg-atlanta-2017-1.html
http://dtantsur.github.io/posts/ironic-ptg-atlanta-2017-2.html
http://dtantsur.github.io/posts/ironic-ptg-atlanta-2017-3.html
http://dtantsur.github.io/posts/ironic-ptg-atlanta-2017-4.html

It was a lot of typing, so please pardon any mistakes. For archiving purposes,
the whole text (in RST format) is copy-pasted at the end of this message.

Please feel free to respond here or in the blog comments.

Cheers,
Dmitry


Ongoing work and status updates
-------------------------------

Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-ongoing-work.

We spent the first half of Wednesday discussing this. There was a lot of
incomplete work left from Ocata, and some major ongoing work that we did not
even plan to finish in Ocata.

Boot-from-volume
~~~~~~~~~~~~~~~~

We made some progress: most of the Ironic patches are up. The work desperately
needs review and testing, though. The Nova part is also lagging behind and
should be brought to the Nova team's attention.

**Actions**
     **mgoddard** and **dtantsur** volunteered to help with testing, while
     **mjturek**, **hsiina** and **crushil** volunteered to do some coding.
**Goals for Pike**
     finish the first (iSCSI using iPXE) case and the Nova part.

Networking
~~~~~~~~~~

There was a lot of progress here during Ocata: bonding and the attach/detach API
were completed.

VLAN-aware instances should work. However, they require an expensive ToR switch
supporting VLAN/VLAN and VLAN/VXLAN rewriting and, of course, ML2 plugin
support. Also, reusing an existing segmentation ID requires more work: we
currently have no way to put the right ID in the configdrive.

**Actions**
     **vsaienko**, **armando** and **kevinbenton** are looking into the Neutron
     part of the configdrive problem.

Routed networks support requires Ironic to be aware of which physical network(s)
each node is connected to.

**Goals for Pike**
     * model physical networks on Ironic ports,
     * update VIF attach logic to no longer attach things to wrong physnets.

We discussed introducing notifications from Neutron to Ironic about events
of interest for us. We are going to use the same model as between Neutron and
Nova: create a Neutron plugin that filters out interesting events and posts
to a new Ironic API endpoint.

**Goals for Pike**
     have this notification system in place.

Finally, we agreed that we need to work on a reference architecture document,
describing the best practices of deploying Ironic, especially around
multi-tenant networking setup.

**Actions**
     **jroll** to kickstart this document, **JayF** and **mariojv** to help.

Rolling upgrades
~~~~~~~~~~~~~~~~

This missed Ocata by a small margin. The code is up and needs reviewing. The CI
is waiting for the multinode job to start working (which should be close as well).

**Goals for Pike**
     rolling upgrade Ocata -> Pike.

Driver composition reform
~~~~~~~~~~~~~~~~~~~~~~~~~

Most of the code landed in Ocata already. Some client changes landed in Pike,
some are still under review. As we released Ocata with the driver composition
changes being experimental, we are not ready to deprecate old-style drivers in
Pike. Documentation is also still lacking.

**Goals for Pike**
     * make new-style dynamic drivers the recommended way of writing and using
       drivers,
     * fill in missing documentation,
     * *recommend* that vendors have hardware types for their hardware, as well
       as third-party CI support for it.
**Important decisions**
     * no new classic drivers are accepted in-tree (please check when accepting
       specifications),
     * no new interface additions for classic drivers (``volume_interface`` is
       the last one accepted for them),
     * remove the SSH drivers by Pike final (probably around M3).

Ironic Inspector HA
~~~~~~~~~~~~~~~~~~~

Preliminary work (switching to a real state machine) was done in Ocata. Splitting the
service into API and conductor/engine parts correlates with the WSGI
cross-project goal.

We also had a deeper discussion about ironic-inspector architecture earlier
that week, where we were `looking
<https://etherpad.openstack.org/p/ironic-pike-ptg-inspector-arch>`_ into
potential future work to make ironic-inspector both HA and multi-tenancy
friendly. It was suggested to split the *discovery* process (a simple process to
detect MACs and/or power credentials) from the *inspection* process (the full
process when a MAC is known).

**Goals for Pike**
     * switch locking to ``tooz`` (with Redis probably being the default
       backend for now),
     * split away API process with WSGI support,
     * leader election using ``tooz`` for periodic tasks,
     * stop messing with ``iptables`` and start directly managing ``dnsmasq``
       instead (similarly to how Neutron does it),
     * try using ``dnsmasq`` in active/active configuration with
       non-intersecting IP addresses pools from the same subnet.
**Actions**
     **sambetts** will also write a spec on a potential workflow split.
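
Running ``dnsmasq`` active/active with non-intersecting pools means each
instance hands out addresses from its own slice of the shared subnet. The
following is only a sketch under assumptions: the helper name and the even-split
policy are hypothetical, and it simply uses Python's standard ``ipaddress``
module to derive the slices and print them as ``dhcp-range`` lines.

```python
import ipaddress


def split_pools(cidr, count):
    """Split the usable hosts of ``cidr`` into ``count`` contiguous,
    non-overlapping (start, end) ranges, one per dnsmasq instance.

    Hypothetical helper for illustration only.
    """
    hosts = list(ipaddress.ip_network(cidr).hosts())
    if count < 1 or count > len(hosts):
        raise ValueError("invalid number of pools")
    size, extra = divmod(len(hosts), count)
    pools = []
    start = 0
    for i in range(count):
        # The first ``extra`` pools get one additional address each.
        end = start + size + (1 if i < extra else 0)
        pools.append((hosts[start], hosts[end - 1]))
        start = end
    return pools


for first, last in split_pools("192.168.0.0/24", 2):
    # One line for each dnsmasq instance's configuration file.
    print(f"dhcp-range={first},{last}")
```

For a ``/24`` split in two, this yields ``192.168.0.1-192.168.0.127`` and
``192.168.0.128-192.168.0.254``, so the two instances can never lease the same
address.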

Ironic UI
~~~~~~~~~

The project got some important features implemented, and an RDO package
emerged during Ocata. Still, it desperately needs volunteers for coding and
testing. A `spreadsheet
<https://docs.google.com/spreadsheets/d/1petifqVxOT70H2Krz7igV2m9YqgXaAiCHR8CXgoi9a0/edit?usp=sharing>`_
captures the current (as of beginning of Pike) status of features.

**Actions**
     **dtantsur**, **davidlenwell**, **bradjones** and **crushil** agreed to
     dedicate some time to the UI.

Rescue
~~~~~~

Most of the patches are up, and the feature is tested with the CoreOS-based
ramdisk for now. Still, the ramdisk side poses a problem: while using DHCP is
easy, static network configuration seems not to be. It's especially problematic
in CoreOS. It might be much easier in the DIB-based ramdisk, but we don't
officially support it in the Ironic community.

RedFish driver
~~~~~~~~~~~~~~

We want to get a driver supporting RedFish soon. Some criticism was raised
around the currently proposed python-redfish library. As an alternative,
`a new library <https://github.com/openstack/sushy>`_ was written. It is
lightweight, covered by unit tests and only contains what Ironic needs.
We agreed to start our driver implementation with it, and switch to the
python-redfish library when/if it is ready to be consumed by us.

We postponed discussing advanced features like node composition until after
we get the basic driver in.

Small status updates
~~~~~~~~~~~~~~~~~~~~

* Of the API evolution initiative, only E-Tag work got some progress. The spec
  needs reviewing now.

* Node tags work needs review and is close to landing. We decided to discuss
  port tags as part of a separate RFE, if anybody is interested.

* IPA API versioning also needs reviews; there are several moderately
  contentious points about it. It was suggested that we only support one
  direction of IPA/ironic upgrades to simplify testing. We'll probably only
  support old IPA with new ironic, which is already tested by our grenade job.

CI and testing
--------------

Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-ci-testing

Missing CI coverage
~~~~~~~~~~~~~~~~~~~

UEFI
     Cirros finally released a stable version with UEFI support built in.
     A non-voting job is running with partition images, should be made voting
     soon. A test with whole disk images will be introduced as part of
     `standalone tests <https://review.openstack.org/#/c/423556/>`_.
Local bootloader
     Requires small enough instance images with Grub2 present (Cirros does not
     have it). We agreed to create a new repository with scripts to build
     suitable images. Potentially can be shared with other teams (e.g. Neutron).

     Actions: **lucasagomes** and/or **vsaienko** to look into it.
Adopt state
     Tests have been up for some time, but have ordering issues with nova-based
     tests. It was suggested that **TheJulia** move them to `standalone tests`_.
Root device hints
     Not covered by any CI. Will need modifying how we create virtual machines.
     The first step is to get size-based hints working. Check two cases: disk
     size strictly equal to the requested size, and greater than it.

     Actions: **dtantsur** to look into it.
Capabilities-based scheduling
     This may actually go to Nova gate, not ours. Still, it relies on some code
     in our driver, so we'd better cover it to ensure that the placement API
     changes don't break it.

     Actions: **vsaienko** to look into it.
Port groups
     The same image problem as with local boot - the same action item to create
     a repository with build scripts to build our images.
VLAN-aware instances
     The same image problem + requires `reworking our network simulation code
     <https://review.openstack.org/#/c/392959/>`_.
Conductor take over and hash ring
     Requires a separate multi-node job.

     Action: **vsaienko** to investigate.

DIB-based IPA image
^^^^^^^^^^^^^^^^^^^

Currently the ``ironic-agent`` element to build such an image is in the DIB
repository outside of our control. If we want to properly support it, we need
to gate on its changes, and to gate IPA changes on its job. Some time ago we
had a tentative agreement to move the element to our tree.

It was blocked by the fact that DIB rarely or never removes elements, and does
not have a way to properly de-duplicate elements with the same name.

An obvious solution we are going to propose is to move this element into the
IPA tree under a different name (``ironic-python-agent``?). The old element will
get deprecated and only critical fixes will be accepted for it.

Action
     **dtantsur** to (re)start this discussion with the TripleO and DIB teams.

API microversions testing
^^^^^^^^^^^^^^^^^^^^^^^^^

We are not sure we have tests covering all microversions. We seem to have API
tests using ``fake`` driver that cover at least some of them. We should start
paying more attention to this part of our testing.

Actions
     **dtantsur** to check if these tests are up-to-date and split them into a
     separate CI job.

     **pas-ha** to write API tests for internal API (i.e. lookup/heartbeat).

Global OpenStack goals
~~~~~~~~~~~~~~~~~~~~~~

Splitting away tempest plugins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

It did not end up as a goal for Pike, and there are still some concerns in the
community. Still, as we already apply ugly hacks in our jobs to use the
tempest plugin from master, we agreed to proceed with the split.

To simplify both maintenance and consuming our tests, we agreed to merge
ironic and ironic-inspector plugins. The introspection tests will run only if
ironic-inspector is present.

We propose having a merged core team (i.e. ironic-inspector-core which
already includes ironic-core) for this repository. We trust people who
only have core rights on ironic-inspector to not approve things they're
not authorized to approve.

Python 3 support
^^^^^^^^^^^^^^^^

We've been running Python 3 unit tests for quite some time. Additionally,
ironic-inspector runs a non-voting Python 3 functional test. Ironic has an
experimental job which fails, apparently, because of swift. We can start with
switching this job to the ``pxe_ipmitool`` driver (not requiring swift).
Inspector does not have a Python 3 integration test job proposed yet.

Actions
     **JayF** and **hurricanerix** will drive this work in both ironic and
     ironic-inspector.

     **lucasagomes** to check pyghmi and virtualbmc compatibility.

     **krtaylor** and/or **mjturek** to check MoltenIron.

We agreed that Bifrost is out of scope for this task. Its Python 3
compatibility mostly depends on that of Ansible anyway. Similarly, for the UI
we need horizon to be fully Python 3 compatible first.

Important decisions
     We recommend that vendors make their libraries compatible with Python 3.
     It may become a strict requirement in one of the coming releases.

API behind WSGI container
^^^^^^^^^^^^^^^^^^^^^^^^^

This seems quite straightforward. The work has started to switch ironic CI to
WSGI already. For ironic-inspector it's going to be done as part of the HA
work.

Operations
----------

Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-operations

OSC plugin and API versioning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently we default the OSC plugin (and old client too) to a really old API
version. We agreed that this situation is not desired, and that we should take
the same approach as Nova and default to the latest version. We are planning
to announce the change this cycle, both via the ML and via a warning issued
when no version is specified.

Next, in the Queens cycle, we will have to make the change, bearing in mind
that OSC does not support values like ``latest`` for API versions. So the plan
is as follows:

* make the default ``--os-baremetal-api-version=1`` in
  https://github.com/openstack/python-ironicclient/blob/f242c6af3b295051019aeabb4ec7cf82eb085874/ironicclient/osc/plugin.py#L67

* when instantiating the ironic client in the OSC plugin, replace ``1`` with
  ``latest``:
  https://github.com/openstack/python-ironicclient/blob/f242c6af3b295051019aeabb4ec7cf82eb085874/ironicclient/osc/plugin.py#L41

* when handling ``--os-baremetal-api-version=latest``, replace it with ``1``,
  so that it's later replaced with ``latest`` again:
  https://github.com/openstack/python-ironicclient/blob/f242c6af3b295051019aeabb4ec7cf82eb085874/ironicclient/osc/plugin.py#L85

As a side effect, that will make ``1`` equivalent to ``latest`` as well.
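
The substitution trick can be modelled in a few lines of plain Python. This is a
hypothetical sketch of the three steps above, not the code in
``ironicclient/osc/plugin.py``: the function and constant names are invented for
illustration.

```python
# Hypothetical model of the planned behaviour; these names do not exist in
# python-ironicclient, and the real change may look different.
DEFAULT_CLI_VERSION = "1"  # planned default of --os-baremetal-api-version


def normalize_cli_version(value=None):
    """Map the CLI option value to something OSC itself accepts.

    OSC does not support ``latest``, so it is turned into ``1``.
    """
    if value is None:
        return DEFAULT_CLI_VERSION
    return "1" if value == "latest" else value


def client_version(cli_value):
    """Version requested when instantiating the ironic client:
    a bare ``1`` is replaced with ``latest``."""
    return "latest" if cli_value == "1" else cli_value


for value in (None, "latest", "1", "1.31"):
    print(value, "->", client_version(normalize_cli_version(value)))
# None -> latest
# latest -> latest
# 1 -> latest
# 1.31 -> 1.31
```

An explicit microversion such as ``1.31`` passes through unchanged, while no
value, ``1`` and ``latest`` all end up requesting the latest version.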

It was also suggested to have a new command displaying both server-supported
and client-supported API versions.

Deprecating the standalone ironic CLI in favor of the OSC plugin
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We do not want to maintain two CLIs in the long run. We agreed to start
thinking about deprecating the old ``ironic`` command. Main concerns:

* lack of feature parity,

* ugly way to work without authentication, for example::

     openstack baremetal --os-url http://ironic --os-token fake <COMMAND>

Plan for Pike
     * Ensure complete feature parity between two clients.
     * Only use ``openstack baremetal`` commands in the documentation.

The actual deprecation is planned for Queens.

RAID configuration enhancements
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A few suggestions were made:

* Support an ordered list of logical disk definitions. The first possible
  configuration is applied to the node. For example:

  * Top of list - RAID 10, but we don't have enough drives
  * Fallback to next preference in list - RAID 1 on a pair of available drives
  * Finally, JBOD or RAID 0 on the only available drive

* Specify the number of instances for a logical disk definition to create.

* Specify backing physical disks by stating a preference for the smallest, e.g.
  smallest like-sized pair or two smallest disks.

* Specify location of physical disks, e.g. first two or last two as perceived
  by the hardware, front/rear/internal location.
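
The first suggestion amounts to "apply the first definition the hardware can
satisfy". The sketch below assumes that only the number of available physical
disks matters (real selection would also weigh sizes, controllers and location).
The minimum disk counts are conventional values chosen for illustration, and the
function is hypothetical; ``1+0`` is how ironic's RAID configuration spells
RAID 10.

```python
# Hypothetical sketch: choose the first logical disk definition from an
# ordered preference list that the available drives can satisfy.
# The minimum disk counts are conventional values assumed for illustration.
MIN_DISKS = {"1+0": 4, "6": 4, "5": 3, "1": 2, "0": 1, "JBOD": 1}


def pick_logical_disk(preferences, available_disks):
    """Return the first definition whose RAID level fits, else None."""
    for definition in preferences:
        if MIN_DISKS[definition["raid_level"]] <= available_disks:
            return definition
    return None


preferences = [
    {"raid_level": "1+0", "size_gb": "MAX"},  # top of the list
    {"raid_level": "1", "size_gb": "MAX"},    # fallback: a mirrored pair
    {"raid_level": "0", "size_gb": "MAX"},    # last resort: a single drive
]
print(pick_logical_disk(preferences, 2))  # {'raid_level': '1', 'size_gb': 'MAX'}
```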

Actions
     **rpioso** will write RFE(s)

Smaller topics
~~~~~~~~~~~~~~

Non-abortable clean steps stuck in ``clean wait`` state
     We discussed a potential ``force-abort`` functionality, but the only thing
     we agreed on is to check that all current clean steps are marked as
     ``abortable`` if they really are.

Status of long-running cleaning operations
     There is a request to be able to get status of e.g. disk shredding (which
     may take hours). We found out that the current IPA API design essentially
     prevents running several commands in parallel. We agreed that we need IPA
     API versioning first, and that this work is not a huge priority right now.

OSC command for listing driver and RAID properties
     We could not agree on the exact form of these two commands. The primary
     candidates discussed at the PTG were::

         openstack baremetal driver property list <DRIVER>
         openstack baremetal driver property show <DRIVER>

     We agreed to move this to the spec: https://review.openstack.org/439907.

Abandoning an active node
     I.e. the opposite of adopt. It's unclear how such an operation would play
     with nova; maybe it's only useful for a standalone case.

Future Work
-----------

Etherpad: https://etherpad.openstack.org/p/ironic-pike-ptg-future-work.

Neutron event processing
~~~~~~~~~~~~~~~~~~~~~~~~

RFE: https://bugs.launchpad.net/ironic/+bug/1304673, spec:
https://review.openstack.org/343684.

We need to wait for certain events from neutron (like port bindings).
Currently we just wait for some time and hope it went well. We agreed to follow
the same pattern that nova does for neutron to nova notifications.
The neutron part is
https://github.com/openstack/neutron/blob/master/neutron/notifiers/nova.py.
We agreed with the Neutron team that notifier and the other ironic-specific
stuff for neutron would live in a separate repo under Baremetal governance.
Draft code is https://review.openstack.org/#/c/357780.
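
The core of such a notifier is filtering Neutron events and forwarding only the
relevant ones. The sketch below covers the filtering half under assumptions:
bare metal ports are recognised by ``binding:vnic_type`` being ``baremetal``,
``port.update.end`` serves as an example Neutron event type, and the
``/v1/events`` path, the event name and the payload shape are hypothetical
placeholders, not an agreed Ironic API. Sending the request (an authenticated
HTTP POST) is omitted.

```python
import json

# Hypothetical endpoint path; the Ironic API for this did not exist yet.
IRONIC_EVENTS_PATH = "/v1/events"


def build_ironic_event(event_type, port):
    """Return (path, body) for events Ironic cares about, else None."""
    # Only bare metal ports are interesting to Ironic.
    if port.get("binding:vnic_type") != "baremetal":
        return None
    # Example filter: react only to completed port updates.
    if event_type != "port.update.end":
        return None
    body = {
        "event": "network.bind_port",  # hypothetical event name
        "port_id": port["id"],
        "mac_address": port.get("mac_address"),
        "status": port.get("status"),
    }
    return IRONIC_EVENTS_PATH, json.dumps(body, sort_keys=True)


port = {"id": "p1", "binding:vnic_type": "baremetal",
        "mac_address": "52:54:00:12:34:56", "status": "ACTIVE"}
print(build_ironic_event("port.update.end", port))
```

Keeping the filtering on the Neutron side means Ironic only receives the small
subset of events it actually waits for.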

Splitting node.properties[capabilities] into a separate table
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is something we've planned for a long time. Currently, it's not possible
to update capabilities atomically, and the format is quite hard to work with:
``k1:v1,k2:v2``. We discussed moving away from the word ``capability``. It's
already overused in the OpenStack world, and nova is switching to the notion
of "traits". It also looks like traits will be qualitative-only, while we have
proposals for quantitative capabilities (like ``gpu_count``).

It was proposed to model a typical CRUD API for traits in Ironic::

     GET /v1/nodes/<NODE>/traits
     POST  /v1/nodes/<NODE>/traits
     GET /v1/nodes/<NODE>/traits/<trait>
     DELETE /v1/nodes/<NODE>/traits/<trait>

In API versions before this addition, we would make
``properties/capabilities`` a transparent proxy to new tables.
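
The legacy ``k1:v1,k2:v2`` format and the proposed proxy onto a separate table
can be sketched as follows. This is a hypothetical in-memory model (a dict
stands in for the database table, and all names are invented); it only shows how
reads and writes of the old ``properties/capabilities`` string could be
translated into per-key rows.

```python
def parse_capabilities(value):
    """Parse the legacy ``k1:v1,k2:v2`` format into a dict."""
    result = {}
    for item in filter(None, value.split(",")):
        key, _, val = item.partition(":")
        result[key.strip()] = val.strip()
    return result


def format_capabilities(caps):
    """Render a dict back into the legacy string format."""
    return ",".join(f"{k}:{v}" for k, v in sorted(caps.items()))


class NodeCapabilities:
    """Hypothetical stand-in for a per-node capabilities table."""

    def __init__(self):
        self._rows = {}  # key -> value, one "row" per capability

    def set_one(self, key, value):
        # Per-key update, instead of rewriting the whole string.
        self._rows[key] = value

    def delete_one(self, key):
        self._rows.pop(key, None)

    # Transparent proxy used by older API versions.
    @property
    def legacy(self):
        return format_capabilities(self._rows)

    @legacy.setter
    def legacy(self, value):
        self._rows = parse_capabilities(value)


caps = NodeCapabilities()
caps.legacy = "boot_mode:uefi,boot_option:local"
caps.set_one("gpu_count", "2")
print(caps.legacy)  # boot_mode:uefi,boot_option:local,gpu_count:2
```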

It was noted that the database change can be done first, with API change
following it.

Actions
     **rloo** to propose two separate RFEs for database and API parts.

Avoid changing behavior based on properties[capabilities]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Currently our capabilities have a dual role. They serve both for scheduling
(to inform nova of what nodes can do) and for making decisions based on flavor
(e.g. request UEFI boot). It is complicated by the fact that sometimes the
same capability (e.g. UEFI) can be of both types depending on a driver.
This is quite confusing for users, and may be incompatible with future changes
both in ironic and nova.

For things like boot option and (potentially) BIOS settings, we need to be able
to get requests from flavors and/or nova boot arguments without abusing
capabilities for it. Maybe similar to how NUMA support does it:
https://docs.openstack.org/admin-guide/compute-cpu-topologies.html.

For example::

     flavor.extra_specs[traits:has_ssd]=True

(tells the scheduler to find a node with SSD disk; does not change
behavior/config of node)

::

     flavor.extra_specs[configuration:use_uefi]=True

(configures the node to boot UEFI; has no impact on scheduling)

::

     flavor.extra_specs[traits:has_uefi]=True
     flavor.extra_specs[configuration:use_uefi]=True

(tells the scheduler to find a node supporting UEFI; if this support is
dynamic, configures the node to enable UEFI boot).
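
The proposed separation boils down to partitioning flavor extra specs by
namespace. The ``traits:`` and ``configuration:`` prefixes come from the
examples above; the function itself is hypothetical and says nothing about how
Nova would actually consume these keys.

```python
def split_extra_specs(extra_specs):
    """Split flavor extra specs into scheduling traits and node config.

    Hypothetical helper for illustration only.
    """
    traits, configuration = {}, {}
    for key, value in extra_specs.items():
        prefix, sep, name = key.partition(":")
        if not sep:
            continue  # keys without a namespace are left alone
        if prefix == "traits":
            traits[name] = value  # affects scheduling only
        elif prefix == "configuration":
            configuration[name] = value  # affects node setup only
    return traits, configuration


specs = {"traits:has_uefi": "True", "configuration:use_uefi": "True"}
print(split_extra_specs(specs))
# ({'has_uefi': 'True'}, {'use_uefi': 'True'})
```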

Actions
     **jroll** to start conversation with nova folks about how/if to have a
     replacement for this elsewhere.

     Stop accepting driver features relying on ``properties[capabilities]`` (as
     opposed to ``instance_info[capabilities]``).

Potential actions
     * Rename ``instance_info[capabilities]`` to
       ``instance_info[configuration]`` for clarity.

Deploy-time RAID
~~~~~~~~~~~~~~~~

This was discussed at the last design summit. Since then we've got a `nova
spec <https://review.openstack.org/408151>`_, which, however, hasn't got many
reviews so far. The spec continues using ``block_device_mapping_v2``; other
options apparently were not considered.

We discussed how to inform Nova whether or not RAID can be built for
a particular node. Ideally, we need to tell the scheduler about many things:
RAID support, number of disks, disk sizes. We decided that it's overkill, at
least for the beginning. We'll only rely on a "supports RAID" trait for now.

It's still unclear what to do about ``local_gb`` property, but with planned
Nova changes it may not be required any more.

Advanced partitioning
~~~~~~~~~~~~~~~~~~~~~

There is a desire for flexible partitioning in ironic, both in case of
partition and whole disk images (in the latter case - partition other disks).
Generally, there was no consensus at the PTG. Some people were very much in
favor of this feature, others quite against it. It's unclear how to pass
partitioning information from Nova. There is a concern that such a feature will
get us too deep into OS-specific details. We agreed that someone interested
will collect the requirements and create a more detailed proposal, and we'll
discuss it at the next PTG.

Splitting nodes into separate pools
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This feature is about dedicating some nodes to a tenant, essentially adding a
tenant_id field to nodes. This can be helpful e.g. for a hardware provider to
reserve hardware for a tenant, so that it's always available.

This seems relatively easy to implement in Ironic. We need a new field on
nodes, then only show non-admin users their hardware. A bit trickier to make
it work with Nova. We agreed to investigate passing a token from Nova to
Ironic, as opposed to always using a service user admin token.
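
On the Ironic side this is essentially a visibility filter. A minimal sketch,
assuming a hypothetical ``tenant_id`` field on nodes (the field name comes from
the text; the data model and the admin check are illustrative only):

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    uuid: str
    # Hypothetical new field; None means the node is not dedicated to anyone.
    tenant_id: Optional[str] = None


def visible_nodes(nodes: List[Node], project_id: str, is_admin: bool) -> List[Node]:
    """Admins see every node; other users only see nodes dedicated to them."""
    if is_admin:
        return list(nodes)
    return [n for n in nodes if n.tenant_id == project_id]


nodes = [Node("a"), Node("b", "t1"), Node("c", "t2")]
print([n.uuid for n in visible_nodes(nodes, "t1", is_admin=False)])  # ['b']
```

This is why the token question matters: if Nova always talks to Ironic with a
service admin token, such a filter would see an admin and return every node.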

Actions
     **vdrok** to work out the details and propose a spec.

Requirements for routed networks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We discussed requirements for achieving routed architecture like
spine-and-leaf. It seems that most of the requirements are already in our
plans. The outstanding items are:

* Multiple subnets support for ironic-inspector. This can be solved at the
  ``dnsmasq.conf`` level; an appropriate change was merged into
  puppet-ironic.

* Per-node provision and cleaning networks. There is an RFE, somebody just
  has to do the work.

This does not seem to be a Pike goal for us, but many of the dependencies
are planned for Pike.

Configuring BIOS setting for nodes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Preparing a node to serve a certain role by tweaking its settings. Currently,
this is implemented by the Drac driver as a vendor pass-thru.

We agreed that such a feature would better fit cleaning, rather than
pre-deployment. Thus, it does not depend on deploy steps. It was suggested to
extend the management interface to support passing it an arbitrary JSON with
configuration. Then a clean step would pick it up (similar to RAID).

Actions
     **rpioso** to write a spec for this feature.

Deploy steps
~~~~~~~~~~~~

We discussed `the deploy steps proposal <https://review.openstack.org/412523>`_
in depth. We agreed on partially splitting the deployment procedure into
pluggable bits. We will leave the very core of the deployment - flashing the
image onto a target disk - hardcoded, at least for now. The drivers will be
able to define steps to run before and after this core deployment. Pre- and
post-deployment steps will have different priority ranges, something like::

     0 < pre-max/deploy-min < deploy-max/post-min < infinity

We plan on making partitioning a pre-deploy step, and installing a bootloader
a post-deploy step. We will not allow IPA hardware managers to define deploy
steps, at least for now.
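
The priority scheme can be sketched as follows. The boundary values and step
names are hypothetical placeholders (the text only fixes the ordering of the
ranges), and this sketch assumes steps run in ascending priority order so that
the ranges line up left to right with the formula; the direction was not
specified. The core image-writing step stays hardcoded between the pre- and
post-deploy ranges.

```python
# Hypothetical boundaries; only their ordering comes from the proposal:
# 0 < PRE_MAX (= deploy-min) < POST_MIN (= deploy-max) < infinity
PRE_MAX = 100
POST_MIN = 200


def phase(priority):
    """Classify a driver step priority into a deployment phase."""
    if 0 < priority < PRE_MAX:
        return "pre-deploy"
    if priority > POST_MIN:
        return "post-deploy"
    raise ValueError(f"priority {priority} is reserved or out of range")


def build_plan(steps):
    """Order driver steps (name -> priority) around the hardcoded core step.

    Ascending order within each phase is an assumption of this sketch.
    """
    pre = sorted((p, n) for n, p in steps.items() if phase(p) == "pre-deploy")
    post = sorted((p, n) for n, p in steps.items() if phase(p) == "post-deploy")
    return [n for _, n in pre] + ["write_image"] + [n for _, n in post]


print(build_plan({"install_bootloader": 250, "create_partitions": 50}))
# ['create_partitions', 'write_image', 'install_bootloader']
```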

Actions
     **yolanda** is planning to work on this feature, **rloo** and **TheJulia**
     to help.

Authenticating IPA
~~~~~~~~~~~~~~~~~~

IPA HTTP endpoints, and the endpoints Ironic provides for ramdisk callbacks
are completely insecure right now. We hesitated to add any authentication to
them, as any secrets published for the ramdisk to use (be it part of kernel
command line or image itself) are readily available to anyone on the network.

We agreed on several things to look into:

* A random CSRF-like token to use for each node. This will somewhat limit the
  attack surface by requiring an attacker to intercept a token for the
  specific node, rather than just access the endpoints.

* Document splitting out public and private Ironic API as part of our future
  reference architecture guide.

* Make sure we support TLS between Ironic and IPA, which is particularly
  helpful when virtual media is used (and secrets are not leaked).
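
The random per-node token idea could look roughly like this on the Ironic side.
It is a sketch only: the in-memory store, the function names and the token
length are assumptions. It relies on Python's standard ``secrets`` and ``hmac``
modules, and the token would be handed to the ramdisk once and then required on
every callback such as a heartbeat.

```python
import hmac
import secrets

# Hypothetical in-memory store: node UUID -> token issued for this deployment.
_tokens = {}


def issue_token(node_uuid):
    """Generate a fresh random token for a node (e.g. when IPA boots)."""
    token = secrets.token_urlsafe(32)
    _tokens[node_uuid] = token
    return token


def verify_callback(node_uuid, presented):
    """Accept a ramdisk callback only if it carries this node's token."""
    expected = _tokens.get(node_uuid)
    if expected is None or presented is None:
        return False
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected.encode(), presented.encode())


def revoke_token(node_uuid):
    """Forget the token, e.g. once deployment or cleaning finishes."""
    _tokens.pop(node_uuid, None)
```

A token intercepted for one node is useless against any other node, which is
the attack-surface reduction described above; it does not help against an
attacker who can sniff that node's own traffic, hence the TLS item.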

Actions
     **jroll** and **joanna** to look into the random token idea.

     **jroll** to write an RFE for TLS between IPA and Ironic.

Smaller things
~~~~~~~~~~~~~~

Using ansible-networking as an ML2 driver for ironic-neutron integration work
     It was suggested to make it one of the backends for
     ``networking-generic-switch`` in addition to ``netmiko``. Potential
     concurrency issues when using SSH were raised, and still require a solution.

Extending and standardizing the list of capabilities the drivers may discover
     It was proposed to use `os-traits <https://github.com/jaypipes/os-traits>`_
     for standardizing qualitative capabilities. **jroll** will look into
     quantitative capabilities.

Pluggable interface for long-running processes
     This was proposed as an optional way to mitigate certain problems with
     local long-running services, like console. E.g. if a conductor crashes,
     its console services keep running. It was noted that this is a bug to be
     fixed (**TheJulia** volunteered to triage it).
     The proposed solution involved optionally running processes on a remote
     cluster, e.g. k8s. Concerns were voiced at the PTG around complicating the
     support matrix and adding more decisions for operators to make.
     Due to that, there was no apparent consensus on implementing this feature.

Setting specific boot device for PXE booting
     It was found to be already solved by setting ``pxe_enabled`` on ports.
     We just need to update ironic-inspector to set this flag.

Priorities and planning
-----------------------

The suggested priorities list is now finalized in
https://review.openstack.org/439710.

We also agreed on the following priorities for ironic-inspector subteam:

* Inspector HA (**milan**)
* Community goal - python 3.5 (**JayF**, **hurricanerix**)
* Community goal - devstack+apache+wsgi (**aarefiev**, **ovoshchana**)
* Inspector needs to update ``pxe_enabled`` flag on ports (**dtantsur**)
