Open Stack

Tue May 7 17:47:57 UTC 2019

Hi folks,

I've published my personal notes from the PTG & Forum in Denver: 
https://dtantsur.github.io/posts/ironic-denver-2019/
They're probably opinionated and definitely not complete, but I still think they 
could be useful.

Also pasting the whole raw RST text below for ease of commenting.

Cheers,
Dmitry

Keynotes
========

The `Metal3`_ project got some spotlight during the keynotes. A (successful!)
`live demo`_ was done that demonstrated using Ironic through Kubernetes API to
drive provisioning of bare metal nodes.

The official `bare metal program`_ was announced to promote managing bare metal
infrastructure via OpenStack.

Forum: standalone Ironic
========================

On Monday we had two sessions dedicated to the future development of standalone
Ironic (without Nova or any other OpenStack services).

During the `standalone roadmap session`_ the audience identified two potential
domains where we could provide simple alternatives to depending on OpenStack
services:

* Alternative authentication. It was mentioned, however, that Keystone is a
   relatively easy service to install and operate, so adding this to Ironic
   may not be worth the effort.

* Multi-tenant networking without Neutron. We could use networking-ansible_
   directly, since they are planning on providing a Python API independent of
   their ML2 implementation.

Next, firmware update support was a recurring topic (also in hallway
conversations and also in non-standalone context). Related to that, a driver
feature matrix documentation was requested, so that such driver-specific
features are easier to discover.

Then we had a separate `API multi-tenancy session`_. Three topics were covered:

* Wiring in the existing ``owner`` field for access control.

   The idea is to allow non-administrator users to operate only on nodes whose
   ``owner`` equals their project (aka tenant) ID. In the non-keystone
   context this field would stay free-form. We did not agree whether we need an
   option to enable this feature.

   An interesting use case was mentioned: assign a non-admin user to Nova to
   allocate it only a part of the bare metal pool instead of all nodes.

   We did not reach a consensus on using a URI-like scheme in the ``owner``
   field, e.g. where ``keystone://{project ID}`` represents a Keystone project ID.

* Adding a new field (e.g. ``deployed_by``) to track the user who requested a
   deployment, for auditing purposes.

   We agreed that the ``owner`` field should not be used for this purpose, and
   overall it should never be changed automatically by Ironic.

* Adding some notion of *node leased to*, probably via a new field.

   This proposal was not well defined during the session, but we probably would
   allow some subset of API to lessees using the policy mechanism. It became
   apparent that implementing a separate *deployment API endpoint* is required
   to make such policy possible.

Creating the deployment API was identified as a potential immediate action
item. Wiring the ``owner`` field can also be done in the Train cycle, if we
find volunteers to push it forward.
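
To make the ``owner`` scheme idea above more concrete, here is a minimal sketch
of how such a value could be split into a scheme and an identifier. The
``parse_owner`` helper is hypothetical and only illustrates the
``keystone://{project ID}`` example; no format was agreed on during the session.

.. code-block:: python

  def parse_owner(owner):
      """Split an owner value into (scheme, value).

      Free-form values (the non-Keystone case) get a scheme of None.
      """
      scheme, separator, value = owner.partition("://")
      if separator and scheme and value:
          return scheme, value
      return None, owner

With this sketch, ``parse_owner("keystone://abc123")`` yields the scheme and the
project ID separately, while a free-form string is returned unchanged.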

PTG: scientific SIG
===================

The PTG started for me with the `Scientific SIG discussions`_ of desired
features and fixes in Ironic.

The hottest topic was reducing the deployment time by reducing the number of
reboots that are done during the provisioning process. `Ramdisk deploy`_
was identified as a very promising feature to solve this, as well as enable
booting from remote volumes not supported directly by Ironic and/or Cinder.
A few SIG members committed to testing it as soon as possible.

Two related ideas were proposed for later brainstorming:

* Keeping some proportion of nodes always on and with IPA booted. This
   builds directly on the `fast-track deploy`_ work completed in the Stein
   cycle. A third party orchestrator would be needed for keeping the percentage,
   but Ironic will have to provide an API to boot an ``available`` node into the
   ramdisk.

* Allow using *kexec* to instantly switch into a freshly deployed operating
   system.

Combined together, these features can allow zero-reboot deployments.

PTG: Ironic
===========

Community sustainability
------------------------

We seem to have an imbalance in reviews: very few people handle the majority of
them, and some of those people are close to burning out.

* The first thing we discussed is simplifying the specs process. We considered a
   single +2 approval for specs and/or documentation. Approving documentation
   cannot break anyone, and follow-ups are easy, so it seems a good idea. We did
   not reach a firm agreement on a single +2 approval for specs; I personally
   feel that it would only move the bottleneck from specs to the code.

* Facilitating deprecated feature removals can help clean up the code, and it
   can often be done by new contributors. We would like to maintain a list of
   what can be removed when, so that we don't forget it.

* We would also like to switch to single +2 for stable backports. This needs
   changing the stable policy, and Tony volunteered to propose it.

We felt that we were adding cores at a good pace, and Julia had been mentoring
people who wanted it. We would like people to volunteer, so that we can mentor
them into core status.

However, we were not so sure we wanted to increase the stable core team. This
team is supposed to be a small number of people that know quite a few small
details of the stable policy (e.g. requirements changes). We thought it would be
better to switch to single +2 approval for the existing team.

Then we discussed moving away from WSME, which is barely maintained by a team
of not really interested individuals. The proposal was to follow the example of
Keystone and just move to Flask. We can use ironic-inspector as an example, and
probably migrate part by part. JSON schema could replace WSME objects,
similarly to how Nova does it. I volunteered to come up with a plan to switch,
and some folks from Intel expressed interest in participating.

Standalone roadmap
------------------

We started with a recap of items from `Forum: standalone Ironic`_.

While discussing creating a driver matrix, we realized that we could keep
driver capabilities in the source code (similar to existing iSCSI boot) and
generate the documentation from it. Then we could go as far as exposing this
information in the API.

During the multi-tenancy discussion, the idea of owner and lessee fields was
well received. Julia volunteered to write a specification for that. We
clarified the following access control policies implemented by default:

* A user can list or show nodes if they are an administrator, an owner of a
   node or a lessee of this node.
* A user can deploy or undeploy a node (through the future deployment API) if
   they are an administrator, an owner of this node or a lessee of this node.
* A user can update a node or any of its resources if they are an administrator
   or an owner of this node. A lessee of a node can **not** update it.
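
As an illustration only, the three default rules above could be written as a
small predicate. The ``User`` and ``Node`` classes, the action names and the
comparison of ``owner`` and ``lessee`` against a single project ID are
assumptions made for this sketch; this is not Ironic's policy code.

.. code-block:: python

  from dataclasses import dataclass
  from typing import Optional


  @dataclass
  class User:
      project_id: str
      is_admin: bool = False


  @dataclass
  class Node:
      owner: Optional[str] = None
      lessee: Optional[str] = None


  # Hypothetical action groups, one per rule in the list above.
  READ_ACTIONS = {"list", "show"}
  DEPLOY_ACTIONS = {"deploy", "undeploy"}
  UPDATE_ACTIONS = {"update"}


  def is_allowed(user, node, action):
      """Return True if the user may perform the action on the node."""
      if user.is_admin:
          return True
      is_owner = node.owner is not None and node.owner == user.project_id
      is_lessee = node.lessee is not None and node.lessee == user.project_id
      if action in READ_ACTIONS or action in DEPLOY_ACTIONS:
          return is_owner or is_lessee
      if action in UPDATE_ACTIONS:
          # A lessee can deploy a node, but can not update it.
          return is_owner
      return False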

The discussion of recording the user that did a deployment turned into
discussing introducing a searchable log of changes to node power and provision
states. We did not reach a final consensus on it, and we probably need a
volunteer to push this effort forward.

Deploy steps continued
----------------------

This session was dedicated to making the deploy templates framework more usable
in practice.

* We need to implement support for in-band deploy steps (other than the
   built-in ``deploy.deploy`` step). We probably need to start IPA before
   proceeding with the steps, similarly to how it is done with cleaning.

* We agreed to proceed with splitting the built-in core step, making it a
   regular deploy step, as well as removing the compatibility shim for drivers
   that do not support deploy steps. We will probably separate writing an image
   to disk, writing a configdrive and creating a bootloader.

   The latter could be overridden to provide custom kernel parameters.

* To handle potential differences between deploy steps in different hardware
   types, we discussed the possibility of optionally including a hardware type
   or interface name in a clean step. Such steps will only be run for nodes with
   matching hardware type or interface.
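
A minimal sketch of the matching idea, assuming a hypothetical optional
``hardware_type`` key in each step dictionary. The key name and the step names
below are made up for illustration; the actual format was left to the spec.

.. code-block:: python

  def steps_for_node(steps, node_hardware_type):
      """Keep unrestricted steps and steps matching the node's hardware type."""
      return [
          step for step in steps
          if step.get("hardware_type") in (None, node_hardware_type)
      ]


  # Illustrative template: one generic step and two vendor-specific ones.
  steps = [
      {"interface": "deploy", "step": "write_image"},
      {"interface": "bios", "step": "apply_configuration",
       "hardware_type": "idrac"},
      {"interface": "bios", "step": "apply_configuration",
       "hardware_type": "ilo"},
  ]

  idrac_steps = steps_for_node(steps, "idrac")

Here ``idrac_steps`` keeps the generic step and the ``idrac`` one, and drops the
step restricted to ``ilo``.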

Mark and Ruby volunteered to write a new spec on these topics.

Day 2 operational workflow
--------------------------

For deployments with external health monitoring, we need a way to represent
the state when a deployed node looks healthy from our side but is detected
as failed by the monitoring.

It seems that we could introduce a new state transition from ``active`` to
something like ``failed`` or ``quarantined``, where a node is still deployed,
but explicitly marked as at fault by an operator. On unprovisioning, this node
would not become ``available`` automatically. We also considered the
possibility of using a flag instead of a new state, although the operators in
the room were more in favor of using a state. We largely agreed that the
already overloaded ``maintenance`` flag should not be used for this.

On the Nova side we would probably use the ``error`` state to reflect nodes in
the new state.

A very similar request had been made for node retirement support. We decided to
look for a unified solution.

DHCP-less deploy
----------------

We discussed options to avoid relying on DHCP for deploying.

* An existing specification proposes attaching IP information to virtual media.
   The initial contributors had become inactive, so we decided to help this work
   to go through. Volunteers are welcome.

* As an alternative to that, we discussed using IPv6 SLAAC with multicast DNS
   (routed across WAN for Edge cases). A couple of folks in the room volunteered
   to help with testing. We need to fix python-zeroconf_ to support IPv6, which
   is something I'm planning on.

Nova room
---------

In a cross-project discussion with the Nova team we went through a few topics:

* Whether Nova should use the new Ironic API to build config drives. Since Ironic
   is not the only driver building config drives, we agreed that it probably
   doesn't make much sense to change that.

* We did not come to a conclusion on deprecating capabilities. We agreed that
   Ironic has to provide alternatives for ``boot_option`` and ``boot_mode``
   capabilities first. These will probably become deploy steps or built-in
   traits.

* We agreed that we should switch Nova to using *openstacksdk* instead of
   *ironicclient* to access Ironic. This work had already been in progress.

Faster deploy
-------------

We followed up on `PTG: scientific SIG`_ with potential action items on
speeding up the deployment process by reducing the number of reboots. We
discussed an ability to keep all or some nodes powered on and heartbeating in
the ``available`` state:

* Add an option to keep the ramdisk running after cleaning.

   * For this to work with multi-tenant networking we'll need an IPA command to
     reset networking.

* Add a provisioning verb going from ``available`` to ``available`` booting the
   node into IPA.

* Make sure that pre-booted nodes are prioritized for scheduling. We will
   probably dynamically add a special trait. Then we'll have to update both
   Nova/Placement and the allocation API to support preferred (optional) traits.
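
The notion of preferred (optional) traits can be sketched as a ranking step:
required traits filter the candidates, preferred traits only reorder them. The
``rank_candidates`` function, the dictionary layout and the
``CUSTOM_PREBOOTED`` trait name are assumptions for this example, not an
existing Nova, Placement or Ironic API.

.. code-block:: python

  def rank_candidates(nodes, required, preferred):
      """Drop nodes missing required traits, then put preferred matches first."""
      required, preferred = set(required), set(preferred)
      eligible = [n for n in nodes if required <= set(n["traits"])]
      # sorted() is stable, so nodes with equal scores keep their input order.
      return sorted(
          eligible,
          key=lambda n: -len(preferred & set(n["traits"])),
      )


  nodes = [
      {"name": "node-a", "traits": ["CUSTOM_GOLD"]},
      {"name": "node-b", "traits": ["CUSTOM_GOLD", "CUSTOM_PREBOOTED"]},
      {"name": "node-c", "traits": []},
  ]
  ranked = rank_candidates(nodes, ["CUSTOM_GOLD"], ["CUSTOM_PREBOOTED"])

In this example the pre-booted ``node-b`` is offered first, ``node-a`` remains
a valid fallback, and ``node-c`` is excluded because it lacks a required trait.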

We also agreed that we could provide an option to *kexec* instead of rebooting
as an advanced deploy step for operators that really know their hardware.
Multi-tenant networking can be tricky in this case, since there is no safe
point to switch from deployment to tenant network. We will probably take a best
effort approach: command IPA to shut down all its functionality and schedule a
*kexec* after some time. After that, switch to tenant networks. This is not
entirely secure, but will probably fit the operators (HPC) who request it.

Asynchronous clean steps
------------------------

We discussed enhancements for asynchronous clean and deploy steps. Currently
running a step asynchronously requires either polling in a loop (occupying
a green thread) or creating a new periodic task in a hardware type. We came up
with two proposed updates for clean steps:

* Allow a clean step to request re-running itself after a certain amount of
   time. E.g. a clean step would do something like

   .. code-block:: python

     @clean_step(...)
     def wait_for_raid(self):
         if not raid_is_ready():
             return RerunAfter(60)

   and the conductor would schedule re-running the same step in 60 seconds.

* Allow a clean step to spawn more clean steps. E.g. a clean step would
   do something like

   .. code-block:: python

     @clean_step(...)
     def create_raid_configuration(self):
         start_create_raid()
         return RunNext([{'step': 'wait_for_raid'}])

   and the conductor would insert the provided step into ``node.clean_steps``
   after the current one and start running it.

   This would allow for several follow-up steps as well. A use case is a clean
   step for resetting iDRAC to a clean state that in turn consists of several
   other clean steps. The idea of sub-steps was deemed too complicated.
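
The two proposals can be tried out together in a toy runner. This is only a
sketch under stated assumptions: ``RerunAfter`` and ``RunNext`` are the names
proposed above and do not exist in Ironic, while ``run_clean_steps`` and
``FakeRaidDriver`` are made up for the illustration. A real conductor would
schedule a timer rather than block in ``sleep``.

.. code-block:: python

  import time


  class RerunAfter:
      """Result asking the runner to repeat the current step after a delay."""

      def __init__(self, seconds):
          self.seconds = seconds


  class RunNext:
      """Result asking the runner to insert steps right after the current one."""

      def __init__(self, steps):
          self.steps = steps


  def run_clean_steps(driver, steps, sleep=time.sleep):
      """Run steps in order, honouring RerunAfter and RunNext results."""
      steps = list(steps)
      executed = []
      index = 0
      while index < len(steps):
          name = steps[index]["step"]
          result = getattr(driver, name)()
          executed.append(name)
          if isinstance(result, RerunAfter):
              # Blocking here keeps the sketch short; it is not the proposal.
              sleep(result.seconds)
              continue  # same index, so the same step runs again
          if isinstance(result, RunNext):
              # Insert the follow-up steps right after the current one.
              steps[index + 1:index + 1] = result.steps
          index += 1
      return executed


  class FakeRaidDriver:
      """Pretends that the RAID becomes ready after two polls."""

      def __init__(self):
          self.polls_left = 2

      def create_raid_configuration(self):
          return RunNext([{"step": "wait_for_raid"}])

      def wait_for_raid(self):
          if self.polls_left > 0:
              self.polls_left -= 1
              return RerunAfter(60)

Calling ``run_clean_steps(FakeRaidDriver(), [{"step":
"create_raid_configuration"}], sleep=lambda seconds: None)`` runs the creation
step once and then repeats ``wait_for_raid`` until it stops asking for a re-run.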

PTG: TripleO
============

We discussed our plans for removing Nova from the TripleO undercloud and
moving bare metal provisioning from under control of Heat. The plan from the
`nova-less-deploy specification`_, as well as the current state
of the implementation, were presented.

The current concerns are:

* upgrades from a Nova based deployment (probably just wipe the Nova
   database),
* losing user experience of ``nova list`` (largely compensated by
   ``metalsmith list``),
* tracking IP addresses for networks other than *ctlplane* (solved the same
   way as for deployed servers).

The next action item is to create a CI job based on the already merged code and
verify a few assumptions made above.

PTG: Ironic, Placement, Blazar
==============================

We reiterated our plans to allow Ironic to optionally report nodes to
Placement. This will be turned off when Nova is present to avoid conflicts with
the Nova reporting. We will optionally use Placement as a backend for Ironic
allocation API (which is something that had been planned before).

Then we discussed potentially exposing detailed bare metal inventory to
Placement. To avoid partial allocations, Placement could introduce a new API to
consume the whole resource provider. Ironic would use it when creating an
allocation. No specific commitments were made with regards to this idea.

Finally, we came up with the following workflow for bare metal reservations in
Blazar:

#. A user requests a bare metal reservation from Blazar.
#. Blazar fetches allocation candidates from Placement.
#. Blazar fetches a list of bare metal nodes from Ironic and filters out
    allocation candidates whose resource provider UUID does not match one of
    the node UUIDs.
#. Blazar remembers the node UUID and returns the reservation UUID to the user.
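
The filtering in the third step can be sketched as follows. The data shapes are
assumptions: plain dictionaries with ``resource_provider_uuid`` and ``uuid``
keys stand in for the real Placement and Ironic responses, and
``bare_metal_candidates`` is a hypothetical helper, not Blazar code.

.. code-block:: python

  def bare_metal_candidates(allocation_candidates, ironic_nodes):
      """Keep candidates whose resource provider is a known Ironic node."""
      node_uuids = {node["uuid"] for node in ironic_nodes}
      return [
          candidate for candidate in allocation_candidates
          if candidate["resource_provider_uuid"] in node_uuids
      ]


  candidates = [
      {"resource_provider_uuid": "uuid-1"},
      {"resource_provider_uuid": "uuid-2"},
  ]
  ironic_nodes = [{"uuid": "uuid-2"}, {"uuid": "uuid-3"}]
  matching = bare_metal_candidates(candidates, ironic_nodes)

Only the candidate backed by an Ironic node (``uuid-2`` here) survives, and its
resource provider UUID is the node UUID that Blazar would remember.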

When the reservation time comes:

#. Blazar creates an allocation in Ironic (not Placement) with the candidate
    node matching the previously picked node and the allocation UUID matching
    the reservation UUID.
#. When the enhancements in `Standalone roadmap`_ are implemented, Blazar will
    also set the node's lessee field to the user ID of the reservation, so that
    Ironic allows access to this node.
#. A user fetches an Ironic allocation corresponding to the Blazar reservation
    UUID and learns the node UUID from it.
#. A user proceeds with deploying the node.

Side and hallway discussions
============================

* We discussed having Heat resources for Ironic. We recommended that the team
   start with Allocation and Deployment resources (the latter being virtual
   until we implement the planned deployment API).

* We prototyped how Heat resources for Ironic could look, including Node, Port,
   Allocation and Deployment as a first step.

.. _Metal3: http://metal3.io
.. _live demo: https://www.openstack.org/videos/summits/denver-2019/openstack-ironic-and-bare-metal-infrastructure-all-abstractions-start-somewhere
.. _bare metal program: https://www.openstack.org/bare-metal/
.. _standalone roadmap session: https://etherpad.openstack.org/p/DEN-train-next-steps-for-standalone-ironic
.. _networking-ansible: https://opendev.org/x/networking-ansible
.. _API multi-tenancy session: https://etherpad.openstack.org/p/DEN-train-ironic-multi-tenancy
.. _Scientific SIG discussions: https://etherpad.openstack.org/p/scientific-sig-ptg-train
.. _Ramdisk deploy: https://docs.openstack.org/ironic/latest/admin/interfaces/deploy.html#ramdisk-deploy
.. _fast-track deploy: https://storyboard.openstack.org/#!/story/2004965
.. _python-zeroconf: https://github.com/jstasiak/python-zeroconf
.. _nova-less-deploy specification: http://specs.openstack.org/openstack/tripleo-specs/specs/stein/nova-less-deploy.html

[ironic] My PTG & Forum notes
