[openstack-dev] [ironic] Summary of ironic sessions from Sydney

Julia Kreger juliaashleykreger at gmail.com
Tue Nov 14 16:18:02 UTC 2017


Greetings ironic folk!

Like many other teams, we had very few ironic contributors make it to
Sydney. As such, I wanted to go ahead and write up a summary that
covers takeaways, questions, and obvious action items for the
community that were raised by operators and users present during the
sessions, so that we can use this as feedback to help guide our next
steps and feature planning.

Much of this is from my memory combined with notes on the various
etherpads. I would like to explicitly thank NobodyCam for reading
through this in advance to see if I was missing anything at a high
level since he was present in the vast majority of these sessions, and
dtantsur for sanity checking the content and asking for some
elaboration in some cases.

-Julia



Ironic Project Update
=====================

Questions largely arose around use of boot from volume, including some
scenarios we had anticipated, as well as new scenarios that we had not
considered.

Boot nodes booting from the same volume
---------------------------------------

From a technical standpoint, when BFV is used with iPXE chain loading,
the chain loader reads the boot loader and related data from the
cinder volume (or, realistically, any iSCSI volume). This means that a
skilled operator is able to craft a specific volume that may just turn
around and unpack a ramdisk and operate the machine solely from RAM,
or that utilizes an NFS root.

This sort of configuration is not something an average user would
employ, but there are real use cases where it would provide value to
some large scale deployment operators.

Additionally, this topic and the desire for this capability also came
up during the “Building a bare metal cloud is hard” talk Q&A.

Action Item: Check the data model to see if we prohibit using the
same volume across nodes and, if so, consider removing that
prohibition.

Cinder-less BFV support
-----------------------

Some operators are curious about booting ironic-managed nodes without
cinder in a BFV context. This is something we anticipated, and we
built the API and CLI interfaces to support it. Realistically, we just
need to offer the ability for the data to be read and utilized.

Action Item: Review code and ensure that we have some sort of no-op
driver or method that allows cinder-less node booting. For existing
drivers, this would mean shipping the information to the BMC or
writing out iPXE templates as necessary.
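
As a purely illustrative sketch, such a no-op driver could be little
more than a storage interface that trusts externally managed volume
target records. The class name and error wording below are invented;
only the StorageInterface hooks follow the existing BFV plumbing.

    from ironic.common import exception
    from ironic.drivers import base


    class ExternalStorage(base.StorageInterface):
        """Sketch: volumes are created and attached outside of ironic."""

        def get_properties(self):
            return {}

        def validate(self, task):
            # The boot interface still needs a volume target record in
            # order to render iPXE templates or program the BMC.
            if not task.volume_targets:
                raise exception.MissingParameterValue(
                    'Cinder-less boot from volume requires a volume '
                    'target.')

        def attach_volumes(self, task):
            pass  # nothing to attach; the volume already exists

        def detach_volumes(self, task):
            pass  # teardown is likewise handled outside of ironic

        def should_write_image(self, task):
            return False  # booting from volume; no image is written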

Boot IPA from a cinder volume
-----------------------------

With larger IPA images, specifically in cases where the image contains
a substantial amount of utilities or tooling to perform cleaning,
providing a mechanism to point the deployment ramdisk at a cinder
volume would allow more efficient IO access.

Action Item: Discuss further - specifically how we could support
this, as we would need to better understand how some of the operators
might use such functionality.

Dedicated Storage Fabric support
--------------------------------

A question of dedicated storage fabric/networking support arose.
Users of Fibre Channel generally have a dedicated storage fabric by
the very nature of its separate infrastructure. However, with ethernet
networking where iSCSI software initiators are used, or even possibly
converged network adapters, things get a little more complex.

Presently, with the iPXE boot from volume support, we boot using the
same interface details as the neutron VIF that the node is attached
with.

Moving forward, with BFV, the concept was to support the use of
explicitly defined interfaces as storage interfaces, which can be
denoted as "volume connectors" in ironic with a type of "mac". In
theory, we begin to get functionality along these lines once
https://review.openstack.org/#/c/468353/ lands, as the user could
define two networks, and the storage network should then fall to the
explicit volume connector interface(s). The operator would just need
to ensure that the settings being used on that storage network are
such that the node can boot and reach the iSCSI endpoint, and that a
default route is not provided.
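
For context, the data model can already record a dedicated storage
NIC: a volume connector with a type of "mac". A rough sketch against
the REST API follows; the node UUID, MAC, and endpoint URL are
invented, and authentication is omitted.

    import requests

    # Volume connectors arrived with the BFV work (microversion 1.32).
    payload = {
        'node_uuid': '1be26c0b-03f2-4d2e-ae87-c02d7f33c123',
        'type': 'mac',
        'connector_id': '52:54:00:12:34:56',  # MAC of the storage NIC
    }
    requests.post(
        'http://ironic.example.com:6385/v1/volume/connectors',
        json=payload,
        headers={'X-OpenStack-Ironic-API-Version': '1.32'})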

The question then may be: does ironic do this quietly for the user
requesting the instance or not, and how do we document the usage so
that operators can conceptualize it? How do we make this work at a
larger scale? How could this fit or not fit into multi-site
deployments?

In order to determine if there is more to do, we need to have more
discussions with operators.

Action items:

* Determine overall needs for operators, since this is implementation
architecture centric.
* Plan a forward path from there, if it makes sense.

Note: This may require more information to be stored or leveraged in
terms of structural or location based data.

Migration questions from classic drivers to Hardware types
----------------------------------------------------------

One explicit question from the operator community was whether we
intended to perform a migration from classic drivers to hardware
types. In a sense, there are two issues here: the first is the
perception of the work involved, and the second is whether there is a
good way to cleanly identify and transform classic drivers during
upgrade.

Action item:

* For whatever reason, the ironic community felt it was unnecessary
to facilitate a migration for users from classic drivers to hardware
types, even though we have direct analogs. The ironic community should
re-evaluate and consider implementing migration logic to ease user
migration (a sketch of such a mapping follows this list).
* In order to proceed, ironic does need to understand whether
operators would be okay with the upgrade process failing if the
pre-upgrade checks detected that the configuration was incompatible
with a successful migration. This would allow an operator to correct
their configuration file and re-execute the upgrade attempt.
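
To make the first action item concrete, the migration logic could be
little more than a lookup table plus a hard failure for anything it
cannot map, so that the pre-upgrade check fails early. This is a
sketch only; the table shows a few common in-tree analogs and is not
exhaustive.

    # Illustrative mapping of classic drivers to their hardware type
    # analogs; not a complete list.
    CLASSIC_TO_HARDWARE_TYPE = {
        'pxe_ipmitool': 'ipmi',
        'agent_ipmitool': 'ipmi',
        'pxe_ilo': 'ilo',
        'agent_ilo': 'ilo',
        'pxe_drac': 'idrac',
        'pxe_irmc': 'irmc',
    }


    def migrate_driver(node):
        """Return the hardware type analog, or fail the upgrade check."""
        try:
            return CLASSIC_TO_HARDWARE_TYPE[node.driver]
        except KeyError:
            # Fail before the upgrade proceeds, so the operator can fix
            # their configuration file and re-run the attempt.
            raise RuntimeError('No hardware type analog for driver %s'
                               % node.driver)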


Ironic User Feedback Session
============================

https://etherpad.openstack.org/p/SYD-forum-ironic-feedback

The feedback session felt particularly productive because developers
were far outnumbered by operators.

Current Troubles/Not Working for Operators
------------------------------------------

* The current RAID deployment process applies RAID configuration
during the cleaning step, prior to deploy.
** One of the proposed solutions was the marriage of traits, deploy
templates, and the application of deployment templates upon
deployment.
** The concern is that this will lead to an explosion of flavors, and
some operators' environments are already extremely flavor-full. “I
presently run `nova flavor-list`, and go get a coffee.”
** The mitigating factor will be the ability for the user initiating
the deployment to propose additional traits on the command line at
boot time. This was mentioned by Sam Betts, and one of the nova cores
present indicated that it was part of their plan.

* UEFI iPXE boot - Specifically, some operators are encountering
issues with some vendors' hardware that “should” be compatible, but
does not actually work except in specific scenarios.
** This is not an ironic bug.
** In the specific case that an operator reported, they were forced
into use of a vendor driver and specific settings, which seemed like
something they would have preferred to avoid.
** The community members, along with the users and operators present,
agreed that a good solution would be to propose documentation updates
to our repository that detail when drivers _do not_ work, or when
there are weird compatibility issues that are not quite visible.
** It may be worth considering some sort of matrix to raise visibility
of driver compatibility/interoperability moving forward. The ironic
team would not push back if an operator wishes to begin updating our
Admin documentation with such information.


Action Items:
* The community should encourage operators to submit documentation
changes when they become aware of such issues.
* The community should also encourage vendor driver maintainers to
explicitly document their known-good/tested scenarios, as some
hardware within the same family can vary.


What Operators are indicating that they need
--------------------------------------------

Firmware Updates
~~~~~~~~~~~~~~~~

Our present firmware update model is dependent upon a hardware manager
driving the process during cleaning, which presently requires the
hardware manager to be built inside the ramdisk image. This is
problematic as it requires operators to craft and build hardware
managers that fit their needs, and then ensure those are running on
the specific hosts to upgrade their firmware.

While this may seem easy and reasonable for a small deployment, there
is an operations disconnect in many organizations between who blesses
new firmware versions, and who controls the hardware. In some cases, a
team may be in charge of certifying and testing new firmware, while
another team entirely operates the cloud. These process and
operational constraints also prevent hardware managers from being
shared in the open, because they could potentially reveal the security
state of a deployment. Simply put, operators need something easier,
especially when they may receive twenty different chassis in a single
year.

While we discussed this as a group, we did seem to begin to reach an
understanding of what would be useful.

Several operators made it clear that they feel that Ironic is in a
position to help drive standardization across vendors.

What operators are looking for:

* A framework or scaffolding to facilitate centrally managed firmware
updates where the current state information is published upward, and
the system replies with the firmware to be applied.
** Depending on the deployment, an operator may choose to assert
firmware upon every cleaning, but they need to be able to identify the
hardware, current firmware, and necessary versions by some sort of
policy.
** Any version policy may vary across the infrastructure, based on
either resource class or hardware ownership concepts.
** This may, in itself, just be a hardware manager that calls out to
an external service and executes based upon what the service tells it
to do (see the sketch after this list).

* Ironic to work with vendors to encourage some sort of standardized
packaging and/or installation process such that the firmware updating
code can be as generic as possible.
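
As a hedged sketch of the "hardware manager that calls out to an
external service" idea above: the HardwareManager plumbing below
follows IPA's existing model, while the policy service URL, its
response format, and both helper functions are invented for
illustration.

    import requests

    from ironic_python_agent import hardware


    def collect_current_firmware():
        # Placeholder: a real implementation would query the BMC and
        # component inventory for the installed firmware versions.
        return {}


    def apply_firmware_update(update):
        # Placeholder for the vendor-specific install step; the
        # standardized packaging point above is about making this
        # generic.
        raise NotImplementedError(update)


    class FirmwarePolicyManager(hardware.HardwareManager):

        def evaluate_hardware_support(self):
            return hardware.HardwareSupport.SERVICE_PROVIDER

        def get_clean_steps(self, node, ports):
            return [{'step': 'assert_firmware',
                     'priority': 95,
                     'interface': 'deploy',
                     'reboot_requested': True,
                     'abortable': False}]

        def assert_firmware(self, node, ports):
            # Publish the current state upward; the service replies
            # with the firmware (if any) to apply, per its policy.
            current = collect_current_firmware()
            resp = requests.post('https://firmware.example.com/policy',
                                 json={'node': node['uuid'],
                                       'firmware': current})
            for update in resp.json().get('updates', []):
                apply_firmware_update(update)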

One other note worth mentioning: some operators spoke of stress
testing their hardware during cleaning processes. It seems like a
logical thing to do; however, it would be best for a few operators to
explicitly propose what they wish to test and how they presently do
it, so we as a community can gain a better understanding.

Action Items:

* Poll hardware vendors during the next weekly meeting and attempt to
build an understanding and consensus.
* With that feedback, the next step will be to try to determine how
to fit such a service into ironic, along with what ironic's
expectations are for drivers regarding firmware upgrades.

TPM Key Assertion
~~~~~~~~~~~~~~~~~

Some operators utilize their TPMs for secure key storage, and need a
mechanism to update/assert a new key to overwrite existing key data.
The key data in the TPM is used by the running system, and we have no
operational need to store or utilize the data. Some operators
presently perform this manually, and replacing keys on systems running
elsewhere in the world is a human-intensive process.

The consensus in the room was that this might be a good out-of-band
management interface feature, which could live in the management
interfaces of the vendor drivers. We presently make minimal use of the
management interface.

From a security standpoint, this is also something we shouldn’t store
locally; ironic should only be a clean pass-through conduit for the
data, which makes explicitly out-of-band usage even more appealing
with vendor drivers.

Action Item: Poll hardware vendors during the next weekly meeting or
two in order to begin discussion of viability/capability to support.
This could be passthru functionality in the driver, but if two drivers
can eventually support such functionality, we should standardize this
upfront.
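
To make the standardization question concrete, here is one purely
illustrative shape this could take as passthru today; the method name
and the out-of-band transport are invented, and nothing here is an
agreed-upon interface.

    from ironic.drivers import base


    class ExampleTPMVendor(base.VendorInterface):

        def get_properties(self):
            return {}

        def validate(self, task, method=None, **kwargs):
            pass

        @base.passthru(['POST'],
                       description='Assert (overwrite) a TPM key.')
        def assert_tpm_key(self, task, **kwargs):
            # Ironic acts purely as a pass-through conduit: the key
            # goes straight to the BMC and is never persisted locally.
            self._send_key_out_of_band(task, kwargs['key'])

        def _send_key_out_of_band(self, task, key_blob):
            raise NotImplementedError  # vendor-specific OOB transport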

Reversing the communications flow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One of the often discussed items in security conscious environments is
the need to be able to have the conductor initiate communication to
the agent. While this model will not work for all deployments, we've
had a consensus in the past that this is something we should consider
doing as an optional setting.

In other words, IPA could have a mode of operation where it no longer
heartbeats to the API; instead, the conductor would look up the
address from neutron and proceed to poll it until the node came
online. The conductor would then poll that address on a regular basis,
much like heart-beating works today. We should keep in mind that this
polling operation will have an increased impact on conductor load.
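
A minimal sketch of that conductor-side loop, assuming such an
optional mode existed; the agent endpoint and port below are invented,
as nothing like this is in ironic today.

    import time

    import requests


    def wait_for_agent(agent_ip, interval=30, timeout=1800):
        """Poll a non-heartbeating agent until it responds."""
        url = 'http://%s:9999/v1/status' % agent_ip  # invented endpoint
        deadline = time.time() + timeout
        while time.time() < deadline:
            try:
                # The conductor asks, rather than the agent calling
                # home as it does with heartbeats today.
                return requests.get(url, timeout=5).json()
            except requests.RequestException:
                time.sleep(interval)  # node not up yet; keep polling
        raise RuntimeError('Agent at %s never came online' % agent_ip)

Every in-flight deployment would carry one such polling loop, which is
where the additional conductor load comes from.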

Several operators present in the session expressed interest, while
others indicated this would be a breaking change for their
environment's security model; as such, any movement in this direction
must be optional.

Action Item: Someone should write a specification and poll the larger
operators that we know in the community for thoughts, in order to see
if it meets their needs.

Documentation on known issues and fixes or incompatibilities
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Operators would like to see more information on known driver issues
or incompatibilities explicitly stated in the documentation.
Additionally, operators would like a single location for "how to fix
this issue" that is not a bug tracking system.

There seemed to be consensus that these were good things to document,
and the community does not disagree. That being said, the operators
are the ones who will tend to make us aware of such issues.

The best way for the operator community to help the developers is to
propose documentation patches to the ironic repository to raise
awareness and visibility within our own documentation. We must keep in
mind that we must curate this information as well, since some of these
things are not necessarily “bugs”, much like the UEFI boot issues
noted earlier.

Automatic BIOS Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

tl;dr: We are working on it.

Use of System UUID in addition to MAC addresses
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some operators would like to see changes in order to allow booting
based upon a recorded system UUID. In these cases, the operator may be
using the "noop" network interface driver, or another custom driver
that does not involve neutron.

The reasons to support UUID-based booting are extensive:

* iPXE can attempt the system UUID first, so support for utilizing the
UUID could remove several possible transactions.

* The first ethernet device to attempt iPXE booting may not be the
one known to ironic, and may still obtain an IP address based upon the
environment configuration/operation. This will largely be the case in
an environment where DHCP is not managed by neutron. Presently,
operators have to wait for the unknown interfaces to fail completely
before eventually reaching a known network interface.

** Possibly worse, the order may not be consistent depending on the
hardware boot order / switch configuration / cabling.
** Operators indicated that swapped cabling with Link Aggregates is a
semi-common problem they encounter.

* MAC addresses of nodes may just not be known, and evolving to
support hard-coded UUIDs does provide us some greater flexibility in
terms of being able to boot a node where we neither know nor control
the IP addressing assigned.


In addition to UUIDs, an operator expressed interest in having the
same boot behavior, however keyed on the allocated IP address as
opposed to the UUID. This also deserves consideration as it may be
useful for some operators.

Action Items:

* We should determine a method of storing the UUID if discovered or
already known. Some operators may already know it in advance.

** Suggestion from an operator: Maybe just allow setting of the UUID
when a node is created, like we do with ports, so that an operator or
inspector could set the node UUID to be the same as the system's UUID,
thus eliminating the need for another field.
** Ironic contributor: Alternatively, we should just add a boolean
that writes it out and offers it as an initial step, and then falls
back to the MAC address based attempt.

* Update template generation to support writing a symlink with the
UUID and/or MAC addresses (see the sketch after this list).

* Explore possibility of doing the same with IP addresses.
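
On the symlink item, a hedged sketch of publishing one rendered boot
configuration under both UUID-based and MAC-based names; the paths and
naming conventions are illustrative, not ironic's actual on-disk
layout.

    import os


    def link_boot_config(httpboot, node_uuid, system_uuid, macs):
        config = os.path.join(httpboot, node_uuid, 'config')
        cfg_dir = os.path.join(httpboot, 'pxelinux.cfg')
        names = []
        if system_uuid:
            # iPXE asks for the system UUID first, so this link lets a
            # node boot without waiting for per-NIC attempts to fail.
            names.append(system_uuid)
        # Fall back to the familiar 01-aa-bb-cc-dd-ee-ff per-MAC links.
        names.extend('01-' + mac.replace(':', '-').lower()
                     for mac in macs)
        for name in names:
            link = os.path.join(cfg_dir, name)
            if not os.path.lexists(link):
                os.symlink(config, link)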

Diskless boot
~~~~~~~~~~~~~

This is a repeat theme that has arisen before, and in many cases it
could be solved via the BFV iPXE functionality; however, other
operators have expressed a need in the past for more generic boot
options in order to boot their nodes. There have been some prior
specifications on making generic PXE interfaces available for things
such as network switches. As such, we should likely re-evaluate those
specifications.

Action Item: Ironic should re-evaluate current position, and review
related specifications.

Physical Location/Hardware Ownership
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This one didn't quite make the notes, but a couple attendees seem to
remember it and it is worth mentioning.

Presently, there is no ability to have a geographically diverse ironic
installation where a single pane of glass is provided by the API. To
add further complexity, ironic may be in a situation where it is
managing hardware that the operator might not explicitly own, that
needs to be in a delineated pool. We presently have no way to
represent or control scheduling in these cases.

This quickly leads to tenant awareness, as in an operator may have a
tenant that owns hardware in the datacenter. Naturally, this can get
complex quite quickly, but it seems logical for many users as they
have trusted users that may wish to manually deploy hardware, and in
many of those cases, it is desired that the hardware be used by no
other tenant. This concept may also be extended to a concept of
"authorized users" who have temporary implied rights to interact with
ironic based upon permissions and current ownership.

To keep this short, the impacts of this are _massive_, as they are
intertwined, fundamental changes to how ironic represents data at the
API level as well as to how ironic executes requests. The end goal
would be the granularity to say "these two conductors are for x
environment" or "you're only allowed to see x nodes". As a result,
this would be a huge API change. The current concept is to just build
upon the existing 1.0 API as a 2.0 API.

Action Item: TheJulia and Nobodycam volunteer as tributes... to start a spec.


Ironic On-boarding Session
==========================

The Sydney summit was the first attempt by the ironic team to execute
an on-boarding session for the community. As such, the intent was to
take a free form approach as an attempt to answer questions that
anyone had for community members, which would provide feedback into
what new contributors might be interested in moving forward.

By and large, the questions that were asked boiled down to the
following:
Where do I find the code?
This was largely a question of what repository contains what pieces.

How do I setup a test environment?
This was very much a question of getting started, which led into the
next logical question.

How do I test without real physical servers?
The answer became Devstack or Bifrost, depending on your use case and
desire to perform full-stack development or lightweight work with or
alongside ironic.

Can I test using remote VMs?
Overall the answer was yes, but you need to handle the networking
yourself to bridge it through, and have some mechanism to control
power. Ironic-staging-drivers was brought up as a repository that
might have useful drivers in these cases. Ironic should look at
improving some of our docs to highlight the possibilities.

What alternatives to devstack are there?
Bifrost was raised as an example. We failed to mention kolla as an option. :(

How do we see community priorities?
This was very easy for us, but for a new contributor coming into the
community, it is not as clear. Ironic should consider improving
documentation to make it very clear where to find this information.


Action Items:
* Some of Ironic's documentation for new contributors may need
revision to provide some of these contextual details upfront, or we
might need to consider a Q&A portion of the documentation.
* The ironic community should ensure that the above questions are
largely answered in whatever material is presented as part of the next
on-boarding session at the Vancouver summit.


Mogan/Nova/Ironic Session
=========================

https://etherpad.openstack.org/p/SYD-forum-baremetal-ironic-mogan-nova

The purpose of this session was to help compare, contrast, and
provide background to the community as to the purpose behind the Mogan
project, which was to create a baremetal-centric, user-facing compute
API allowing non-administrator users to provision and directly control
baremetal. The baremetal in Mogan's context could be baremetal that
already exists, or baremetal that is created from some sort of
composable infrastructure.

The Mogan PTL started with an overview to provide context to the
community, and then the community shifted to asking questions to gain
context, including polling operators for interest and concerns.
Primarily, operator concern was over creating divergence and user
confusion in the community.

Once we had some operator input, we attempted to identify the
differences and shortcomings in ironic and nova that primarily drove
the effort. What we were able to identify from a work-in-progress
comparison document was largely additional insight into aggregates,
which was partly due to affinity/anti-affinity needs. Additional
functionality exists in Mogan to list servers available for management
and then directly act upon them, although the extent of what
additional actions can be taken upon a baremetal node had not been
identified.

As the discussions went on, the ironic team members that were present
were able to express our concerns over communication. It largely
seemed to be a surprise that some of our hardware teams were working
in the direction of composable hardware, and that the use model Mogan
sought could fit into our scope and workflow for composable hardware.
Largely, for composable hardware, we would need some way to represent
a node that a user wishes to perform an action upon. In some cases
now, that is performed with placeholder records representing possible
capacity.

Naturally for ironic, becoming user facing would be a very large
change to ironic's API; however, based on other sessions, these are
changes that ironic may wish to explore given stated operator needs.

The discussion for both ironic and nova was more of a “how do we best
navigate” question instead of an “if we should navigate” question,
which in itself is positive.

Some of these items included improving the view of available physical
baremetal, regional/availability zoning, tenant utilization of the
API, and possibly hardware ownership concepts. Many of these items, as
touched on in the feedback session, are intertwined.

Overall, the session was good in that we were able to gain consensus
that the core issues which spurred the creation of Mogan are
addressable by the present Ironic and Nova contributors. Complete
gap/feature comparison remains as an outstanding item, which may still
influence the discussion going forward.


Baremetal Scheduling session
============================

https://etherpad.openstack.org/p/SYD-forum-baremetal-scheduling

We were originally hoping to cancel this session and redirect everyone
into the nova placement status update, but we soon found out that
there were some lingering questions as well as concerns of operators
that needed to be discussed.

We started out in discussion and came to the realization that there
could very well be a trait explosion in the attempt to support
affinity and anti-affinity efforts. While for baremetal this could be
a side effect, it does not line up with the nova model. Conceptually,
we could quickly end up with trait lists that look something like:

    CUSTOM_AC_GRID_C
    CUSTOM_ROOM1_POWER_GRID_C
    CUSTOM_CABINET_4
    CUSTOM_Charlotte_DC3
    CUSTOM_Charlotte_DC3_ROW2_CAB4
    CUSTOM_CUSTOMER_TAG
    CUSTOM_OWNED_ENV
    NET1GB
    NET2GB
    NET10GB
    NET10GB_DUAL
    CUSTOM_STORAGE_FABRIC_A
    CUSTOM_FC_FABRIC_B
    CUSTOM_REDUNDANT_COOLING
    CUSTOM_IS_A_BIKE_SHED_ON_THE_MOON
    CUSTOM_IS_NOT_LORD_VADERS_BIKESHED

At some point, someone remarked, “It seems like there is just no
solution that is going to work for everyone by default.” The remark
covered not just resource class determination but also trait
identification, and encompassed scheduling affinity and anti-affinity,
which repeatedly came up in discussions over the week.

This quickly raised an operator desire for the ironic community to
solve for what would fit 80% of use cases, and then iterate moving
forward. The example brought up in discussion was to give operators an
explicit configuration parameter that they could use to assert
resource_class, or possibly even static trait lists until they can
populate the node with what should be there for their deployment, or
for that individual hardware installation in their environment.

While the ironic community's solution is "write introspection rules",
it seems operators just want something simpler as a standing default,
like an explicit default in the configuration file.
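
As a sketch of that simpler option, the standing default could be no
more than a configuration option consulted at node creation time. The
option name here is invented; see the bug referenced in the action
items below for the actual proposal.

    from oslo_config import cfg

    CONF = cfg.CONF
    CONF.register_opts([
        cfg.StrOpt('default_resource_class',
                   help='Resource class to record on nodes that are '
                        'created without an explicit one.'),
    ])


    def apply_default_resource_class(node):
        # Applied at enrollment time; introspection rules can still
        # override the value later for operators who use them.
        if not node.resource_class and CONF.default_resource_class:
            node.resource_class = CONF.default_resource_class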

Some operators pointed out that with their processes, they would
largely know or be able to reconcile the differences in their
environment and make those in ironic as-needed.

Eventually, the discussion shifted to affinity/anti-affinity, which
could partially make use of tags, although, as previously detailed,
that would quickly result in a tag explosion depending on how an
operator implements and chooses to manage their environment.

Action items:
* Ironic needs to discuss as a group what the impact of this
discussion means. Many of the themes, beyond providing configurable
defaults to meet the 80% of users, have repeatedly come up and really
drive towards some of the things detailed as part of the feedback
session.
* For "resource_class" defaults, Dmitry was kind enough to create
https://bugs.launchpad.net/ironic/+bug/1732190 as this does seem like
a quick and easy thing for us to address.


