Hi,

On Thu, Apr 30, 2020 at 12:40 AM Julia Kreger <juliaashleykreger@gmail.com> wrote:
Greetings fellow awesome sentient beings, and the occasional text indexers!

We have a growing conundrum with Redfish.

A long time ago, in a red shifted galaxy inhabited by Redfish, a
feature was defined early on in the DMTF specification defining a
"BootSourceOverrideEnabled" field as part of the ComputerSystem Boot
object[0]. This setting, in concert with the
"BootSourceOverrideTarget" and "BootSourceOverrideMode" setting
allowed Ironic to signal an override preference in regards to what
device the machine should boot to.
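
For illustration, a minimal sketch of the request body that the three
fields above combine into. The helper name and the validation are
hypothetical and are not taken from Ironic or sushy; only the field
names and the "Disabled"/"Once"/"Continuous" values come from the
ComputerSystem schema.

```python
import json

# Values the schema defines for BootSourceOverrideEnabled.
VALID_ENABLED = ("Disabled", "Once", "Continuous")


def build_boot_override(target, enabled="Once", mode=None):
    """Build the JSON body for a PATCH against a ComputerSystem resource.

    Hypothetical helper for illustration only.
    """
    if enabled not in VALID_ENABLED:
        raise ValueError("unsupported BootSourceOverrideEnabled: %r" % enabled)
    boot = {
        "BootSourceOverrideEnabled": enabled,
        "BootSourceOverrideTarget": target,
    }
    # Older BMCs may not expose BootSourceOverrideMode, so it is optional.
    if mode is not None:
        boot["BootSourceOverrideMode"] = mode
    return {"Boot": boot}


# The persistent PXE request that newer firmware may reject in UEFI mode.
body = build_boot_override("Pxe", enabled="Continuous", mode="UEFI")
print(json.dumps(body, indent=2))
```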

As time has moved forward, schema field descriptions were added, and
notes further detailed (observable by looking at [1], see
ComputerSystem 1.0.0, 1.5.0, 1.6.0, 1.8.0). One such note stated that
"BootSourceOverrideEnabled" could not be set to "Continuous". Until
ComputerSystem 1.8.0, however, that note appeared only under the
one-time override option "UefiTargetBootSourceOverride", which is
leveraged by the explicit "UefiTarget" target option. In
ComputerSystem 1.8.0, an explicit note was added to the description
text for "BootSourceOverrideEnabled" stating that one-time override
use is the only option for UEFI.
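
As a rough sketch of what that 1.8.0 note implies for a client, the
function below picks a "BootSourceOverrideEnabled" value from the
requested persistence and the boot mode. The function name and the
choice to fall back to "Once" are assumptions made for illustration;
they are not the redfish driver's actual behavior.

```python
def choose_override_enabled(persistent, mode):
    """Pick a BootSourceOverrideEnabled value (hypothetical helper).

    Follows the ComputerSystem 1.8.0 description note: in UEFI mode
    only a one-time override is available.
    """
    if persistent and mode != "UEFI":
        # Assumption: a BMC that reports no mode (None) or "Legacy"
        # still accepts a persistent override.
        return "Continuous"
    # Either a one-time override was requested, or the boot mode is
    # UEFI and a persistent override cannot be expressed.
    return "Once"
```

A caller would still need to restore the intended boot device by some
other means after the one-time override has been consumed.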

The published documents have also not been very clear on this, and
there are numerous documents on the subject. It seems the schema
details note being added and enhanced in 1.8.0 drove vendors to begin
changing the logic around their "BootSourceOverrideEnabled" option to
match what the current documentation details, regardless of their
schema version. Also, the schema/API versions are disjointed, and BMCs
exist out there without some of the defined fields. It just kind of is
our life with hardware.

So why now? The ironic team has recently had reports, across multiple
vendors, of newer firmware versions beginning to implement the changes
in schema 1.8.0. It seems we will no longer be able to use the option
we were using, which seemed logical at least when looking at the BMC
conformance to the standard. And while some may view this as a
regression, given that it worked on multiple vendors, I'm sure the
vendors see the old behavior as a bug, as it was seemingly never
intended to work that way when we look at the
"UefiTargetBootSourceOverride" field notes from the early schemas.

This nearly yielded a Linus-style response from me. Yes, it IS a breaking change, no matter why it was done or whether the initial implementation was wrong from a purist perspective.

The first thing we should do to maintain honesty to our consumers is to mark the redfish hardware type and its derivatives (idrac-redfish, more?) as experimental and provide a clear warning in our documentation that the underlying standard and its implementations are unstable. And keep it like that until DMTF and the vendors finally make up their mind.
 

tl;dr: As BMC vendors add the newer schema fields and the resulting
behavior, the existing redfish driver's ability to assert persistent
boot devices breaks.

With all of that having been said, I think we should remove the
persistent setting capability from our redfish driver, and backport
the fix where possible. That being said, this is also a behavior
change, so it is time to get everyone's thoughts, since we will need
to stress that machines should by default boot from persistent storage
instead of the network.

This is not just a behavior change; it's also a breaking change. I don't think the boot sequence you mention is the default. I'd rather expect machines (at least in legacy boot mode) to start with PXE booting.

However, I think I have a workaround for that! We could build PXE configuration even for nodes with boot_option:local, so that if a machine boots from the network, we force it to boot from disk via iPXE configuration. Thoughts?
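
A minimal sketch of what such a generated configuration could look like, assuming legacy BIOS and that the first BIOS drive (0x80) holds the deployed image. The function name is hypothetical and this is not Ironic's actual template; UEFI firmware would likely need a different approach than sanboot.

```python
def render_local_boot_ipxe(drive="0x80"):
    """Render an iPXE script that hands control to the local disk.

    Hypothetical helper for illustration only.
    """
    lines = [
        "#!ipxe",
        "# Node is set to boot_option:local, so do not load a ramdisk.",
        # Boot the given BIOS drive; if that fails, leave iPXE so the
        # firmware can try the next device in its boot order.
        "sanboot --no-describe --drive %s || exit" % drive,
    ]
    return "\n".join(lines) + "\n"


print(render_local_boot_ipxe())
```

With something like this in place, a node that unexpectedly PXE boots would still end up on its disk, at the cost of keeping a per-node network boot configuration around after deployment.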
 

We also need to launch a long term effort to try and duplicate the
setting capabilities through the other BMC
interfaces/methods/settings, but the variation by vendor is what is
going to make it very difficult until the latest redfish schema
version is implemented by vendors. Of course, that too may be a while.

I would like to take a tougher stance on this and say that if vendors want us (or anyone in our situation, really) to support redfish, they need to work with us more closely on the coming redfish changes, including getting us involved before changes are made to the schemas (to say nothing of actual hardware).

Dmitry
 

Thoughts?

-Julia

[0]: DMTF Redfish Schema Supplement 2019.1a, Page 71:
https://www.dmtf.org/sites/default/files/standards/documents/DSP0268_2019.1a.pdf
      or the 2016.2 version of the document, Page 16:
https://www.dmtf.org/sites/default/files/standards/documents/DSP2046_2016.2.pdf
      And the same information can be found in the 2019.2 and 2019.3
schema document versions.
[1]: Latest schema bundle pack:
https://www.dmtf.org/sites/default/files/standards/documents/DSP8010_2019.4.zip