So I had an interesting discussion with an operator last week which relates quite a bit. The tl;dr is they want to hide all the vendor complexities for basic operational usage patterns, meaning deployment of machines. The thing Ironic was originally intended to try to achieve and mostly does with the community drivers. Ultimately the thing redfish was intended to try and aid as well. There was recognition that advanced actions, tasks, require specific drivers and highly specific settings, but they want to make that the exception and less of the current operational rule. They expressed a need to pivot on demand to meet their business needs, and that firmware changes are a factor in their decision making, planning processes, and a frequent headache. Hearing this, I think we have three distinct problems to solve. 1) We should enable operators of our API to have an easy path for basic usage. If I had to condense this into some sort of user story, it would be "As a infrastructure operator, I only want to provide a username, password, and base address to at least get started to perform a deployment. I don't want to have to identify vendor specifics out of the gate." 2) We should enable operators to pivot to a vendor driver easily, which is in hand with #1. In a sense, we already have part of this, although different username/password settings don't help this situation. I would boil this down to a user story of: "As an infrastructure operator, I need to be able to pivot hardware due to supply chain issues or due to immovable operational/security requirements." To put this into more precise language: "The business requirements and plan can not be put on hold." 3) Operators need less BMC interaction breakages in terms of features. They can no longer rely on the common denominator (PXE) if it doesn't meet their operational/security/networking models, especially the further outside of their traditional data center walls as they can no longer have any sort of implicit trust of the hardware's networking and physical state. "As an infrastructure operator, I need some assurance of capability stability in the BMCs and that the established API + hardware behavior contracts are not broken." I think this comes down to a mix of things like redfish profiles/testing *and* third party CI which also tests the latest firmware or even pre-release firmware. Perhaps not a public CI system, but something that can be stood up to provide feedback which also draws on upstream state so firmware changes can be detected and prevented them from breaking or starting to derail a tight operational timeline. I guess this is part of the nature of automating infrastructure. We've gone from processes that took months and months to facilitate the roll-out of an entire data center to more and more rapid solutions. I think we're at the point where expectations and planning all center around the union of stability, capability, and time-to-finished system deployment. Technical organizations make commitments to the business. Businesses make further commitments from there. It is just the nature of the industry. And with that, I guess I'd like to resume discussion on the larger topic of how we enable operators to always just start with "redfish_address", "redfish_username", "redfish_password", and ultimately reach a virtual media driven deployment regardless of the vendor. -Julia On Thu, Jan 28, 2021 at 8:07 AM Dmitry Tantsur <dtantsur@redhat.com> wrote:
Hi,
This is how the patch would look like: https://review.opendev.org/c/openstack/ironic/+/772899
It even passes redfish and idrac unit tests unmodified (modulo incorrect mocking in test_boot).
Dmitry
On Mon, Jan 25, 2021 at 3:52 PM Dmitry Tantsur <dtantsur@redhat.com> wrote:
Hi,
On Mon, Jan 25, 2021 at 7:04 AM Pioso, Richard <Richard.Pioso@dell.com> wrote:
On Wed, Jan 20, 2021 at 9:56 AM Dmitry Tantsur <dtantsur@redhat.com> wrote:
Hi all,
Now that we've gained some experience with using Redfish virtual media I'd like to reopen the discussion about $subj. For the context, the idrac-redfish-virtual-media boot interface appeared because Dell machines need an additional action [1] to boot from virtual media. The initial position on hardware interfaces was that anything requiring OEM actions must go into a vendor hardware interface. I would like to propose relaxing this (likely unwritten) rule.
You see, this distinction causes a lot of confusion. Ironic supports Redfish, ironic supports iDRAC, iDRAC supports Redfish, ironic supports virtual media, Redfish supports virtual media, iDRAC supports virtual media. BUT! You cannot use redfish-virtual-media with iDRAC. Just today I had to explain the cause of it to a few people. It required diving into how exactly Redfish works and how exactly ironic uses it, which is something we want to protect our users from.
Wow! Now I’m confused, too. AFAIU, the people you had to help decided to use the redfish driver, instead of the idrac driver. It is puzzling that they decided to do that considering the ironic driver composition reform [1] of a couple years ago. Recall that reform allows “having one vendor driver with options configurable per node instead of many drivers for every vendor” and had the following goals.
When discussing the user's confusion we should not operate in terms of ironic (especially since the problem happened in metal3 land, which abstracts away ironic). As a user, when I see Redfish and virtual media, and I know that Dell supports them, I can expect redfish-virtual-media to work. The fact that it does not may cause serious perception problems. The one I'm particularly afraid of is end users thinking "iDRAC is not Redfish compliant".
“- Make vendors in charge of defining a set of supported interface implementations in priority order. - Allow vendors to guarantee that unsupported interface implementations will not be used with hardware types they define. This is done by having a hardware type list all interfaces it supports.”
The idrac driver is Dell Technologies’ vendor driver for systems with an iDRAC. It offers a one-stop shop for using ironic to manage its systems. Users can select among the hardware interfaces it supports. Each interface uses a single management protocol -- Redfish, WS-Man, and soon IPMI [2] -- to communicate with the BMC. While it supports the idrac-redfish-virtual-media boot interface, it does not support redfish-virtual-media. One cannot configure a node with the idrac driver to use redfish-virtual-media.
I know, the problem is explaining to users why they can use the redfish hardware type with Dell machines, but only partly.
We already have a precedent [2] of adding vendor-specific handling to a generic driver.
That change was introduced about a month ago in the community’s vendor-independent ipmi driver. That was very understandable, since IPMI is a very mature management protocol and was introduced over 22 years ago. I cannot remember what I was doing back then :) As one would expect, the ipmi driver has experienced very little change over the past two-plus years. I count roughly two (2) substantive changes over that period. By contrast, the Redfish protocol is just over five (5) years old. Its vendor-independent driver, redfish, has been fertile ground for adding new, advanced features, such as BIOS settings configuration, firmware update, and RAID configuration, and fixing bugs. It fosters lots of change, too many for me to count.
I have proposed a patch [3] to block using redfish- virtual-media for Dell hardware, but I grew to dislike this approach. It does not have precedents in the ironic code base and it won't scale well if we have to handle vendor differences for vendors that don't have ironic drivers.
Dell understands and is on board with ironic’s desire that vendors support the full functionality offered by the vendor-independent redfish driver. If the iDRAC is broken with regards to redfish-virtual-media, then we have a vested interest in fixing it. While that is worked, an alternative approach could be for our community to strengthen its promotion of the goals of the driver composition reform. That would leverage ironic’s long-standing ability to ensure people only use hardware interfaces which the vendor and its driver support.
Yep. I don't necessarily disagree with that, but it poses issues for layered products like metal3, where on each abstraction level a small nuance is lost, and the end result is confusion and frustration.
Based on all this I suggest relaxing the rule to the following: if a feature supported by a generic hardware interface requires additional actions or has a minor deviation from the standard, allow handling it in the generic hardware interface. Meaning, redfish-virtual-media starts handling the Dell case by checking the System manufacturer (via the recently added detect_vendor call) and loading the OEM code if it matches "Dell". After this idrac-redfish-virtual-media will stay empty (for future enhancements and to make the patch backportable).
That would cause the vendor-independent redfish driver to become dependent on sushy-oem-idrac, which is not under ironic governance.
This itself is not a problem, most of the projects we depend on are not under ironic governance.
Also it won't be a hard dependency, only if we detect 'Dell' in system.manufacturer.
It is worth pointing out the sushy-oem-idrac library is necessary to get virtual media to work with Dell systems. It was first created for that purpose. It is not a workaround like those in sushy, which accommodate common, minor standards interpretation and implementation differences across vendors by sprinkling a bit of code here and there within the library, unbeknownst to ironic proper.
We at Dell Technologies are concerned that the proposed rule change would result in a greater code review load on the ironic community. Since vendor-specific code would be in the generic hardware interface, much more care, eyes, and integration testing against physical hardware would be needed to ensure it does not break others. And our community is already concerned about its limited available review bandwidth [3]. Generally speaking, the vendor third-party CIs do not cover all drivers. Rather, each vendor only tests its own driver, and, in some cases, sushy. Therefore, changes to the vendor-independent redfish driver may introduce regressions in what has been working with various hardware and not be detected by automated testing before being merged.
The change will, in fact, be tested by your 3rd party CI because it was used by both the generic redfish hardware type and the idrac one.
I guess a source of confusion may be this: I don't suggest the idrac hardware type goes away, nor do I suggest we start copying its Dell-specific features to redfish.
Can we afford this additional review load, prospective slowing down of innovation with Redfish, and likely undetected regressions? Would that be best for our users when we could fix the problem in other ways, such as the one suggested above?
Also consider that feedback to the DMTF to drive vendor consistency is critical, but the DMTF needs feedback on what is broken in order to push others to address a problem. Remember the one-time boot debacle when three vendors broke at the same time? Once folks went screaming to the DMTF about the issue, it quickly explained it to member companies, clarified the standard, and created a test case for that condition. Changing the driver model to accommodate everyone's variations will reduce that communication back to the DMTF, meaning the standard stalls and interoperability does not gain traction.
I would welcome somebody raising to DMTF the issue that causes iDRAC to need another action to boot from virtual media, I suspect other vendors may have similar issues. That being said, our users are way too far away from DMTF, and even we (Julia and myself, for example) don't have a direct way of influencing it, only through you and other folks who help (thank you!).
Thoughts?
TL;DR, we strongly recommend ironic not make this rule change. Clearly communicating users should use the vendor driver should simplify their experience and eliminate the confusion.
The code as-is is factored well as a result of the 21st century approach the community has taken to date. Vendors can implement the driver OEM changes they need to accommodate their unique hardware and BMC requirements, with reduced concern about the risk of breaking other drivers or ironic itself. Ironic’s driver composition reform, sushy, and sushy’s OEM extension mechanism support that modern approach. Our goal is to continue to improve the iDRAC Redfish service’s compliance with the standard and eliminate the kind of OEM code Dmitry identified.
Beware of unintended consequences, including
- reduced quality, - slowed feature and bug fix velocity,
I don't see how this happens, given that the code is merely copied from one place to the other (with the 1st place inheriting it from its base class).
- stalled DMTF Redfish standard, - lost Redfish interoperability traction, and
I'm afraid we're actually hurting Redfish adoption when we start complicating its usage. Think, with IPMI everything "Just Works" (except when it does not, but that happens much later), while for Redfish the users need to be aware of... flavors of Redfish? Something that we (and DMTF) don't even have a name for.
Dmitry
- increased code review load.
Dmitry
[1] https://opendev.org/openstack/ironic/src/commit/6ea73bdfbb53486cf9 905d21024d16cbf5829b2c/ironic/drivers/modules/drac/boot.py#L130 [2] https://review.opendev.org/c/openstack/ironic/+/757198/ [3] https://review.opendev.org/c/openstack/ironic/+/771619
-- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
I welcome your feedback.
Rick
[1] https://opendev.org/openstack/ironic-specs/src/branch/master/specs/approved/... [2] https://storyboard.openstack.org/#!/story/2008528 [3] https://etherpad.opendev.org/p/ironic-wallaby-midcycle
-- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill
-- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill