[ironic][ptg] Summary of discussions/happenings related to ironic

Julia Kreger juliaashleykreger at gmail.com
Wed Nov 13 20:40:41 UTC 2019


A minor revision, we have new links for the videos as it seems there
was an access permission issue.

Links replaced below.

On Wed, Nov 13, 2019 at 11:35 AM Julia Kreger
<juliaashleykreger at gmail.com> wrote:
>
> Overall, there was quite a bit of interest in Ironic. We had great
> attendance for the Project Update, Rico Lin’s Heat/Ironic integration
> presentation, the demonstration of dhcp-less virtual media boot, the
> forum discussion on snapshot support for bare metal machines, and
> more! We also learned there are some very large bare metal clouds in
> China, even larger than the clouds we typically talk about when we
> discuss scale issues. As such, I think it would behoove the ironic
> community and OpenStack in general to be mindful of hyper-scale. These
> are not clouds with hundreds of compute nodes, but bare metal clouds
> containing thousands to tens of thousands of physical machines.
>
> So in no particular order, below is an overview of the sessions,
> discussions, and commentary with additional status where applicable.
>
> My apologies now since this is over 4,000 words in length.
>
> Project Update
> ===========
>
> The project update was fairly quick. I’ll try and record a video of it
> sometime this week or next and post it online. Essentially Ironic’s
> code addition/deletion levels are relatively stable cycle to cycle.
> Our developer and Ironic operator commit contribution levels have
> increased in Train over Stein, while the overall pool of contributors
> has continued to decline cycle after cycle, although not dramatically.
> I think the takeaway from this is that ironic has become more and
> more stable, and that the problems being solved are in many cases
> operator-specific needs or wants, or bug fixes for issues that only
> arise in particular environment configurations.
>
> The only real question that came out of the project update was, if my
> memory is correct, “What does Metal^3 mean for Ironic?” and “Who
> is driving forward Metal^3?” The answers are fairly straightforward:
> more ironic users and more use cases from Metal^3 driving ironic to
> deploy machines. As for who is driving it forward, it is largely being
> driven forward by Red Hat along with interested communities and
> hardware vendors.
>
> Quick, Solid, and Automatic OpenStack Bare-Metal Orchestration
> ==================================================
>
> Rico Lin, the Heat PTL, proposed this talk promoting the possibility
> of using ironic natively to deploy bare metal nodes, specifically
> where configuration pass-through can’t be made generic or somehow
> articulated through the compute API. One such case is where someone
> wishes to utilize something like our “ramdisk” deploy_interface,
> which does not deploy an image to the actual physical disk. The only
> real question that I remember coming up was why someone might want or
> need to do this, which again becomes more of a question of doing
> things that are not quite “compute” API-ish. The patches are
> available in gerrit[10].
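>
> For illustration, a standalone user could select the ramdisk
> deploy_interface directly through the bare metal API before deploying.
> A minimal sketch with the requests library; the endpoint, node name,
> auth handling, and microversion below are assumptions, not the talk's
> actual workflow:
>
>     import requests
>
>     IRONIC = "http://ironic.example.com:6385"  # hypothetical endpoint
>     HEADERS = {
>         # a microversion new enough for deploy_interface is assumed
>         "X-OpenStack-Ironic-API-Version": "1.38",
>     }
>
>     # switch the node to the ramdisk deploy interface via JSON PATCH
>     patch = [{"op": "replace",
>               "path": "/deploy_interface",
>               "value": "ramdisk"}]
>     resp = requests.patch(f"{IRONIC}/v1/nodes/node-0",
>                           json=patch, headers=HEADERS)
>     resp.raise_for_status()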
>
> Operator Feedback Session
> =====================
>
> The operator feedback[0] session was not as well populated, with maybe
> ~20-25 people present. Overall the feeling of the room was that
> “everything works”; however, there is a need and desire for
> information and additional capabilities:
>
> * Detailed driver support matrix
> * Reduce the deployment times further
> * Disk key rotation is an ask from operators for drives that claim
> smart erase support but end up doing a drive wipe instead; in essence,
> to reduce the overall time spent cleaning.
> * Software RAID is needed at deploy time.
> * IPA needs improved error handling. This may be a case where some of
> the communication flow changes that had been previously discussed
> could help, in that we could actively try and keep track of the agent
> a little more. Additional discussion will definitely be required.
> * There does still seem to be some interest in graphical console
> support. A contributor has been revising patches, but I think it would
> really help for a vendor to become involved here and support accessing
> their graphical interface through such a method.
> * Information, and a location for sharing it, is needed. I’ve reached
> out to the Foundation staff regarding the Bare Metal Logo Program to
> see if we can find a common place that we can build/foster moving
> forward. Within this topic, one major pain point began being stressed:
> issues with the resource tracker at 3,500 bare metal nodes. Privately,
> another operator reached out with the same issue at the scale of tens
> of thousands of bare metal nodes. As such, this became a topic during
> the PTG which gained further discussion. I’ll cover that later.
>
> Ironic – Snapshots?
> ===============
>
> As a result of some public discussion of adding snapshot capability, I
> proposed a forum session to discuss the topic[1] such that
> requirements can be identified and the discussion can continue over
> the next cycle.
> I didn't expect the number of attendees to swell beyond the operator
> feedback session's attendance. The discussion of requirements went
> back and forth to ultimately define "what is a snapshot" in this case,
> and "what should Ironic do?"
>
> There was quite a bit of interaction in this session and the consensus
> seemed to be the following:
> * Don’t make it reliant on nova, for standalone users may want/need to use it.
> * This could be a very powerful feature as an operator could ``adopt``
> a machine into ironic and then ``snapshot`` it to capture the disk
> contents.
> * Block level only, and we can’t forget about capturing/storing a
> content checksum.
> * Capture the machine’s contents with the same expectation as we would
> have for a VM, and upload this to someplace.
>
> In order to make this happen in a fashion which will scale, the ironic
> team will likely need to leverage the application credentials.
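>
> As a rough sketch of the block-level capture and checksum idea (purely
> illustrative; in practice this would run from the agent ramdisk and
> stream to an image service or object store, not a local file):
>
>     import hashlib
>
>     def capture_disk(device="/dev/sda", out="/tmp/snapshot.raw"):
>         """Copy the block device and record a checksum alongside it."""
>         sha256 = hashlib.sha256()
>         with open(device, "rb") as src, open(out, "wb") as dst:
>             while True:
>                 chunk = src.read(4 * 1024 * 1024)
>                 if not chunk:
>                     break
>                 sha256.update(chunk)
>                 dst.write(chunk)
>         return sha256.hexdigest()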
>
> Ironically reeling in large bare metal deployment without PXE
> ==============================================
>
> This was a talk submitted by Ilya Etingof, who unfortunately was
> unable to make it to the summit. Special thanks goes to both Ilya and
> Richard Pioso for working together to make this demonstration happen.
> The idea was to demonstrate where the ironic team sees the future of
> deploying machines on the edge using virtual media, and how vendors
> would likely interact with that, since in some cases slightly
> different mechanics may be required even if the BMCs all speak
> Redfish, as is the case for a Dell iDRAC BMC.
>
> The idea[2] is ultimately that the conductor would inject the
> configuration information into the ISO image that is attached via
> virtual media, negating the need for DHCP. We have videos posted that
> allow those interested to see what this functionality looks like with
> neutron[3] and without neutron[4].
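>
> A loose sketch of the injection idea follows; the file name and schema
> mimic the config-drive style network_data.json convention, but the
> exact format and the ISO mastering details are glossed over here:
>
>     import json
>     import os
>
>     def stage_network_config(iso_workdir, address, gateway, dns):
>         """Write static network configuration into the directory that
>         will be mastered into the virtual media ISO, so the ramdisk can
>         configure its interface without DHCP (illustrative only)."""
>         net = {
>             "links": [{"id": "nic0", "type": "phy"}],
>             "networks": [{"type": "ipv4", "link": "nic0",
>                           "ip_address": address, "gateway": gateway}],
>             "services": [{"type": "dns", "address": dns}],
>         }
>         os.makedirs(iso_workdir, exist_ok=True)
>         with open(os.path.join(iso_workdir, "network_data.json"),
>                   "w") as f:
>             json.dump(net, f, indent=2)
>         # mastering the ISO and attaching it via the BMC's virtual
>         # media interface is handled by the conductor and elided here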
>
> While the large audience was impressed, it seemed to be a general
> surprise that Ironic already had virtual media support in some of its
> drivers. This talk spurred quite a bit of conversation and hallway
> track style discussion after the presentation concluded, which is
> always an excellent sign.
>
> Project Teams Gathering
> ===================
>
> The ironic community PTG attendance was nothing short of excellent.
> Thank you everyone who attended! At one point we had fifteen people
> and a chair had to be pulled up to our table for a 16th person to join
> us. At which point, we may have captured another table and created
> confusion.
>
> We did things a little differently this time around. Given some of the
> unknowns, we did not create a strict schedule around the topics. We
> simply went through and prioritized topics and tried to discuss each
> as thoroughly as possible until we had reached a conclusion or
> consensus on the topic.
>
> Topics, and a few words on each topic we discussed, are captured in
> the notes section of the PTG etherpad[5].
>
> On-boarding
> -----------------
>
> We had three contributors attend a fairly brief on-boarding overview
> of Ironic. Two of them were more developer focused, whereas the third
> had more of an operator focus, looking to leverage ironic and see how
> they can contribute back to the community.
>
> BareMetal SIG - Next Steps
> -------------------------------------
>
> Arne Wiebalck and I both provided an update covering current
> conversations, where we see the SIG going, the Logo Program, the white
> paper, and what the SIG should do beyond the white paper.
>
> To start with the Logo Program: it largely seems that somewhere along
> the way a message or document got lost, and that largely impacted the
> Logo Program -> SIG feedback mechanism. I’m working with the OpenStack
> Foundation to fix that and get communication going again.
> Largely what spurred that was that some vendors expressed interest in
> joining, and wanted additional information.
>
> As for the white paper, contributions are welcome and progress is
> being made again.
>
> From a next steps standpoint, the question was raised of how we build
> up an improved operator point of contact. There was some consensus
> that we as a community should try to encourage at least one
> contributor to attend the operations mid-cycles. This allows for a
> somewhat shorter feedback loop with a different audience.
>
> We also discussed knowledge sharing, and how to improve it. Included
> with this is how we share best practices. I’ve put the question out to
> folks at the foundation as to whether there is a better way as part of
> the Logo Program, or if we should just use the wiki. I think this will
> be an open discussion topic in the coming weeks.
>
> The final question that came up as part of the SIG is how to show
> activity. I reached out to Amy on the UC regarding this, and it seems
> the process is largely to just reach out to the current leaders of the
> SIG, so it is critical that we keep that up to date moving forward.
>
> Sensor Data/Metrics
> ---------------------------
>
> The boundary between tenant-level information and operator-level
> information is difficult to draw for this topic.
>
> The consensus among the group was that the capability to collect some
> level of OOB sensor data should be present in all drivers, but there
> is also a recognition that this comes at a cost and possible
> performance impact. This performance impact question was mainly raised
> with Redfish, because the data is scattered around the API such that
> multiple API calls are required, and actively querying some data
> points may even cause interruptions.
>
> The middle ground in the discussion came to adding a capability of
> somehow saying “collect power status and temperature every minute, fan
> speeds every five minutes, drive/cpu health data maybe every 30
> minutes”. I would be remiss if I didn't note that there was joking
> about how this would in essence be a re-implementation of cron. What
> this would end up looking like, we don’t know, but it would provide
> operators the data resolution appropriate to the failure risk/impact.
> The analogy used was: “If the temperature sensor has risen to an alarm
> level, either an AC failure or a thermal hot spot forming based upon
> load in the data center, checking the sensor too often is just not
> going to result in a human investigating that on the data center floor
> any faster.”
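>
> A toy sketch of that “cron-ish” idea; the sensor groups and intervals
> below are invented for illustration only:
>
>     import time
>
>     # hypothetical per-sensor-group collection intervals, in seconds
>     COLLECTION_INTERVALS = {
>         "power_state": 60,
>         "temperature": 60,
>         "fan_speed": 300,
>         "drive_health": 1800,
>         "cpu_health": 1800,
>     }
>
>     def due_sensor_groups(last_collected, now=None):
>         """Return the sensor groups whose interval has elapsed."""
>         now = now or time.time()
>         return [group for group, interval
>                 in COLLECTION_INTERVALS.items()
>                 if now - last_collected.get(group, 0) >= interval]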
>
> Mainly, I believe this discussion stresses that the information is for
> the operator of the bare metal and not intended to provide insight
> into a tenant monitoring system; those activities should largely be
> done within the operating system.
>
> One question among the group was whether anyone was already using the
> metrics framework built into ironic for metrics of ironic itself, to
> see if we can re-use it. Well, it uses a plugin interface! In any
> event, I've sent a post to the openstack-discuss mailing list seeking
> usage information.
>
>
> Node Retirement
> -----------------------
>
> This is a returning discussion from the last PTG, and in discussing
> the topic we figured out where the discussion had become derailed
> previously. In essence, the desire was to mix this with the concept of
> being able to take a node “out of service”. Except, taking a node out
> of service is an immediate, state-related flag, whereas retiring might
> happen as soon as the current tenant vacates the machine… possibly in
> three to six months.
>
> In other words, one is “do something or nothing now”, and the other is
> “do something later when a particular state boundary is crossed”.
> Trying to make one solution for both doesn’t exactly work.
>
> Unanimous consensus among those present was that in order to provide
> node retirement functionality, the logic should be similar to
> maintenance/maintenance reason, i.e. a top-level field in the node
> object that would allow API queries for nodes slated for retirement.
> This helps solve an operator workflow conundrum: “How do I know what
> is slated for retirement but not yet vacated?”
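>
> A hypothetical query to illustrate the workflow, assuming a top-level
> "retired" flag ends up on the node object (the field name, filter, and
> microversion are placeholders, not an agreed design):
>
>     import requests
>
>     IRONIC = "http://ironic.example.com:6385"  # hypothetical endpoint
>     HEADERS = {"X-OpenStack-Ironic-API-Version": "1.58"}  # assumed
>
>     # "show me everything slated for retirement but not yet vacated"
>     resp = requests.get(f"{IRONIC}/v1/nodes",
>                         params={"retired": "True",
>                                 "provision_state": "active"},
>                         headers=HEADERS)
>     for node in resp.json()["nodes"]:
>         print(node["uuid"])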
>
> Going back to the “out of service” discussion, we reached consensus
> that this is in essence a “user declarable failed state”, and as such
> it should be handled only in the state machine as an immediate action,
> not a future one. Should we implement out of service, we’ll need to
> check the nova.virt.ironic code and related virt code to properly
> handle nodes dropping from `ACTIVE` state; this could also be
> problematic and need to be API version guarded to prevent machines
> from accidentally entering `ERROR` state if they are not automatically
> recovered in nova.
>
> Multi-tenancy
> ------------------
>
> Lots of interest existed around making the API multi-tenant aware,
> although the exact interactions and uses involved are not entirely
> clear. What IS clear is that providing such functionality will allow
> operators to remove complication in their resource classes and
> tenant-specific flavors, which are presently being used to enable
> tenant-specific hardware pools. The added benefit of providing some
> level of access to the ironic API for normally non-admin users is that
> it would allow those tenants to have a clear understanding of their
> used and available resources by directly asking ironic, whereas
> presently they don’t have a good way to collect or understand that for
> bare metal short of asking the cloud operator. Initial work has been
> posted to gerrit[6].
>
> In terms of how tenants’ resources would be shared, there was
> consensus that the community should stress that new special-use
> tenants should be created for collaborative efforts.
>
> There was some discussion regarding explicitly dropping fields, such
> as driver_info and possibly even driver_internal_info, for
> non-privileged users that can see the nodes. This is definitely a
> topic that requires more discussion, but it would solve operator
> reporting and usage headaches.
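>
> A crude sketch of the field-scrubbing idea; the role check and the
> exact field list are placeholders rather than a proposed policy:
>
>     # fields we might hide from a non-privileged tenant view of a node
>     OPERATOR_ONLY_FIELDS = ("driver_info", "driver_internal_info")
>
>     def scrub_node_for_tenant(node_dict, is_admin):
>         """Return a copy of the node API representation with
>         operator-only fields removed for non-admin callers."""
>         if is_admin:
>             return dict(node_dict)
>         return {key: value for key, value in node_dict.items()
>                 if key not in OPERATOR_ONLY_FIELDS}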
>
> Manual Cleaning Out-Of-Band
> ----------------------------------------
>
> The point was raised that we unconditionally start the agent ramdisk
> to perform manual cleaning. Instead, we should support a way for
> out-of-band cleaning operations to be executed on their own, so the
> bare metal node doesn’t need to be booted into a ramdisk.
>
> The consensus seemed to be that we should consider a decorator, or a
> change to an existing decorator, that allows the conductor to hold off
> actually powering the node on for ramdisk boot unless or until a step
> is reached that is not purely out of band.
>
> In essence, fixing this allows a “fix_bmc” out-of-band clean step to
> be executed first, without first attempting a ramdisk boot (which
> requires modifying BMC settings and would presently fail).
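>
> A very rough sketch of the decorator idea; the names and surrounding
> machinery here are entirely hypothetical:
>
>     def requires_ramdisk(requires=True):
>         """Hypothetical marker for clean steps: steps flagged as not
>         needing the ramdisk could run before the conductor powers the
>         node on and boots the agent."""
>         def decorator(func):
>             func.requires_ramdisk = requires
>             return func
>         return decorator
>
>     class ExampleManagement(object):
>
>         @requires_ramdisk(False)
>         def fix_bmc(self, task, **kwargs):
>             # purely out of band: talks to the BMC only, so it can run
>             # first even when the BMC is too unhealthy to boot a ramdisk
>             pass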
>
> Scale issues
> -----------------
>
> A number of scaling issues exist in how nova and ironic interact,
> specifically with the resource tracker and how inventory is updated
> from ironic and loaded into nova. Largely this issue revolves around
> the concept in nova that each ``nova-compute`` is a hypervisor. And
> while one can run multiple ``nova-compute`` processes to serve as the
> connection to ironic, the underlying lock in Nova is at the level of
> the compute node, not the individual bare metal node. This means that
> as thousands of records are downloaded, synced, and copied into the
> resource tracker, the compute process is essentially blocked from
> other actions while this serialized job runs.
>
> In a typical VM case, you may only have at most a couple hundred VMs
> on a hypervisor, whereas with bare metal we’re potentially servicing
> thousands of physical machines.
>
> It should be noted that there are several large scale operators that
> indicated during the PTG that this was their pain point. Some of the
> contributors from CERN sat down with us and the nova team to try and
> hammer out a solution to this issue. A summary of that cross project
> session can be found at line 212 in the PTG etherpad[0].
>
> But there is another pain point that contributes to this performance
> issue, and that is the speed at which records are returned by our API.
> We’ve had some operators voice frustration with this before, and we
> should at least be mindful of it and hopefully see if we can improve
> record retrieval performance. In addition, if we supported some form
> of bulk “GET” of nodes, it could be leveraged instead of a GET on each
> node one at a time, which is what presently occurs in the nova-compute
> process.
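>
> For context, a sketch of the difference: a handful of paginated list
> calls carrying only the fields the resource tracker needs, instead of
> a GET per node. The endpoint, field selection, and page size are
> illustrative:
>
>     import requests
>
>     IRONIC = "http://ironic.example.com:6385"  # hypothetical endpoint
>     HEADERS = {"X-OpenStack-Ironic-API-Version": "1.58"}  # assumed
>
>     def list_nodes_in_bulk():
>         """Fetch all nodes with one list call per page rather than
>         issuing thousands of per-node GETs."""
>         url = f"{IRONIC}/v1/nodes"
>         params = {"fields": "uuid,power_state,provision_state",
>                   "limit": 1000}
>         nodes = []
>         while url:
>             data = requests.get(url, params=params,
>                                 headers=HEADERS).json()
>             nodes.extend(data["nodes"])
>             url = data.get("next")  # follow pagination, if any
>             params = None           # "next" already carries the query
>         return nodes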
>
> Boot Mode Config
> ------------------------
>
> Previously, when scheduling occurred with flavors and filters
> appropriately set, if a machine was declared as supporting only one
> boot mode, requests would only ever land on that node. Now with
> traits, this is a bit different and unfortunately optional, with no
> logic to really guard application of the setting for an instance.
>
> So in this case, if filters are such that a request for a Legacy boot
> instance lands on a UEFI only machine, we’ll still try to deploy it.
> In reality, we really should try and fail fast.
>
> Ideally the solution here is that we consult the BMC through some sort
> of get_supported_boot_modes method, and if we determine from the data
> we have that there is a mismatch between the settings and the
> requested instance, we fail the deploy.
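>
> A sketch of that fail-fast check; get_supported_boot_modes does not
> exist today, it is simply the shape the discussion gravitated toward:
>
>     def validate_boot_mode(node_uuid, requested_boot_mode, bmc):
>         """Fail the deploy early if the requested boot mode is not
>         supported by the machine, instead of failing mid-deploy.
>         ``bmc.get_supported_boot_modes`` is the hypothetical call."""
>         supported = bmc.get_supported_boot_modes()
>         if requested_boot_mode not in supported:
>             raise ValueError(
>                 "Instance requests boot mode %r, but node %s only "
>                 "supports %s" % (requested_boot_mode, node_uuid,
>                                  supported))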
>
> This may ultimately require work in the nova.virt.ironic driver code
> to identify that the cause of the failure is an invalid configuration
> and report that back, since the same request may not be fatal on
> another machine.
>
> Security of /heartbeat and /lookup endpoints
> -----------------------------------------------------------
>
> We had a discussion about adding some additional layers of security
> mechanics around the /heartbeat and /lookup endpoints in ironic’s REST
> API. These limited endpoints are documented as being unauthenticated,
> so naturally some issues can arise from them, and we want to minimize
> the vectors through which an attacker that has gained access to a
> cleaning/provisioning/rescue network could possibly impersonate a
> running ironic-python-agent. Conversely, the ironic-python-agent runs
> in a similar fashion, intended to run on secure, trusted networks only
> accessible to the ironic-conductor. As such, we also want to add some
> validation that an API request made to the agent comes from the same
> ironic deployment that IPA is heart-beating to.
>
> The solution to this is to introduce a limited-lifetime token that is
> unique per node per deployment. It would be stored in RAM on the
> agent, and in node.driver_internal_info so it is available to the
> conductor. It would be provided only once, either out of band OR via
> the first “lookup” of a node, and then only become accessible again
> during known reboot steps.
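>
> A minimal sketch of the token mechanics, assuming the storage key name
> and helpers below, which are illustrative rather than the actual
> patches under review:
>
>     import hmac
>     import secrets
>
>     def issue_agent_token(node):
>         """Generate a per-node token; the conductor keeps it in
>         node.driver_internal_info and hands it to the agent exactly
>         once (via virtual media or the first lookup)."""
>         token = secrets.token_urlsafe(32)
>         node.driver_internal_info["agent_secret_token"] = token
>         return token
>
>     def heartbeat_is_valid(node, presented_token):
>         """Constant-time comparison of the token presented with a
>         heartbeat against the one stored for the node."""
>         expected = node.driver_internal_info.get("agent_secret_token")
>         return bool(expected) and hmac.compare_digest(
>             expected, presented_token)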
>
> Conceptually the introduction of tokens was well supported in the
> discussions and there were zero objections to doing so. Some initial
> patches[7][8] are under development to move this forward.
>
> An additional item is to add IP address filtering capabilities to both
> endpoints, such that we only process a heartbeat/lookup request if we
> know it came from the correct IP address. An operator has written this
> feature downstream, and consensus at the PTG was unanimous that we
> should accept it upstream. We should expect a patch for this
> functionality to be posted soon.
>
> Persistent Agents
> ------------------------
>
> The use case behind persistent agents is “I want to kexec my way to
> the agent ramdisk, or the next operating system” and “I want to have
> up-to-date inspection data.” We’ve already somewhat solved the latter,
> but the former is a harder problem requiring the previously mentioned
> endpoint security enhancements to be in place first. There is some
> interest from CERN and some other large scale operators.
>
> In other words, we should expect more of this from a bare metal fleet
> operations point of view for some environments as we move forward.
>
> “Managing hardware the Ironic way”
> -------------------------------------------------
>
> The question that spurred this discussion was “How do I provide a way
> for my hardware manager to know what it might need to do by default?”
> Except, those defaults may differ between racks that serve different
> purposes. “Rack 1, node0” may need a port set to Fibre Channel mode,
> whereas “Rack2, node1” may require it to be Ethernet.
>
> This quickly also reaches the discussion of “What if I need different
> firmware versions by default?”
>
> This topic quickly evolved from there, and the idea that surfaced was
> that we introduce a new field on the node object for the storage of
> such data. Something like ``node.default_config``, which would be a
> dictionary sort of like what a user provides for cleaning steps or
> deploy steps, providing argument values that are iterated through
> during automated cleaning to allow operators to fill in configuration
> requirement gaps for hardware managers.
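>
> To make the shape concrete, a sketch of what such a field could hold;
> the field name comes from the discussion, while the step names and
> argument values below are invented:
>
>     # hypothetical contents of node.default_config, mirroring the
>     # clean/deploy step argument style, consumed by hardware managers
>     # during automated cleaning
>     default_config = {
>         "update_firmware": {
>             "firmware_url": "http://images.example.com/rack1/nic.bin",
>             "component": "nic",
>         },
>         "configure_port_mode": {
>             "port": "nic1",
>             "mode": "ethernet",    # vs "fibre_channel" in other racks
>         },
>     }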
>
>  Interestingly enough, even today we just had someone ask a similar
> question in IRC.
>
> This should ultimately be usable to assert desired/default firmware
> from an administrative point of view. Adrianc (Mellanox) is going to
> reach out to bdobb (DMTF) regarding the redfish PLDM firmware update
> interface to see where this may go from here.
>
> Edge computing working group session
> ----------------------------------------------------
>
> The edge working group largely became a session to update everyone on
> where Ironic was going and where we see things going in terms of
> managing bare metal at the edge/far-edge. This included some in-depth
> questions about dhcp-less deployment and related mechanics as well as
> HTTPBoot’ing machines.
>
> Supporting HTTPBoot definitely seems to be of interest to a number of
> people, although at least after sharing my context only five or six
> people in attendance really seemed interested in ironic prioritizing
> such functionality. The primary blocker, for those that are unaware,
> is the lack of pre-built UEFI images for us to do integration testing
> of IPv4 HTTPBoot. Functionally, ironic already supports IPv6 HTTPBoot
> via DHCPv6 as part of our IPv6 support with PXE/iPXE, however we also
> don’t have an integration test job for this code path for the same
> reason: pre-built UEFI firmware images lack the built-in support.
>
> More minor PTG topics
> -------------------------------
>
> * Smartnics - A desire to attach virtual ports to ironic bare metal
> nodes with smartnics was raised. It seems that we don’t need to try
> and create a port entry in ironic; we only need to track/signal and
> remove the “vif” attachment to the node in general, as there is no
> physical MAC required for that virtual port in ironic. The constraint
> that at least one MAC address would be required to identify the
> machine is understood. If anyone sees an issue with this, please raise
> it with adrianc.
> * Metal^3 - Within the group attending the PTG, there was not much
> interest in Metal^3 or using CRDs to manage bare metal resources with
> ironic hidden behind the CRD. One factor related to this is the desire
> to define more data to be passed through to ironic, which is not
> presently supported in the CRD definition.
>
> Stable Backports with Ironic's release model
> ==================================
>
> I was pulled into a discussion with the TC and the Stable team
> regarding frustrations that have been expressed within the ironic
> team regarding stable back-porting of fixes, mainly for drivers. There
> is consensus that it is okay for us as the ironic team to backport
> drivery things when needed to support vendors, as long as they are not
> breaking overall behavior contracts. This quickly leads us to needing
> to also modify constraints for drivery things as well. Constraints
> changes will continue to be evaluated on a case by case basis, but the
> general consensus is there is full support to "do the right thing" for
> ironic's users, vendors, and community. The key is making sure we are
> on the same page and agreeing on what that right thing is. This is
> where asynchronous communication can get us into trouble, and I would
> highly encourage trying to start higher-bandwidth discussion when
> these cases arise in the future. The key takeaway we should likely
> keep in mind is that policy is there for good reasons, but policy is
> not and cannot be a crutch to prevent the right thing from being done.
>
> Additional items worth noting - Q1 Gatherings
> ===================================
>
> There will be an operations mid-cycle[9] at Bloomberg in London,
> January 7th-8th, 2020. It would be good if at least one ironic
> contributor could attend, as the operators group tends to be closer to
> the physical bare metal, and it is a good chance to build mutual
> context between developers and the operations people actually using
> our software.
>
> Additionally, we want to gauge the interest in having an ironic
> mid-cycle in central Europe in Q1 of 2020. We need to identify the
> number of contributors that would be interested in and able to attend
> since the next PTG will be in June. Please email me off-list if you're
> interested in attending and I'll make a note of it, as we're still
> having initial discussions.
>
>
> And now I've reached a buffer under-run on words. If there are any
> questions, just reply to the list.
>
> -Julia
>
> Links:
>
> [0]: https://etherpad.openstack.org/p/PVG-ironic-operator-feedback
> [1]: https://etherpad.openstack.org/p/PVG-ironic-snapshot-support
> [2]: https://review.opendev.org/#/c/672780/
> [3]: https://drive.google.com/file/d/1_PaPM5FvCyM6jkACADwQtDeoJkfuZcAs/view?usp=sharing
> [4]: https://drive.google.com/file/d/1YUFmwblLbJ9uJgW6Rkf6pkW8ouU-PYFK/view?usp=sharing
> [5]: https://etherpad.openstack.org/p/PVG-Ironic-Planning
> [6]: https://review.opendev.org/#/c/689551/
> [7]: https://review.opendev.org/692609
> [8]: https://review.opendev.org/692614
> [9]: https://etherpad.openstack.org/p/ops-meetup-1st-2020
> [10]: https://review.opendev.org/#/q/topic:story/2006403+(status:open+OR+status:merged)


