An awesome email Chris, thanks! Various thoughts below.

On Thu, Feb 7, 2019 at 2:40 AM Chris Dent <cdent+os@anticdent.org> wrote:
On Wed, 6 Feb 2019, Lars Kellogg-Stedman wrote:
I'm still not clear on whether there's any way to make this work with existing tools, or if it makes sense to figure out how to make Nova do this, or if we need something else sitting in front of Ironic.
The community is not going to disagree with supporting a different model for access. For some time we've had a consensus that there is a need; the conundrum is getting there and understanding the full extent of the needs. Today, a user doesn't need nova to deploy a baremetal machine, they just need baremetal_admin access rights and to have chosen which machine they want. I kind of feel like if there are specific access patterns and usage rights, then it would be good to write those down, because the ironic API has always been geared for admin usage or usage via nova. While not perfect, each API endpoint ultimately represents a pool of hardware resources to be managed. Different patterns do have different needs, and some of that may be filtering the view of hardware from a user, or only showing a user what they have rights to access. For example, with some of the discussion, there would conceivably be a need to expose or point to BMC credentials for machines checked out. That seems like a huge conundrum and would require access rights and an entire workflow that is outside of a fully trusted or single-tenant, admin-trusted environment. Ultimately I think some of this is going to require discussion in a specification document to hammer out exactly what is needed from ironic.
If I recall the early conversations correctly, one of the thoughts/frustrations that brought placement into existence was the way in which there needed to be a pile of flavors, constantly managed to reflect the variety of resources in the "cloud"; wouldn't it be nice to simply reflect those resources, ask for the things you wanted, not need to translate that into a flavor, and not need to create a new flavor every time some new thing came along?
I feel like this is also why we started heading in the direction of traits and why we now have the capability to have traits described about a specific node. Granted, traits don't solve it all, and operators kind of agreed (at the Sydney Forum) that they couldn't really agree on common trait names for additional baremetal traits.
It wouldn't be super complicated for Ironic to interact directly with placement to report hardware inventory at regular intervals and to get a list of machines that meet the "at least X GB RAM and Y GB disk space" requirements when somebody wants to boot (or otherwise select, perhaps for later use) a machine, circumventing nova and concepts like flavors. As noted elsewhere in the thread you lose concepts of tenancy, affinity and other orchestration concepts that nova provides. But if those don't matter, or if the shape of those things doesn't fit, it might (might!) be a simple matter of programming... I seem to recall there have been several efforts in this direction over the years, but not any that take advantage of placement.
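To make the query side of that concrete, here is a minimal sketch of what asking placement for "at least X GB RAM and Y GB disk" could look like. It only builds the request; it does not talk to a real service. The `resources=` query parameter on `GET /allocation_candidates` and the `OpenStack-API-Version` header are placement's own, while the function names, the base URL, and the choice of microversion 1.10 are assumptions made for illustration.

```python
from urllib.parse import urlencode


def allocation_candidates_url(base_url, ram_gb, disk_gb):
    """Build a placement query for providers with at least this much RAM and disk.

    Hypothetical helper: placement tracks memory in MEMORY_MB and disk in
    DISK_GB, so the RAM figure is converted from GB to MB here.
    """
    resources = f"MEMORY_MB:{ram_gb * 1024},DISK_GB:{disk_gb}"
    # Keep ':' and ',' unescaped so the query stays readable.
    query = urlencode({"resources": resources}, safe=":,")
    return f"{base_url.rstrip('/')}/allocation_candidates?{query}"


def placement_headers(token):
    """Headers for the request; the microversion chosen here is an assumption."""
    return {
        "X-Auth-Token": token,
        "OpenStack-API-Version": "placement 1.10",
        "Accept": "application/json",
    }


# Example: look for machines with at least 64 GB RAM and 500 GB disk.
url = allocation_candidates_url("http://placement.example/", 64, 500)
```

A caller would issue a GET on `url` with `placement_headers(token)` and pick one of the returned candidates, which is the point where nova's scheduling would otherwise sit.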
I know that I and others in the ironic community would be interested to see a proof of concept and to support this behavior. Admittedly I don't know enough about placement, and I suspect the bulk of our primary contributors are in a similar boat to mine, with multiple commitments that would really prevent spending time on an experiment such as this.
One thing to keep in mind is the reason behind the creation of custom resource classes like CUSTOM_BAREMETAL_GOLD for reporting ironic inventory (instead of the actual available hardware): a job on baremetal consumes all of it. If Ironic is reporting granular inventory and it claims a big machine for a request that was for a smaller machine, the claim would either need to be for all the stuff (to not leave inventory something else might like to claim) or some other kind of inventory manipulation (such as adjusting reserved) might be required.
I think some of this logic, and some of the conundrums we've hit with nova interaction in the past, is also one of the items that might seem like too much to take on. Then again, I guess it should end up being kind of simpler... I think.
One option might be to have all inventoried machines have classes of resource for hardware and then something like a PHYSICAL_MACHINE class with a value of 1. When a request is made (including the PHYSICAL_MACHINE=1), the returned resources are sorted by "best fit" and an allocation is made. PHYSICAL_MACHINE goes to 0, taking that resource provider out of service, but leaving the usage an accurate representation of reality.
I feel like this was kind of already the next discussion direction, but I suspect I'm going to need to see a data model to picture it in my head. :(
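As a rough aid to picturing that data model, here is a small in-memory sketch of the PHYSICAL_MACHINE idea. It is not placement's or ironic's implementation: the `Provider` class, the `waste` scoring used for "best fit", and the sample numbers are all assumptions made for illustration. In a real placement deployment a non-standard class would also need the CUSTOM_ prefix (for example CUSTOM_PHYSICAL_MACHINE).

```python
from dataclasses import dataclass, field


@dataclass
class Provider:
    """One physical machine, modelled as a resource provider (hypothetical)."""
    name: str
    inventory: dict
    used: dict = field(default_factory=dict)

    def free(self, resource_class):
        return self.inventory.get(resource_class, 0) - self.used.get(resource_class, 0)

    def fits(self, request):
        return all(self.free(rc) >= amount for rc, amount in request.items())


def waste(provider, request):
    """Sum of the fractions of each requested class left unused (lower is better).

    Assumes every requested amount is positive, so a provider that fits
    always has non-zero inventory for each requested class.
    """
    return sum(
        (provider.free(rc) - amount) / provider.inventory[rc]
        for rc, amount in request.items()
    )


def allocate(providers, request):
    """Pick the best-fitting provider and record only what was asked for."""
    candidates = [p for p in providers if p.fits(request)]
    if not candidates:
        return None
    best = min(candidates, key=lambda p: waste(p, request))
    for rc, amount in request.items():
        best.used[rc] = best.used.get(rc, 0) + amount
    return best


small = Provider("small", {"MEMORY_MB": 65536, "DISK_GB": 500, "PHYSICAL_MACHINE": 1})
big = Provider("big", {"MEMORY_MB": 524288, "DISK_GB": 4000, "PHYSICAL_MACHINE": 1})
request = {"MEMORY_MB": 32768, "DISK_GB": 200, "PHYSICAL_MACHINE": 1}

first = allocate([big, small], request)   # best fit is the smaller machine
second = allocate([big, small], request)  # small is now out of service
```

After the first allocation the small machine's PHYSICAL_MACHINE has no free capacity left, so it can no longer match any request, while its recorded MEMORY_MB and DISK_GB usage still reflects what was actually asked for.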
I think it might be worth exploring, and so it's clear I'm not talking from my armchair here, I've been doing some experiments/hacks with launching VMs with just placement, etcd and a bit of python that have proven quite elegant and may help to demonstrate how simple an initial POC that talked with ironic instead could be:
Awesome, I'll add it to my list of things to check out!
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent