Hi,

On 1/30/19 4:26 PM, Lars Kellogg-Stedman wrote:
Howdy.
I'm working with a group of people who are interested in enabling some form of baremetal leasing/reservations using Ironic. There are three key features we're looking for that aren't (maybe?) available right now:
- multi-tenancy: in addition to the ironic administrator, we need to be able to define a node "owner" (someone who controls a specific node) and a node "consumer" (someone who has been granted temporary access to a specific node). An "owner" always has the ability to control node power or access the console, can mark a node as available or not, and can set lease policies (such as a maximum lease lifetime) for a node. A "consumer" is granted access to power control and console only when they hold an active lease, and otherwise has no control over the node.
FYI we have an "owner" field in Ironic that you can use, but Ironic itself does not restrict access based on it. Well, does not *yet*, we can probably talk about it ;)
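To make the owner/consumer rule concrete, here is a minimal sketch of the access check an upper-level service could apply. Everything in it (`Node`, `Lease`, `may_control`) is a hypothetical name for illustration and not part of Ironic; it only encodes the rule described above: the owner always has power/console access, a consumer only while holding an active lease.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Lease:
    """Hypothetical lease record: who holds the node, and for how long."""
    consumer: str
    start: datetime
    end: datetime

    def active(self, now: datetime) -> bool:
        # A lease is active from its start up to (not including) its end.
        return self.start <= now < self.end


@dataclass
class Node:
    """Hypothetical node record kept by the leasing service."""
    name: str
    owner: str
    lease: Optional[Lease] = None


def may_control(node: Node, user: str, now: datetime) -> bool:
    """Power/console access: owner always, consumer only with an active lease."""
    if user == node.owner:
        return True
    lease = node.lease
    return lease is not None and lease.consumer == user and lease.active(now)
```

With a three-day lease for "bob" on a node owned by "alice", `may_control` returns True for alice at any time, True for bob during the lease, and False for bob once the lease has ended.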
- leasing: a mechanism for marking nodes as available, requesting nodes for a specific length of time, and returning those nodes to the available pool when a lease has expired.
We're getting an allocation API, which makes part of this much easier: http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/allo.... It has no notion of lease time, though; I suspect that is better left to the upper level. It also has no advanced filters (RAM >= 16G, etc.), but you can pre-filter nodes instead.
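To illustrate the pre-filtering idea, here is a rough sketch that picks candidate nodes by RAM and disk before any allocation is requested. It assumes node records are plain dicts with a `properties` mapping carrying `memory_mb` and `local_gb`; the `prefilter` helper and the sample data are made up for illustration, and how the resulting names are handed to the allocation request is deliberately left open.

```python
def prefilter(nodes, min_memory_mb=0, min_local_gb=0):
    """Return the names of nodes whose properties meet the given minimums.

    `nodes` is assumed to be a list of dicts shaped like
    {"name": ..., "properties": {"memory_mb": ..., "local_gb": ...}}.
    """
    selected = []
    for node in nodes:
        props = node.get("properties", {})
        # Values may arrive as strings, so normalise to int before comparing.
        memory_mb = int(props.get("memory_mb", 0))
        local_gb = int(props.get("local_gb", 0))
        if memory_mb >= min_memory_mb and local_gb >= min_local_gb:
            selected.append(node["name"])
    return selected


# Made-up sample data for illustration only.
sample_nodes = [
    {"name": "node-a", "properties": {"memory_mb": 8192, "local_gb": 200}},
    {"name": "node-b", "properties": {"memory_mb": "32768", "local_gb": 500}},
    {"name": "node-c", "properties": {"memory_mb": 16384, "local_gb": 100}},
]

# "RAM >= 16G" expressed in MiB.
candidates = prefilter(sample_nodes, min_memory_mb=16 * 1024)
```

The upper-level service would then restrict its allocation request to `candidates`, keeping the richer matching logic outside Ironic.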
- hardware only: we'd like the ability to leave OS provisioning up to the "consumer". For example, after someone acquires a node via the leasing mechanism, they can use Foreman to provision an OS onto the node.
The allocation API is independent of the deployment process, so you can allocate a node and leave it as it is. This is, however, not compatible with Nova's approach: Nova does reservation and deployment in what appears to be a single step.
For example, a workflow might look something like this:
- The owner of a baremetal node makes the node part of a pool of available hardware. They set a maximum lease lifetime of 5 days.
- A consumer issues a lease request for "3 nodes with >= 48GB of memory and >= 1 GPU" and "1 node with >= 16GB of memory and >= 1TB of local disk", with a required lease time of 3 days.
- The leasing system finds available nodes matching the hardware requirements and with owner-set lease policies matching the lease lifetime requirements.
- The baremetal nodes are assigned to the consumer, who can then attach them to networks and make use of their own provisioning tools (which may be another Ironic instance?) to manage the hardware. The consumer is able to control power on these nodes and access the serial console.
- At the end of the lease, the nodes are wiped and returned to the pool of available hardware. The previous consumer no longer has any access to the nodes.
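The matching step in the workflow above can be sketched as follows. This is only an illustration under stated assumptions: `PoolNode`, `Requirement`, `fits` and `match_lease` are hypothetical names, the node attributes are simplified to memory, GPU count, disk and the owner-set maximum lease lifetime, and the selection is a simple greedy, all-or-nothing pass (which can miss a valid assignment that a smarter matcher would find).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class PoolNode:
    """Hypothetical view of an available node and its owner-set policy."""
    name: str
    memory_gb: int
    gpus: int
    disk_gb: int
    max_lease_days: int


@dataclass(frozen=True)
class Requirement:
    """One group in a lease request, e.g. '3 nodes with >= 48GB and >= 1 GPU'."""
    count: int
    min_memory_gb: int = 0
    min_gpus: int = 0
    min_disk_gb: int = 0


def fits(node: PoolNode, req: Requirement, lease_days: int) -> bool:
    """True if the node meets the hardware minimums and allows the lease length."""
    return (
        node.max_lease_days >= lease_days
        and node.memory_gb >= req.min_memory_gb
        and node.gpus >= req.min_gpus
        and node.disk_gb >= req.min_disk_gb
    )


def match_lease(
    pool: List[PoolNode], reqs: List[Requirement], lease_days: int
) -> Optional[List[List[str]]]:
    """Greedy, all-or-nothing match: node names per requirement, or None."""
    taken = set()
    result = []
    for req in reqs:
        chosen = [
            n.name for n in pool if n.name not in taken and fits(n, req, lease_days)
        ][: req.count]
        if len(chosen) < req.count:
            return None  # One group cannot be satisfied, so grant nothing.
        taken.update(chosen)
        result.append(chosen)
    return result


# Made-up pool mirroring the example request; owners allow at most 5 days.
pool = [
    PoolNode("g1", 64, 1, 500, 5),
    PoolNode("g2", 64, 1, 500, 5),
    PoolNode("g3", 64, 2, 500, 5),
    PoolNode("d1", 32, 0, 2000, 5),
]
request = [
    Requirement(count=3, min_memory_gb=48, min_gpus=1),
    Requirement(count=1, min_memory_gb=16, min_disk_gb=1000),
]
granted = match_lease(pool, request, lease_days=3)
```

Here a 3-day request is satisfied by the three GPU nodes plus the large-disk node, while the same request for 7 days returns `None` because it exceeds every node's 5-day maximum.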
Our initial thought is to implement this as a service that sits in front of Ironic and provides the multi-tenancy and policy logic, while using Ironic to actually control the hardware.
++
Does this seem like a reasonable path forward? On paper there's a lot of overlap here between what we want and features provided by things like the Nova schedulers or the Placement API, but it's not clear we can leverage those at the baremetal layer.
Thanks for your thoughts,