[ironic] Hardware leasing with Ironic
Howdy.
I'm working with a group of people who are interested in enabling some form of baremetal leasing/reservations using Ironic. There are three key features we're looking for that aren't (maybe?) available right now:
- multi-tenancy: in addition to the ironic administrator, we need to be able to define a node "owner" (someone who controls a specific node) and a node "consumer" (someone who has been granted temporary access to a specific node). An "owner" always has the ability to control node power or access the console, can mark a node as available or not, and can set lease policies (such as a maximum lease lifetime) for a node. A "consumer" is granted access to power control and console only when they hold an active lease, and otherwise has no control over the node.
- leasing: a mechanism for marking nodes as available, requesting nodes for a specific length of time, and returning those nodes to the available pool when a lease has expired.
- hardware only: we'd like the ability to leave OS provisioning up to the "consumer". For example, after someone acquires a node via the leasing mechanism, they can use Foreman to provision an OS onto the node.
For example, a workflow might look something like this:
- The owner of a baremetal node makes the node part of a pool of available hardware. They set a maximum lease lifetime of 5 days.
- A consumer issues a lease request for "3 nodes with >= 48GB of memory and >= 1 GPU" and "1 node with >= 16GB of memory and >= 1TB of local disk", with a required lease time of 3 days.
- The leasing system finds available nodes matching the hardware requirements and with owner-set lease policies matching the lease lifetime requirements.
- The baremetal nodes are assigned to the consumer, who can then attach them to networks and make use of their own provisioning tools (which may be another Ironic instance?) to manage the hardware. The consumer is able to control power on these nodes and access the serial console.
- At the end of the lease, the nodes are wiped and returned to the pool of available hardware. The previous consumer no longer has any access to the nodes.
Our initial thought is to implement this as a service that sits in front of Ironic and provides the multi-tenancy and policy logic, while using Ironic to actually control the hardware.
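To make that concrete, here's a minimal sketch of the kind of shim we have in mind. Everything here is hypothetical (the in-memory lease store, the X-Consumer-Id header, and the credentials are illustrative placeholders); python-ironicclient does the actual power control:

    # Hypothetical lease-aware power proxy. LEASES and the consumer
    # header are illustrative placeholders, not existing APIs.
    from datetime import datetime

    from flask import Flask, abort, request
    from ironicclient import client as ironic_client

    app = Flask(__name__)

    # A real service would keep these in a database.
    LEASES = {}  # node_uuid -> {"consumer": str, "expires": datetime}

    def get_ironic():
        # Credentials for the real Ironic; consumers never see them.
        return ironic_client.get_client(
            1,
            os_auth_url="http://keystone:5000/v3",
            os_username="ironic-proxy",
            os_password="secret",
            os_project_name="admin",
        )

    @app.route("/v1/nodes/<node_uuid>/states/power", methods=["PUT"])
    def set_power(node_uuid):
        consumer = request.headers.get("X-Consumer-Id")
        lease = LEASES.get(node_uuid)
        # Only a consumer holding an unexpired lease may control power.
        if (not lease or lease["consumer"] != consumer
                or lease["expires"] < datetime.utcnow()):
            abort(403)
        target = request.get_json()["target"]  # "power on" / "power off"
        get_ironic().node.set_power_state(
            node_uuid, "on" if target == "power on" else "off")
        return "", 202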
Does this seem like a reasonable path forward? On paper there's a lot of overlap here between what we want and features provided by things like the Nova scheduler or the Placement API, but it's not clear we can leverage those at the baremetal layer.
Thanks for your thoughts,
Would Blazar provide much of this functionality? I think it only talks Nova at the moment.
It doesn't quite cover the use case but one approach we have taken is to define resources which expire after a length of time. Details are in https://techblog.web.cern.ch/techblog/post/expiry-of-vms-in-cern-cloud/ and the Mistral workflows are at https://gitlab.cern.ch/cloud-infrastructure/mistral-workflows.
Tim
(sorry for the dupe, failed to reply all the first time around)
On Wed, Jan 30, 2019 at 11:15 AM Tim Bell Tim.Bell@cern.ch wrote:
Would Blazar provide much of this functionality? I think it only talks Nova at the moment.
Thanks for the pointer. I'll take a closer look at Blazar, because in my head it was restricted to Nova resource reservations, but perhaps it can extend beyond that. From another perspective, if we can convince Nova to hand out access to unprovisioned baremetal hosts, that might make this more of an option.
Hi Lars,
Blazar currently only supports reservation of nodes via Nova. It isn't yet compatible with Ironic nodes managed by Nova, because of the lack of support for host aggregates for Ironic. We have a plan to fix this using placement aggregates instead.
However, Blazar is extendable, with a plugin architecture: a baremetal plugin could be developed that interacts directly with Ironic. This would allow leveraging the existing lease management code in Blazar. As an example, the Blazar project team has been busy this cycle implementing reservations of Neutron resources (floating IPs and network segments) [1].
Giving direct provisioning access to users means they will need BMC credentials and access to provisioning networks. If more isolation is required, you might want to take a look at HIL from the Mass Open Cloud [2]. I haven't used it, but I have read one of their papers and it looks well thought out.
Pierre
[1] https://review.openstack.org/#/q/topic:bp/basic-network-plugin+(status:open+...) [2] https://massopen.cloud/blog/project-hil/
On Wed, Jan 30, 2019 at 04:47:09PM +0000, Pierre Riteau wrote:
However, Blazar is extendable, with a plugin architecture: a baremetal plugin could be developed that interacts directly with Ironic.
This would require Ironic to support multi-tenancy first, right?
Giving direct provisioning access to users means they will need BMC credentials and access to provisioning networks. If more isolation is required, you might want to take a look at HIL from the Mass Open Cloud [2]. I haven't used it, but I have read one of their papers and it looks well thought out.
Ironically (hah!), the group I am working with *is* the Massachusetts Open Cloud, and we're looking to implement the ideas explored in HIL/BMI on top of OpenStack services.
On Wed, 30 Jan 2019 at 17:05, Lars Kellogg-Stedman lars@redhat.com wrote:
On Wed, Jan 30, 2019 at 04:47:09PM +0000, Pierre Riteau wrote:
However, Blazar is extendable, with a plugin architecture: a baremetal plugin could be developed that interacts directly with Ironic.
This would require Ironic to support multi-tenancy first, right?
Yes, assuming this would be available as per your initial message. Although technically you could use the Blazar API as a wrapper to provide the multi-tenancy, it would require duplicating a lot of the Ironic API into Blazar, so I wouldn't recommend this approach.
Giving direct provisioning access to users means they will need BMC credentials and access to provisioning networks. If more isolation is required, you might want to take a look at HIL from the Mass Open Cloud [2]. I haven't used it, but I have read one of their papers and it looks well thought out.
Ironically (hah!), the group I am working with *is* the Massachusetts Open Cloud, and we're looking to implement the ideas explored in HIL/BMI on top of OpenStack services.
Heh, it's a small world :-) I would be very happy to see these ideas implemented via OpenStack; it would surely help to get them more adopted.
On 1/31/19 11:58 AM, Pierre Riteau wrote:
On Wed, 30 Jan 2019 at 17:05, Lars Kellogg-Stedman lars@redhat.com wrote:
On Wed, Jan 30, 2019 at 04:47:09PM +0000, Pierre Riteau wrote:
However, Blazar is extendable, with a plugin architecture: a baremetal plugin could be developed that interacts directly with Ironic.
This would require Ironic to support multi-tenancy first, right?
Yes, assuming this would be available as per your initial message.
Some first steps have been done: http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/owne.... We need someone to drive the further design and implementation though.
On Thu, Jan 31, 2019 at 12:09:07PM +0100, Dmitry Tantsur wrote:
Some first steps have been done: http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/owne.... We need someone to drive the further design and implementation though.
That spec seems to be for a strictly informational field. Reading through it, I guess it's because doing something like this...
openstack baremetal node set --property owner=lars
...leads to sub-optimal performance when trying to filter a large number of hosts. I see that it's merged already, so I guess this is commenting after the fact, but that seems like the wrong path to follow: I can see properties like "the contract id under which this system was purchased" being as important as "owner" (or more so) from a large business perspective, so making it easier to filter by property on the server side would seem to be a better solution.
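To illustrate the filtering problem, here's roughly what a client has to do today: since arbitrary properties aren't queryable server-side, it pulls full node details and filters locally (a sketch, with illustrative credentials):

    from ironicclient import client as ironic_client

    ironic = ironic_client.get_client(
        1,
        os_auth_url="http://keystone:5000/v3",
        os_username="admin",
        os_password="secret",
        os_project_name="admin",
    )

    # detail=True is required because the plain node list omits the
    # properties dict; on a large deployment this ships every field of
    # every node across the wire just to filter on one key.
    nodes = ironic.node.list(detail=True)
    for node in (n for n in nodes if n.properties.get("owner") == "lars"):
        print(node.uuid, node.name)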
Or implement full multi-tenancy so that "owner" is more than simply informational, of course :).
On Fri, Feb 1, 2019 at 7:34 AM Lars Kellogg-Stedman lars@redhat.com wrote:
Or implement full multi-tenancy so that "owner" is more than simply informational, of course :).
My original thought was to enable multi-purpose usage, so that if we ever get to the point of offering filtered views, a baremetal_user could only see machines whose owner matches their tenant. Sub-optimal for sure, but in order not to break baremetal_admin level usage we have to have a compromise. The alternative that comes to mind is to build a new permission matrix model that delineates the two, but at some point someone is still the "owner" and is responsible for the hardware. The details we want to keep out of storage and consideration in Ironic are the more CMDB-ish details, such as contracts and acquisition dates.
The other thing we should consider is the "give me a physical machine" versus "I have my machines, I need to use them" approaches, and how such a model handles both. I suspect this is quickly becoming a Forum-worthy session.
On Thu, Jan 31, 2019 at 10:58:58AM +0000, Pierre Riteau wrote:
This would require Ironic to support multi-tenancy first, right?
Yes, assuming this would be available as per your initial message. Although technically you could use the Blazar API as a wrapper to provide the multi-tenancy, it would require duplicating a lot of the Ironic API into Blazar, so I wouldn't recommend this approach.
I think that it would be best to implement the multi-tenancy at a lower level than Blazar.
Our thought was to prototype this by putting multi-tenancy and the related access control logic into a proxy service that sits between Ironic and the end user, although that still suffers from the same problem of needing the shim service to be aware of much of the Ironic API.
Ultimately it would be great to see Ironic develop native support for multi-tenant operation.
On Wed, 2019-01-30 at 10:26 -0500, Lars Kellogg-Stedman wrote:
Our initial thought is to implement this as a service that sits in front of Ironic and provides the multi-tenancy and policy logic, while using Ironic to actually control the hardware.
Have you looked at Blazar (https://docs.openstack.org/blazar/queens/index.html)? It is basically designed to do this.
Hi,
On 1/30/19 4:26 PM, Lars Kellogg-Stedman wrote:
- multi-tenancy: in addition to the ironic administrator, we need to be able to define a node "owner" (someone who controls a specific node) and a node "consumer" (someone who has been granted temporary access to a specific node). An "owner" always has the ability to control node power or access the console, can mark a node as available or not, and can set lease policies (such as a maximum lease lifetime) for a node. A "consumer" is granted access to power control and console only when they hold an active lease, and otherwise has no control over the node.
FYI we have an "owner" field in Ironic that you can use, but Ironic itself does not restrict access based on it. Well, does not *yet*, we can probably talk about it ;)
- leasing: a mechanism for marking nodes as available, requesting nodes for a specific length of time, and returning those nodes to the available pool when a lease has expired.
We're getting an allocation API, which makes part of this much easier: http://specs.openstack.org/openstack/ironic-specs/specs/not-implemented/allo....
It does not have a notion of lease time though. I suspect it is better to leave it to the upper level.
It also does not have advanced filters (RAM >= 16G, etc.); you can pre-filter nodes instead.
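A rough sketch of how a leasing layer could combine that pre-filtering with the allocation API (client calls assumed per the spec; the lease name is illustrative):

    from ironicclient import client as ironic_client

    ironic = ironic_client.get_client(
        1,
        os_auth_url="http://keystone:5000/v3",
        os_username="admin",
        os_password="secret",
        os_project_name="admin",
    )

    # Pre-filter: the advanced "RAM >= 48G" logic lives outside Ironic.
    candidates = [
        node.uuid for node in ironic.node.list(detail=True)
        if int(node.properties.get("memory_mb", 0)) >= 49152
    ]

    # The allocation API then reserves one available node from the list.
    allocation = ironic.allocation.create(
        name="lease-12345",  # illustrative lease identifier
        resource_class="baremetal",
        candidate_nodes=candidates,
    )
    print(allocation.state, allocation.node_uuid)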
- hardware only: we'd like the ability to leave os provisioning up to the "consumer". For example, after someone acquires a node via the leasing mechanism, they can use Foreman to provisioning an os onto the node.
The allocation API is independent of the deployment process, so you can allocate a node and leave it as it is. This is, however, not compatible with the Nova approach: Nova does reservation and deployment in what is seemingly a single step.
Our initial thought is to implement this as a service that sits in front of Ironic and provides the multi-tenancy and policy logic, while using Ironic to actually control the hardware.
++
On Wed, Jan 30, 2019 at 10:26:04AM -0500, Lars Kellogg-Stedman wrote:
Howdy.
I'm working with a group of people who are interested in enabling some form of baremetal leasing/reservations using Ironic...
Hey everyone,
Thanks for the feedback! Based on what I've heard so far, I'm beginning to think our best course of action is:
1. Implement multi-tenancy either (a) directly in Ironic or (b) in a shim service that sits between Ironic and the client.
2. Implement a Blazar plugin that is able to talk to whichever service in (1) is appropriate.
3. Work with Blazar developers to implement any lease logic that we think is necessary.
-- Lars Kellogg-Stedman lars@redhat.com | larsks @ {irc,twitter,github} http://blog.oddbit.com/ |
On Fri, 2019-02-01 at 12:09 -0500, Lars Kellogg-Stedman wrote:
Thanks for the feedback! Based on what I've heard so far, I'm beginning to think our best course of action is:
- Implement multi-tenancy either (a) directly in Ironic or (b) in a
shim service that sits between Ironic and the client.
that shim service could be Nova, which already has multi-tenancy.
- Implement a Blazar plugin that is able to talk to whichever service
in (1) is appropriate.
and Nova is supported by Blazar
- Work with Blazar developers to implement any lease logic that we
think is necessary.
+1. By the way, I'm sure there is a reason why you don't want to have Blazar drive Nova and Nova drive Ironic, but it seems like all the functionality would already be there in that case.
On Fri, Feb 01, 2019 at 06:16:42PM +0000, Sean Mooney wrote:
Sean,
Being able to use Nova is a really attractive idea. I'm a little fuzzy on some of the details, though, starting with how to handle node discovery. A key goal is being able to parametrically request systems ("I want a system with a GPU and >= 40GB of memory"). With Nova, would this require effectively creating a flavor for every unique hardware configuration? Conceptually, I want "... create server --flavor any --filter 'has_gpu and memory_mb>40000' ...", but it's not clear to me if that's something we could do now or if that would require changes to the way Nova handles baremetal scheduling.
Additionally, we also want the ability to acquire a node without provisioning it, so that a consumer can use their own provisioning tool. From Nova's perspective, I guess this would be like requesting a system without specifying an image. Is that possible right now?
I'm sure I'll have other questions, but these are the first few that crop up.
Thanks,
A few years ago, there was a discussion in one of the summit forums where users wanted to be able to come along to a generic OpenStack cloud and say "give me the flavor that has at least X GB RAM and Y GB disk space". At the time, the thoughts were that this could be done by doing a flavour list and then finding the smallest one which matched the requirements.
Would that be an option or would it require some more Nova internals?
For reserving, you could install the machine with a simple image and then let the user rebuild with their choice?
Not sure if these meet what you'd like but it may allow a proof-of-concept without needing too many code changes.
Tim
On Wed, Feb 06, 2019 at 04:00:40PM +0000, Tim Bell wrote:
A few years ago, there was a discussion in one of the summit forums where users wanted to be able to come along to a generic OpenStack cloud and say "give me the flavor that has at least X GB RAM and Y GB disk space". At the time, the thoughts were that this could be done by doing a flavour list and then finding the smallest one which matched the requirements.
The problem is that "flavor list" part: that implies that every time someone adds a new hardware configuration to the environment (maybe they add a new group of machines, or maybe they simply upgrade RAM/disk/etc in some existing nodes), they need to manually create corresponding flavors. That also implies that you could quickly end up with an egregious number of flavors to represent different types of available hardware. Really, what we want is the ability to select hardware based on Ironic introspection data, without any manual steps in between.
I'm still not clear on whether there's any way to make this work with existing tools, or if it makes sense to figure out to make Nova do this or if we need something else sitting in front of Ironic.
For reserving, you could install the machine with a simple image and then let the user rebuild with their choice?
That's probably a fine workaround for now.
On Wed, 6 Feb 2019, Lars Kellogg-Stedman wrote:
I'm still not clear on whether there's any way to make this work with existing tools, or if it makes sense to figure out to make Nova do this or if we need something else sitting in front of Ironic.
If I recall the early conversations correctly, one of the thoughts/frustrations that brought placement into existence was the way in which there needed to be a pile of flavors, constantly managed to reflect the variety of resources in the "cloud"; wouldn't it be nice to simply reflect those resources, ask for the things you wanted, not need to translate that into a flavor, and not need to create a new flavor every time some new thing came along?
It wouldn't be super complicated for Ironic to interact directly with placement to report hardware inventory at regular intervals and to get a list of machines that meet the "at least X GB RAM and Y GB disk space" requirements when somebody wants to boot (or otherwise select, perhaps for later use) a machine, circumventing nova and concepts like flavors. As noted elsewhere in the thread you lose concepts of tenancy, affinity and other orchestration concepts that nova provides. But if those don't matter, or if the shape of those things doesn't fit, it might (might!) be a simple matter of programming... I seem to recall there have been several efforts in this direction over the years, but not any that take advantage of placement.
One thing to keep in mind is the reasons behind the creation of custom resource classes like CUSTOM_BAREMETAL_GOLD for reporting ironic inventory (instead of the actual available hardware): a job on baremetal consumes all of it. If Ironic reported granular inventory and a big machine were claimed to satisfy a request for a smaller one, the claim would either need to cover all of the machine's resources (so as not to leave inventory that something else might try to claim), or some other kind of inventory manipulation (such as adjusting reserved) would be required.
One option might be for all inventoried machines to have resource classes for their hardware and then something like a PHYSICAL_MACHINE class with a value of 1. When a request is made (including PHYSICAL_MACHINE=1), the returned resources are sorted by "best fit" and an allocation is made. PHYSICAL_MACHINE goes to 0, taking that resource provider out of service, but leaving the usage an accurate representation of reality.
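To sketch that against the placement API as it exists (custom resource classes must carry a CUSTOM_ prefix, so PHYSICAL_MACHINE would really be CUSTOM_PHYSICAL_MACHINE; auth details are illustrative):

    from keystoneauth1 import adapter, identity, session

    auth = identity.Password(
        auth_url="http://keystone:5000/v3",
        username="admin", password="secret", project_name="admin",
        user_domain_id="default", project_domain_id="default",
    )
    placement = adapter.Adapter(
        session=session.Session(auth=auth),
        service_type="placement",
        # GET /allocation_candidates needs microversion >= 1.10
        default_microversion="1.17",
    )

    # "At least 40G of RAM, plus one whole machine."
    resp = placement.get(
        "/allocation_candidates"
        "?resources=MEMORY_MB:40960,CUSTOM_PHYSICAL_MACHINE:1")
    candidates = resp.json()["allocation_requests"]
    # PUTting one of these bodies to /allocations/{consumer_uuid} claims
    # the machine: CUSTOM_PHYSICAL_MACHINE drops to 0 and the provider
    # is out of service until the allocation is deleted.
    print(len(candidates), "machines fit")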
I think it might be worth exploring, and so it's clear I'm not talking from my armchair here, I've been doing some experiments/hacks with launching VMs with just placement, etcd and a bit of python that have proven quite elegant and may help to demonstrate how simple an initial POC that talked with ironic instead could be:
https://github.com/cdent/etcd-compute
An awesome email Chris, thanks!
Various thoughts below.
On Thu, Feb 7, 2019 at 2:40 AM Chris Dent cdent+os@anticdent.org wrote:
On Wed, 6 Feb 2019, Lars Kellogg-Stedman wrote:
I'm still not clear on whether there's any way to make this work with existing tools, or if it makes sense to figure out to make Nova do this or if we need something else sitting in front of Ironic.
The community is not going to disagree with supporting a different model for access. For some time we've had a consensus that there is a need; it is just getting there and understanding the full extent of the needs that is the conundrum.
Today, a user doesn't need Nova to deploy a baremetal machine; they just need baremetal_admin access rights and to have chosen which machine they want. I kind of feel like if there are specific access patterns and usage rights, it would be good to write those down, because the Ironic API has always been geared for admin usage or usage via Nova. While not perfect, each API endpoint ultimately represents a pool of hardware resources to be managed. Different patterns do have different needs, and some of that may be filtering the view of hardware from a user, or only showing a user what they have rights to access. For example, with some of the discussion, there would conceivably be a need to expose or point to BMC credentials for machines that are checked out. That seems like a huge conundrum and would require access rights and an entire workflow, which is outside of a fully trusted or single-tenant, admin-trusted environment.
Ultimately I think some of this is going to require discussion in a specification document to hammer out exactly what is needed from ironic.
If I recall the early conversations correctly, one of the thoughts/frustrations that brought placement into existence was the way in which there needed to be a pile of flavors, constantly managed to reflect the variety of resources in the "cloud"; wouldn't it be nice to simply reflect those resources, ask for the things you wanted, not need to translate that into a flavor, and not need to create a new flavor every time some new thing came along?
I feel like this is also why we started heading in the direction of traits, and why we now have the capability to have traits described about a specific node. Granted, traits don't solve it all, and operators kind of agreed (in the Sydney Forum) that they couldn't really agree on common trait names for additional baremetal traits.
It wouldn't be super complicated for Ironic to interact directly with placement to report hardware inventory at regular intervals and to get a list of machines that meet the "at least X GB RAM and Y GB disk space" requirements when somebody wants to boot (or otherwise select, perhaps for later use) a machine, circumventing nova and concepts like flavors. As noted elsewhere in the thread you lose concepts of tenancy, affinity and other orchestration concepts that nova provides. But if those don't matter, or if the shape of those things doesn't fit, it might (might!) be a simple matter of programming... I seem to recall there have been several efforts in this direction over the years, but not any that take advantage of placement.
I know myself and others in the ironic community would be interested to see a proof of concept and to support this behavior. Admittedly I don't know enough about placement, and I suspect the bulk of our primary contributors are in a similar boat to me, with multiple commitments that would prevent spending time on an experiment such as this.
One thing to keep in mind is the reasons behind the creation of custom resource classes like CUSTOM_BAREMETAL_GOLD for reporting ironic inventory (instead of the actual available hardware): a job on baremetal consumes all of it. If Ironic reported granular inventory and a big machine were claimed to satisfy a request for a smaller one, the claim would either need to cover all of the machine's resources (so as not to leave inventory that something else might try to claim), or some other kind of inventory manipulation (such as adjusting reserved) would be required.
I think some of this logic, and some of the conundrums we've hit with Nova interaction in the past, might seem like too much to take on; then again, I guess it should end up being kind of simpler... I think.
One option might be for all inventoried machines to have resource classes for their hardware and then something like a PHYSICAL_MACHINE class with a value of 1. When a request is made (including PHYSICAL_MACHINE=1), the returned resources are sorted by "best fit" and an allocation is made. PHYSICAL_MACHINE goes to 0, taking that resource provider out of service, but leaving the usage an accurate representation of reality.
I feel like this was kind of already the next discussion direction, but I suspect I'm going to need to see a data model to picture it in my head. :(
I think it might be worth exploring, and so it's clear I'm not talking from my armchair here, I've been doing some experiments/hacks with launching VMs with just placement, etcd and a bit of python that have proven quite elegant and may help to demonstrate how simple an initial POC that talked with ironic instead could be:
https://github.com/cdent/etcd-compute
Awesome, I'll add it to my list of things to check out!
On Wed, 6 Feb 2019 at 15:47, Lars Kellogg-Stedman lars@redhat.com wrote:
Sean,
Being able to use Nova is a really attractive idea. I'm a little fuzzy on some of the details, though, starting with how to handle node discovery. A key goal is being able to parametrically request systems ("I want a system with a GPU and >= 40GB of memory"). With Nova, would this require effectively creating a flavor for every unique hardware configuration? Conceptually, I want "... create server --flavor any --filter 'has_gpu and memory_mb>40000' ...", but it's not clear to me if that's something we could do now or if that would require changes to the way Nova handles baremetal scheduling.
Such node selection is something you can already do with Blazar using the parameters "hypervisor_properties" (which are hypervisor details automatically imported from Nova) and "resource_properties" (extra key/value pairs that can be tagged on the resource, which could be has_gpu=true) when creating reservations: https://developer.openstack.org/api-ref/reservation/v1/index.html?expanded=c...
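For example, a sketch with python-blazarclient (has_gpu is whatever key the operator tagged the host with; auth details are illustrative):

    from blazarclient import client as blazar_client
    from keystoneauth1 import identity, session

    auth = identity.Password(
        auth_url="http://keystone:5000/v3",
        username="demo", password="secret", project_name="demo",
        user_domain_id="default", project_domain_id="default",
    )
    blazar = blazar_client.Client(
        session=session.Session(auth=auth), service_type="reservation")

    # "3 nodes with >= 48GB of memory and a GPU" for three days.
    blazar.lease.create(
        name="gpu-lease",
        start="2019-02-11 09:00",
        end="2019-02-14 09:00",
        reservations=[{
            "resource_type": "physical:host",
            "min": 3,
            "max": 3,
            "hypervisor_properties": '[">=", "$memory_mb", 49152]',
            "resource_properties": '["==", "$has_gpu", "true"]',
        }],
        events=[],
    )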
I believe you can also do such filtering with the ComputeCapabilitiesFilter directly with Nova. It was supposed to be deprecated (https://review.openstack.org/#/c/603102/) but it looks like it's staying around for now.
In either case, using Nova still requires a flavor to be selected, but you could have a single "baremetal" flavor associated with a single resource class for the whole baremetal cloud.
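A sketch of that single-flavor setup with python-novaclient, following the documented Ironic flavor configuration (values illustrative):

    from keystoneauth1 import identity, session
    from novaclient import client as nova_client

    auth = identity.Password(
        auth_url="http://keystone:5000/v3",
        username="admin", password="secret", project_name="admin",
        user_domain_id="default", project_domain_id="default",
    )
    nova = nova_client.Client("2.1", session=session.Session(auth=auth))

    # One flavor for the whole baremetal cloud: schedule purely on the
    # node's resource class (exposed as CUSTOM_BAREMETAL when nodes have
    # resource_class=baremetal) and zero out the standard resources.
    flavor = nova.flavors.create(name="baremetal", ram=1, vcpus=1, disk=0)
    flavor.set_keys({
        "resources:CUSTOM_BAREMETAL": "1",
        "resources:VCPU": "0",
        "resources:MEMORY_MB": "0",
        "resources:DISK_GB": "0",
    })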
On Wed, 6 Feb 2019 at 23:17, Pierre Riteau pierre@stackhpc.com wrote:
I believe you can also do such filtering with the ComputeCapabilitiesFilter directly with Nova. It was supposed to be deprecated (https://review.openstack.org/#/c/603102/) but it looks like it's staying around for now.
Sorry, I was actually thinking about JsonFilter rather than ComputeCapabilitiesFilter. The former allows users to pass a query via scheduler hints, while the latter filters based on flavors.
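For illustration, a JsonFilter query passed as a scheduler hint might look like this (a sketch; the operator must enable JsonFilter, and it only exposes a small set of host attributes such as $free_ram_mb, not arbitrary introspection data):

    import json

    from keystoneauth1 import identity, session
    from novaclient import client as nova_client

    auth = identity.Password(
        auth_url="http://keystone:5000/v3",
        username="demo", password="secret", project_name="demo",
        user_domain_id="default", project_domain_id="default",
    )
    nova = nova_client.Client("2.1", session=session.Session(auth=auth))

    # JsonFilter reads the "query" scheduler hint as a JSON expression.
    nova.servers.create(
        name="big-node",
        image="placeholder-image-id",  # illustrative
        flavor="baremetal",
        scheduler_hints={"query": json.dumps(
            [">=", "$free_ram_mb", 40960])},
    )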
participants (7)
- Chris Dent
- Dmitry Tantsur
- Julia Kreger
- Lars Kellogg-Stedman
- Pierre Riteau
- Sean Mooney
- Tim Bell