[ironic][ops] Taking ironic nodes out of production

older
[goal][python3] Train unit tests...

Arne Wiebalck

20 May 2019 20 May '19

4:33 p.m.

Dear all, One of the discussions at the PTG in Denver raised the need for a mechanism to take ironic nodes out of production (a task for which the currently available 'maintenance' flag does not seem appropriate [1]). The use case there is an unhealthy physical node in state 'active', i.e. associated with an instance. The request is then to enable an admin to mark such a node as 'faulty' or 'in quarantine' with the aim of not returning the node to the pool of available nodes once the hosted instance is deleted. A very similar use case which came up independently is node retirement: it should be possible to mark nodes ('active' or not) as being 'up for retirement' to prepare the eventual removal from ironic. As in the example above, ('active') nodes marked this way should not become eligible for instance scheduling again, but automatic cleaning, for instance, should still be possible. In an effort to cover these use cases by a more general "quarantine/retirement" feature: - are there additional use cases which could profit from such a "take a node out of service" mechanism? - would these use cases put additional constraints on how the feature should look like (e.g.: "should not prevent cleaning") - are there other characteristics such a feature should have (e.g.: "finding these nodes should be supported by the cli") Let me know if you have any thoughts on this. Cheers, Arne [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360

Show replies by date

Bogdan Dobrelya

21 May 21 May

8:13 a.m.

[CC'ed edge-computing@lists.openstack.org] On 20.05.2019 18:33, Arne Wiebalck wrote:

...

Dear all,

One of the discussions at the PTG in Denver raised the need for a mechanism to take ironic nodes out of production (a task for which the currently available 'maintenance' flag does not seem appropriate [1]).

The use case there is an unhealthy physical node in state 'active', i.e. associated with an instance. The request is then to enable an admin to mark such a node as 'faulty' or 'in quarantine' with the aim of not returning the node to the pool of available nodes once the hosted instance is deleted.

A very similar use case which came up independently is node retirement: it should be possible to mark nodes ('active' or not) as being 'up for retirement' to prepare the eventual removal from ironic. As in the example above, ('active') nodes marked this way should not become eligible for instance scheduling again, but automatic cleaning, for instance, should still be possible.

In an effort to cover these use cases by a more general "quarantine/retirement" feature:

- are there additional use cases which could profit from such a "take a node out of service" mechanism?

There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware. [0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr... [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge

...

- would these use cases put additional constraints on how the feature should look like (e.g.: "should not prevent cleaning")

- are there other characteristics such a feature should have (e.g.: "finding these nodes should be supported by the cli")

Let me know if you have any thoughts on this.

Cheers, Arne

[1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360

-- Best regards, Bogdan Dobrelya, Irc #bogdando

Christopher Price

8:26 a.m.

New subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario". Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) </ end dubious thought stream> / Chris On 2019-05-21, 09:16, "Bogdan Dobrelya" <bdobreli@redhat.com> wrote: [CC'ed edge-computing@lists.openstack.org] On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). > > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a > "take a node out of service" mechanism? There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware. [0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr... [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge > > - would these use cases put additional constraints on how the > feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have > (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, > Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360 > -- Best regards, Bogdan Dobrelya, Irc #bogdando _______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

Arkady.Kanevsky＠dell.com

12:55 p.m.

New subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

Let's dig deeper into requirements. I see three distinct use cases: 1. put node into maintenance mode. Say to upgrade FW/BIOS or any other life-cycle event. It stays in ironic cluster but it is no longer in use by the rest of openstack, like Nova. 2. Put node into "fail" state. That is remove from usage, remove from Ironic cluster. What cleanup, operator would like/can do is subject to failure. Depending on the node type it may need to be "replaced". 3. Put node into "available" to other usage. What cleanup operator wants to do will need to be defined. This is very similar step as used for Baremetal as a Service as node is reassigned back into available pool. Depending on the next usage of a node it may stay in the Ironic cluster or may be removed from it. Once removed it can be "retired" or used for any other purpose. Thanks, Arkady -----Original Message----- From: Christopher Price <christopher.price@est.tech> Sent: Tuesday, May 21, 2019 3:26 AM To: Bogdan Dobrelya; openstack-discuss@lists.openstack.org; edge-computing@lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production [EXTERNAL EMAIL] I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario". Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) </ end dubious thought stream> / Chris On 2019-05-21, 09:16, "Bogdan Dobrelya" <bdobreli@redhat.com> wrote: [CC'ed edge-computing@lists.openstack.org] On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). > > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a > "take a node out of service" mechanism? There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware. [0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr... [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge > > - would these use cases put additional constraints on how the > feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have > (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, > Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360 > -- Best regards, Bogdan Dobrelya, Irc #bogdando _______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing _______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

Julia Kreger

5:33 p.m.

New subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

On Tue, May 21, 2019 at 5:55 AM <Arkady.Kanevsky@dell.com> wrote:

...

Let's dig deeper into requirements. I see three distinct use cases: 1. put node into maintenance mode. Say to upgrade FW/BIOS or any other life-cycle event. It stays in ironic cluster but it is no longer in use by the rest of openstack, like Nova. 2. Put node into "fail" state. That is remove from usage, remove from Ironic cluster. What cleanup, operator would like/can do is subject to failure. Depending on the node type it may need to be "replaced".

Or troubleshooted by a human, and could be returned to a non-failure state. I think largely the only way we as developers could support that is allow for hook scripts to be called upon entering/exiting such a state. That being said, At least from what Beth was saying at the PTG, this seems to be one of the most important states.

...

3. Put node into "available" to other usage. What cleanup operator wants to do will need to be defined. This is very similar step as used for Baremetal as a Service as node is reassigned back into available pool. Depending on the next usage of a node it may stay in the Ironic cluster or may be removed from it. Once removed it can be "retired" or used for any other purpose.

...

Thanks, Arkady

-----Original Message----- From: Christopher Price <christopher.price@est.tech> Sent: Tuesday, May 21, 2019 3:26 AM To: Bogdan Dobrelya; openstack-discuss@lists.openstack.org; edge-computing@lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

[EXTERNAL EMAIL]

I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario".

Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) </ end dubious thought stream>

/ Chris

On 2019-05-21, 09:16, "Bogdan Dobrelya" <bdobreli@redhat.com> wrote:

[CC'ed edge-computing@lists.openstack.org]

On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). > > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a > "take a node out of service" mechanism?

There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware.

[0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr... [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge

> > - would these use cases put additional constraints on how the > feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have > (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, > Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360 >

-- Best regards, Bogdan Dobrelya, Irc #bogdando

_______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

_______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

Arkady.Kanevsky＠dell.com

7 p.m.

New subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

Inline response -----Original Message----- From: Julia Kreger <juliaashleykreger@gmail.com> Sent: Tuesday, May 21, 2019 12:33 PM To: Kanevsky, Arkady Cc: Christopher Price; Bogdan Dobrelya; openstack-discuss; edge-computing@lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production [EXTERNAL EMAIL] On Tue, May 21, 2019 at 5:55 AM <Arkady.Kanevsky@dell.com> wrote:

...

Let's dig deeper into requirements. I see three distinct use cases: 1. put node into maintenance mode. Say to upgrade FW/BIOS or any other life-cycle event. It stays in ironic cluster but it is no longer in use by the rest of openstack, like Nova. 2. Put node into "fail" state. That is remove from usage, remove from Ironic cluster. What cleanup, operator would like/can do is subject to failure. Depending on the node type it may need to be "replaced".

...

3. Put node into "available" to other usage. What cleanup operator wants to do will need to be defined. This is very similar step as used for Baremetal as a Service as node is reassigned back into available pool. Depending on the next usage of a node it may stay in the Ironic cluster or may be removed from it. Once removed it can be "retired" or used for any other purpose.

Do you mean "unprovision" a node and move it through cleaning? I'm not sure I understand what your trying to get across. There is a case where a node would have been moved to a "failed" state, and could be "unprovisioned". If we reach the point where we are able to unprovision, it seems like we might be able to re-deploy, so maybe the option is to automatically move to state which is kind of like bucket for broken nodes? AK: Before node is removed from Ironic some level of cleanup is expected. Especially if node is to be reused as Chris stated. I assume that that cleanup will be done by Ironic. What you do with the node after it is outside of Ironic is out of scope.

...

Thanks, Arkady

-----Original Message----- From: Christopher Price <christopher.price@est.tech> Sent: Tuesday, May 21, 2019 3:26 AM To: Bogdan Dobrelya; openstack-discuss@lists.openstack.org; edge-computing@lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

[EXTERNAL EMAIL]

I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario".

Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) </ end dubious thought stream>

/ Chris

On 2019-05-21, 09:16, "Bogdan Dobrelya" <bdobreli@redhat.com> wrote:

[CC'ed edge-computing@lists.openstack.org]

On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). > > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a > "take a node out of service" mechanism?

There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware.

[0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr... [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge

> > - would these use cases put additional constraints on how the > feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have > (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, > Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360 >

-- Best regards, Bogdan Dobrelya, Irc #bogdando

_______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

_______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

Arkady.Kanevsky＠dell.com

9 Aug 9 Aug

7:45 p.m.

New subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

Julia, For #3 what I was trying to cover the case when Ironic is used to manage servers for multiple different platform clusters. Like 2 different OpenStack cluster that share single Ironic. Ore One OpenStack and one Kubernetes cluster with shared Ironic between them. This use case support take a node from one platform cluster, clean it up, and allocate to another platform cluster. Thanks, Arkady -----Original Message----- From: Julia Kreger <juliaashleykreger@gmail.com> Sent: Tuesday, May 21, 2019 12:33 PM To: Kanevsky, Arkady Cc: Christopher Price; Bogdan Dobrelya; openstack-discuss; edge-computing@lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production [EXTERNAL EMAIL] On Tue, May 21, 2019 at 5:55 AM <Arkady.Kanevsky@dell.com> wrote:

...

Let's dig deeper into requirements. I see three distinct use cases: 1. put node into maintenance mode. Say to upgrade FW/BIOS or any other life-cycle event. It stays in ironic cluster but it is no longer in use by the rest of openstack, like Nova. 2. Put node into "fail" state. That is remove from usage, remove from Ironic cluster. What cleanup, operator would like/can do is subject to failure. Depending on the node type it may need to be "replaced".

...

3. Put node into "available" to other usage. What cleanup operator wants to do will need to be defined. This is very similar step as used for Baremetal as a Service as node is reassigned back into available pool. Depending on the next usage of a node it may stay in the Ironic cluster or may be removed from it. Once removed it can be "retired" or used for any other purpose.

...

Thanks, Arkady

-----Original Message----- From: Christopher Price <christopher.price@est.tech> Sent: Tuesday, May 21, 2019 3:26 AM To: Bogdan Dobrelya; openstack-discuss@lists.openstack.org; edge-computing@lists.openstack.org Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

[EXTERNAL EMAIL]

I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario".

Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) </ end dubious thought stream>

/ Chris

On 2019-05-21, 09:16, "Bogdan Dobrelya" <bdobreli@redhat.com> wrote:

[CC'ed edge-computing@lists.openstack.org]

On 20.05.2019 18:33, Arne Wiebalck wrote: > Dear all, > > One of the discussions at the PTG in Denver raised the need for > a mechanism to take ironic nodes out of production (a task for > which the currently available 'maintenance' flag does not seem > appropriate [1]). > > The use case there is an unhealthy physical node in state 'active', > i.e. associated with an instance. The request is then to enable an > admin to mark such a node as 'faulty' or 'in quarantine' with the > aim of not returning the node to the pool of available nodes once > the hosted instance is deleted. > > A very similar use case which came up independently is node > retirement: it should be possible to mark nodes ('active' or not) > as being 'up for retirement' to prepare the eventual removal from > ironic. As in the example above, ('active') nodes marked this way > should not become eligible for instance scheduling again, but > automatic cleaning, for instance, should still be possible. > > In an effort to cover these use cases by a more general > "quarantine/retirement" feature: > > - are there additional use cases which could profit from such a > "take a node out of service" mechanism?

There are security related examples described in the Edge Security Challenges whitepaper [0] drafted by k8s IoT SIG [1], like in the chapter 2 Trusting hardware, whereby "GPS coordinate changes can be used to force a shutdown of an edge node". So a node may be taken out of service as an indicator of a particular condition of edge hardware.

[0] https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr... [1] https://github.com/kubernetes/community/tree/master/wg-iot-edge

> > - would these use cases put additional constraints on how the > feature should look like (e.g.: "should not prevent cleaning") > > - are there other characteristics such a feature should have > (e.g.: "finding these nodes should be supported by the cli") > > Let me know if you have any thoughts on this. > > Cheers, > Arne > > > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360 >

-- Best regards, Bogdan Dobrelya, Irc #bogdando

_______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

_______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

_______________________________________________ Edge-computing mailing list Edge-computing@lists.openstack.org http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing

Julia Kreger

21 May 21 May

5:28 p.m.

New subject: [Edge-computing] [ironic][ops] Taking ironic nodes out of production

On Tue, May 21, 2019 at 9:34 AM Christopher Price <christopher.price@est.tech> wrote:

...

I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement, it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any special handling from "retirement", it's just to know that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement scenario".

Further, at least in my imagination, one might be reallocating hardware from one Ironic domain to another which may have implications on how we best bring a new node online. (or not, I'm no expert) </ end dubious thought stream>

You raise a really good point and we've had some past discussions from a standpoint of leasing hardware between clusters. One was ultimately to allow for a federated model where ironic could talk to ironic, however... that wasn't a very well received idea because it would mean ironic could become aware of other ironics... And soon ironic takes over the rest of the world.

...

/ Chris

[trim]

2165

Age (days ago)

2246

Last active (days ago)

List overview

Download

7 comments

5 participants

participants (5)

Arkady.Kanevsky＠dell.com
Arne Wiebalck
Bogdan Dobrelya
Christopher Price
Julia Kreger

[ironic][ops] Taking ironic nodes out of production

tags

participants (5)