[Edge-computing] [ironic][ops] Taking ironic nodes out of production
Arkady.Kanevsky at dell.com
Tue May 21 19:00:41 UTC 2019
Inline response
-----Original Message-----
From: Julia Kreger <juliaashleykreger at gmail.com>
Sent: Tuesday, May 21, 2019 12:33 PM
To: Kanevsky, Arkady
Cc: Christopher Price; Bogdan Dobrelya; openstack-discuss; edge-computing at lists.openstack.org
Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of production
On Tue, May 21, 2019 at 5:55 AM <Arkady.Kanevsky at dell.com> wrote:
>
> Let's dig deeper into requirements.
> I see three distinct use cases:
> 1. Put node into maintenance mode, say to upgrade FW/BIOS or for any other life-cycle event. It stays in the Ironic cluster but is no longer in use by the rest of OpenStack, such as Nova.
> 2. Put node into "fail" state. That is, remove it from usage and from the Ironic cluster. What cleanup the operator would like to or can do depends on the failure. Depending on the node type, it may need to be "replaced".
Or troubleshot by a human, and could be returned to a non-failure state. I think largely the only way we as developers could support that is to allow hook scripts to be called upon entering/exiting such a state. That being said, at least from what Beth was saying at the PTG, this seems to be one of the most important states.
> 3. Put node into "available" for other usage. What cleanup the operator wants to do will need to be defined. This is a very similar step to the one used for Baremetal as a Service, as the node is reassigned back into the available pool. Depending on the next usage of a node, it may stay in the Ironic cluster or may be removed from it. Once removed, it can be "retired" or used for any other purpose.
Do you mean "unprovision" a node and move it through cleaning? I'm not sure I understand what you're trying to get across. There is a case where a node would have been moved to a "failed" state, and could be "unprovisioned". If we reach the point where we are able to unprovision, it seems like we might be able to re-deploy, so maybe the option is to automatically move to a state which is kind of like a bucket for broken nodes?
AK: Before a node is removed from Ironic, some level of cleanup is expected, especially if the node is to be reused, as Chris stated.
I assume that that cleanup will be done by Ironic.
What you do with the node after it is outside of Ironic is out of scope.
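
A minimal sketch of the behaviour discussed in this thread, under stated assumptions: the `out_of_service` marker, the `Node` class, and the choice of "manageable" as the landing state for flagged nodes are all hypothetical and are not Ironic's actual data model or API. It only illustrates the request that a flagged node is still cleaned when its instance is deleted, but does not return to the pool of available nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy stand-in for a bare-metal node; not Ironic's real data model."""
    name: str
    provision_state: str = "available"
    maintenance: bool = False               # use case 1: temporarily out of use
    out_of_service: Optional[str] = None    # hypothetical marker: "faulty" or "retiring"


def schedulable(node: Node) -> bool:
    """A node may receive new instances only if nothing holds it back."""
    return (
        node.provision_state == "available"
        and not node.maintenance
        and node.out_of_service is None
    )


def release_instance(node: Node) -> Node:
    """Instance deleted: always clean, then decide where the node lands."""
    if node.provision_state != "active":
        raise ValueError(f"{node.name} has no instance to release")
    node.provision_state = "cleaning"
    # ... cleanup steps would run here; flagged nodes are cleaned too ...
    if node.out_of_service is None:
        node.provision_state = "available"
    else:
        # Assumption: flagged nodes are parked outside the available pool.
        node.provision_state = "manageable"
    return node


if __name__ == "__main__":
    healthy = release_instance(Node("n1", provision_state="active"))
    faulty = release_instance(
        Node("n2", provision_state="active", out_of_service="faulty")
    )
    print(healthy.provision_state, schedulable(healthy))
    print(faulty.provision_state, schedulable(faulty))
```

Keeping the marker separate from the maintenance flag mirrors the point raised in the original request that the existing 'maintenance' flag does not seem appropriate for this purpose.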
>
> Thanks,
> Arkady
>
> -----Original Message-----
> From: Christopher Price <christopher.price at est.tech>
> Sent: Tuesday, May 21, 2019 3:26 AM
> To: Bogdan Dobrelya; openstack-discuss at lists.openstack.org;
> edge-computing at lists.openstack.org
> Subject: Re: [Edge-computing] [ironic][ops] Taking ironic nodes out of
> production
>
>
> I would add that something as simple as an operator policy could/should be able to remove hardware from an operational domain. It does not specifically need to be a fault or retirement; it may be as simple as repurposing to a different operational domain. From an OpenStack perspective this should not require any handling different from "retirement"; it is just worth knowing that there may be time constraints implied in a policy change that could potentially be ignored in a "retirement" scenario.
>
> Further, at least in my imagination, one might be reallocating
> hardware from one Ironic domain to another which may have implications
> on how we best bring a new node online. (or not, I'm no expert) </
> end dubious thought stream>
>
> / Chris
>
> On 2019-05-21, 09:16, "Bogdan Dobrelya" <bdobreli at redhat.com> wrote:
>
> [CC'ed edge-computing at lists.openstack.org]
>
> On 20.05.2019 18:33, Arne Wiebalck wrote:
> > Dear all,
> >
> > One of the discussions at the PTG in Denver raised the need for
> > a mechanism to take ironic nodes out of production (a task for
> > which the currently available 'maintenance' flag does not seem
> > appropriate [1]).
> >
> > The use case there is an unhealthy physical node in state 'active',
> > i.e. associated with an instance. The request is then to enable an
> > admin to mark such a node as 'faulty' or 'in quarantine' with the
> > aim of not returning the node to the pool of available nodes once
> > the hosted instance is deleted.
> >
> > A very similar use case which came up independently is node
> > retirement: it should be possible to mark nodes ('active' or not)
> > as being 'up for retirement' to prepare the eventual removal from
> > ironic. As in the example above, ('active') nodes marked this way
> > should not become eligible for instance scheduling again, but
> > automatic cleaning, for instance, should still be possible.
> >
> > In an effort to cover these use cases by a more general
> > "quarantine/retirement" feature:
> >
> > - are there additional use cases which could profit from such a
> > "take a node out of service" mechanism?
>
> There are security-related examples described in the Edge Security
> Challenges whitepaper [0] drafted by the k8s IoT SIG [1], such as in
> chapter 2, "Trusting hardware", whereby "GPS coordinate changes can be used
> to force a shutdown of an edge node". So a node may be taken out of
> service as an indicator of a particular condition of edge hardware.
>
> [0]
> https://docs.google.com/document/d/1iSIk8ERcheehk0aRG92dfOvW5NjkdedN8F7mSUTr-r0/edit#heading=h.xf8mdv7zexgq
> [1]
> https://github.com/kubernetes/community/tree/master/wg-iot-edge
>
> >
> > - would these use cases put additional constraints on what the
> > feature should look like (e.g.: "should not prevent cleaning")?
> >
> > - are there other characteristics such a feature should have
> > (e.g.: "finding these nodes should be supported by the cli")?
> >
> > Let me know if you have any thoughts on this.
> >
> > Cheers,
> > Arne
> >
> >
> > [1] https://etherpad.openstack.org/p/DEN-train-ironic-ptg, l. 360
> >
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>
> _______________________________________________
> Edge-computing mailing list
> Edge-computing at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/edge-computing