[openstack-dev] [ironic] Tooling for recovering nodes
jay at jvf.cc
Wed Jun 1 23:36:24 UTC 2016
Hey Tan, some comments inline.
On 5/31/16 1:25 AM, Tan, Lin wrote:
> Recently I have been working on a spec for recovering nodes that get stuck in the deploying state, and I would really appreciate your feedback.
> Ironic nodes can be stuck in deploying/deploywait/cleaning/cleanwait/inspecting/deleting if the node is reserved by a dead conductor (the exclusive lock was not released).
> Any further requests will be denied by ironic because it thinks the node resource is under the control of another conductor.
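To make the lock behaviour described above concrete, here is a minimal in-memory model of a per-node exclusive reservation. `Node`, `reserve()`, `release()` and `NodeLocked` are hypothetical names chosen for illustration, not Ironic's code. A real implementation would need the check-and-set to be atomic, for example as a conditional UPDATE in the database.

```python
from dataclasses import dataclass
from typing import Optional


class NodeLocked(Exception):
    """Hypothetical error: another conductor already holds the node's reservation."""


@dataclass
class Node:
    uuid: str
    provision_state: str
    reservation: Optional[str] = None  # hostname of the conductor holding the lock


def reserve(node: Node, conductor: str) -> None:
    """Take the exclusive lock; refuse if anyone holds it, alive or dead."""
    if node.reservation is not None:
        # Nothing here can tell whether the holder is still running,
        # so the request is simply denied.
        raise NodeLocked(f"node {node.uuid} is locked by {node.reservation}")
    node.reservation = conductor


def release(node: Node, conductor: str) -> None:
    """Release the lock. A conductor that crashed never gets to call this."""
    if node.reservation == conductor:
        node.reservation = None
```

When the holder dies without calling `release()`, every later `reserve()` from a live conductor fails. That is the stuck state which the options listed below try to undo.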
> To be clearer, let's narrow the scope and focus on the deploying state first. Currently, operators have several ways to clear the reservation lock:
> 1. Restart the dead conductor.
> 2. Wait up to 2 or 3 minutes for _check_deploying_states() to clear the lock.
> 3. Edit the DB directly to recover these nodes manually.
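Option 3 amounts to something like the following. This sketch runs against a throwaway in-memory SQLite table. The `nodes` table with `uuid`, `provision_state` and `reservation` columns, the `'deploy failed'` state string, and the helper names are simplified assumptions, not Ironic's real schema or code. The `WHERE` clause only touches a node that is still deploying and still locked, so running it twice is harmless.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical, simplified nodes table for illustration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE nodes (uuid TEXT PRIMARY KEY, provision_state TEXT, reservation TEXT)"
    )
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?, ?)",
        [("n1", "deploying", "cond-dead"), ("n2", "active", None)],
    )
    conn.commit()
    return conn


def clear_stuck_deploy(conn: sqlite3.Connection, uuid: str,
                       failed_state: str = "deploy failed") -> bool:
    """Drop the lock and fail the deploy, only if the node is still deploying and locked."""
    cur = conn.execute(
        "UPDATE nodes SET reservation = NULL, provision_state = ? "
        "WHERE uuid = ? AND provision_state = 'deploying' AND reservation IS NOT NULL",
        (failed_state, uuid),
    )
    conn.commit()
    return cur.rowcount == 1  # True only if exactly one row was changed
```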
I actually like option #3 being optionally integrated into a tool to
clear nodes stuck in *ing states. If specified, it would clear the lock
on the deploy as it moved it from DEPLOYING -> DEPLOYFAILED. Obviously,
for cleaning this could be dangerous, and should be documented as such --
imagine clearing a lock mid-firmware flash and having a power action
taken to brick the node.
Given this is tooling intended to handle many cases, I think it's better
to give the operator the choice to take more dramatic action if they wish.
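Here is a minimal sketch of that opt-in behaviour. The stuck state names are taken from this thread (`deploying`, `deploywait`, `cleaning`, ...). The failure states (`deployfailed`, `cleanfailed`, `inspectfailed`, `error`) are assumed names. `recover()`, `clear_lock` and `force` are illustrative only and are not an existing Ironic API. Lock clearing is off by default. Cleaning states additionally require `force` because of the firmware-flash risk.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from stuck "*ing" states to the state a recovered node lands in.
FAIL_STATES = {
    "deploying": "deployfailed",
    "deploywait": "deployfailed",
    "cleaning": "cleanfailed",
    "cleanwait": "cleanfailed",
    "inspecting": "inspectfailed",
    "deleting": "error",
}
# States where yanking the lock mid-operation may be destructive (e.g. a firmware flash).
DANGEROUS_STATES = {"cleaning", "cleanwait"}


class RecoveryRefused(Exception):
    """The tool declined to act without an explicit opt-in."""


@dataclass
class Node:
    uuid: str
    provision_state: str
    reservation: Optional[str] = None


def recover(node: Node, clear_lock: bool = False, force: bool = False) -> str:
    """Move a stuck node to its failure state; clearing a held lock is opt-in."""
    target = FAIL_STATES.get(node.provision_state)
    if target is None:
        raise RecoveryRefused(f"{node.uuid}: {node.provision_state!r} is not a stuck *ing state")
    if node.reservation is not None:
        if not clear_lock:
            raise RecoveryRefused(f"{node.uuid}: locked by {node.reservation}; pass clear_lock=True")
        if node.provision_state in DANGEROUS_STATES and not force:
            raise RecoveryRefused(
                f"{node.uuid}: clearing the lock during {node.provision_state} "
                "may brick the node; pass force=True"
            )
        node.reservation = None
    node.provision_state = target
    return target
```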
> Option 2 looks very promising, but it has some weaknesses:
> 2.1 It won't work if the dead conductor was renamed or deleted.
> 2.2 It won't work if the node's specific driver was not enabled on live conductors.
> 2.3 It won't work if the node is in maintenance. (only a corner case).
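The three weaknesses follow from the filters such a periodic check applies. Below is a hypothetical model, not the code behind _check_deploying_states(). It assumes the check only considers conductors that are still registered but have a stale heartbeat, only nodes whose driver the live conductor has enabled, and only nodes outside maintenance. `Node`, `offline_conductors()`, `nodes_to_release()` and the driver names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class Node:
    uuid: str
    driver: str
    provision_state: str
    reservation: Optional[str] = None  # hostname of the conductor holding the lock
    maintenance: bool = False


def offline_conductors(heartbeats: Dict[str, float], now: float, timeout: float) -> Set[str]:
    """Registered conductors whose last heartbeat is more than `timeout` seconds old."""
    return {host for host, last_seen in heartbeats.items() if now - last_seen > timeout}


def nodes_to_release(nodes: Iterable[Node], offline: Set[str],
                     enabled_drivers: Set[str]) -> List[Node]:
    """Nodes a live conductor would unlock and fail under the assumed filters."""
    selected = []
    for node in nodes:
        if node.provision_state != "deploying":
            continue
        if node.maintenance:  # 2.3: maintenance nodes are skipped
            continue
        if node.driver not in enabled_drivers:  # 2.2: driver not enabled on this conductor
            continue
        if node.reservation not in offline:  # 2.1: a renamed/deleted holder has no heartbeat entry
            continue
        selected.append(node)
    return selected
```

Only the first node below is released. The others match exactly one of 2.1, 2.2 or 2.3 each.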
> We should definitely improve option 2, but there could be more issues that I am not aware of in more complicated environments.
> So my question is: do we still need a new command that recovers these nodes more easily without accessing the DB, like this PoC:
> ironic-noderecover --node_uuids=UUID1,UUID2 --config-file=/etc/ironic/ironic.conf
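For the PoC invocation above, the argument handling might look like this. The flag names come from the command line in this thread. Using `argparse` and validating each entry with `uuid.UUID` are assumptions, and the PoC's actual parsing may differ.

```python
import argparse
import uuid
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="ironic-noderecover")
    parser.add_argument("--node_uuids", required=True, help="comma-separated node UUIDs")
    parser.add_argument("--config-file", help="path to ironic.conf")  # stored as args.config_file
    args = parser.parse_args(argv)

    # Split "UUID1,UUID2", skip empty entries, and reject malformed UUIDs.
    uuids = []
    for raw in args.node_uuids.split(","):
        raw = raw.strip()
        if not raw:
            continue
        try:
            uuids.append(str(uuid.UUID(raw)))
        except ValueError:
            parser.error(f"not a valid UUID: {raw!r}")  # exits with status 2
    if not uuids:
        parser.error("no node UUIDs given")
    args.node_uuids = uuids
    return args
```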
> Best Regards,
>  https://review.openstack.org/#/c/319812
>  https://review.openstack.org/#/c/311273/
> OpenStack Development Mailing List (not for usage questions)