[openstack-dev] [ironic]Ironic operations on nodes in maintenance mode
Jim Rollenhagen
jim at jimrollenhagen.com
Tue Nov 24 15:39:02 UTC 2015
On Mon, Nov 23, 2015 at 03:35:58PM -0800, Shraddha Pandhe wrote:
> Hi,
>
> I would like to know how everyone is using maintenance mode and what is
> expected from admins about nodes in maintenance. The reason I am bringing
> up this topic is that most Ironic operations, including manual
> cleaning, are not allowed for nodes in maintenance. That's a problem for us.
>
> The way we use it is as follows:
>
> We allow users to put nodes in maintenance mode (indirectly) if they find
> something wrong with the node. They also provide a maintenance reason along
> with it, which gets stored as "user_reason" under maintenance_reason. So
> basically we tag it as user specified reason.
>
> To debug what happened to the node our operators use manual cleaning to
> re-image the node. By doing this, they can find out all the issues related
> to re-imaging (dhcp, ipmi, image transfer, etc). This debugging process
> applies to all the nodes that were put in maintenance either by user, or by
> system (due to power cycle failure or due to cleaning failure).

Interesting; do you let the node go through cleaning between the user
nuking the instance and doing this manual cleaning stuff?

At Rackspace, we leverage the fact that maintenance mode will not allow
the node to proceed through the state machine. If a user reports a
hardware issue, we put the node into maintenance mode on their behalf,
and when they delete the instance, the node boots the agent for cleaning
and begins heartbeating. Heartbeats are ignored in maintenance mode,
which gives us time to investigate the hardware, fix things, etc. When
the issue is resolved, we remove maintenance mode, and the node goes
through cleaning, then back into the pool.
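
To make that flow concrete, here is a rough toy model in Python. It is
not Ironic's code: the Node class, the function names, and the collapsed
"cleaning" -> "available" transition are all assumptions made for
illustration, and the real state machine has more states than this.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Toy stand-in for a node; NOT the real Ironic object model."""
    provision_state: str = "active"
    maintenance: bool = False
    maintenance_reason: str = ""


def set_maintenance(node: Node, enabled: bool, reason: str = "") -> None:
    """Set or clear the maintenance flag (hypothetical helper)."""
    node.maintenance = enabled
    node.maintenance_reason = reason if enabled else ""


def delete_instance(node: Node) -> None:
    """Tear-down: the agent boots and the node waits on heartbeats."""
    node.provision_state = "cleaning"


def heartbeat(node: Node) -> bool:
    """Return True if the heartbeat advanced the node, False if ignored."""
    if node.maintenance:
        # Heartbeats are ignored in maintenance mode, so the node stays put.
        return False
    if node.provision_state == "cleaning":
        # Simplification: one heartbeat finishes cleaning.
        node.provision_state = "available"
        return True
    return False


node = Node()
set_maintenance(node, True, "user reported a hardware issue")
delete_instance(node)
ignored = not heartbeat(node)    # node is held in "cleaning"
set_maintenance(node, False)     # issue resolved
advanced = heartbeat(node)       # cleaning proceeds, node returns to pool
```

The point of the sketch is only that the maintenance flag gates the
heartbeat handler, so the node is held in place until the flag is removed.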

We used to enroll nodes in maintenance mode, back when the API put them
in the available state immediately, to avoid them being scheduled to
before we knew they were good to go. The enroll state solved this for us.

Last, we use maintenance mode on available nodes if we want to
temporarily pull them from the pool for a manual process or some
testing. This can also be solved by the manageable state.

// jim