[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
clint at fewbar.com
Fri Aug 22 18:34:54 UTC 2014
Excerpts from Jay Pipes's message of 2014-08-22 11:16:05 -0700:
> On 08/22/2014 01:48 PM, Clint Byrum wrote:
> > It has been brought to my attention that Ironic uses the biggest hammer
> > in the IPMI toolbox to control chassis power:
> > https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
> > Which is:
> >
> >     ret = ipmicmd.set_power('off', wait)
> >
> > This is the most abrupt form, where the system power should be flipped
> > off at a hardware level. The "short press" on the power button would be
> > 'shutdown' instead of 'off'.
> > I also understand that this has been brought up before, and that the
> > answer given was "SSH in and shut it down yourself." I can respect that
> > position, but I have run into a bit of a pickle using it. Observe:
> > - ssh box.ip "poweroff"
> >   - poll ironic until power state is off.
> >   - This is a race. Ironic is asserting the power. As soon as it sees
> >     that the power is off, it will turn it back on.
> > - ssh box.ip "halt"
> >   - NO way to know that this has worked. Once SSH is off and the network
> >     stack is gone, I cannot actually verify that the disks were
> >     unmounted properly, which is the primary area of concern that I
> >     have.
> > This is particularly important if I'm issuing a rebuild + preserve
> > ephemeral, as it is likely I will have lots of I/O going on, and I want
> > to make sure that it is all quiesced before I replace the software
> > and reboot.
> > Perhaps I missed something. If so, please do educate me on how I can
> > achieve this without hacking around it. Currently my workaround is to
> > manually unmount the state partition, which is something system shutdown
> > is supposed to do and may become problematic if system processes are
> > holding it open.
> > It seems to me that Ironic should at least try to use the graceful
> > shutdown. There can be a timeout, but it would need to be something a user
> > can disable so if graceful never works we never just dump the power on the
> > box. Even a journaled filesystem will take quite a bit to do a full fsck.
> > The inability to gracefully shut down in a reasonable amount of time
> > is really an error state, and I need to go to the box and inspect it,
> > which is precisely the reason we have ERROR states.
> What about placing a runlevel script in /etc/init.d/ and symlinking it
> to run on shutdown -- i.e. /etc/rc0.d/? You could run fsync or unmount
> the state partition in that script which would ensure disk state was
> quiesced, no?
That's already what OSes do in their rc0.d.
My point is, I don't have any way to know that process happened, without
the box turning itself off after it succeeded.
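
As a rough illustration of the behaviour being asked for above (not
Ironic's actual code), here is a minimal Python sketch: request a soft
shutdown, poll until the box reports itself off, and on timeout raise an
error unless the caller opted in to a hard power-off. The names
graceful_power_off, GracefulShutdownTimeout, set_power and get_power are
hypothetical; the two callables stand in for whatever the power driver
provides, and the 'shutdown' and 'off' strings follow the quoted
set_power call.

```python
import time


class GracefulShutdownTimeout(Exception):
    """Raised when the node did not power itself off in time."""


def graceful_power_off(set_power, get_power, timeout=600, poll_interval=5,
                       force_on_timeout=False, sleep=time.sleep,
                       clock=time.monotonic):
    """Ask for a soft shutdown and wait for the node to report 'off'.

    set_power(state) and get_power() are hypothetical callables supplied
    by the caller; get_power() is assumed to return 'on' or 'off'.
    Returns 'graceful' if the node turned itself off, or 'forced' if the
    timeout expired and force_on_timeout allowed a hard power-off.
    """
    # The "short press": let the OS unmount filesystems and halt itself.
    set_power('shutdown')

    deadline = clock() + timeout
    while clock() < deadline:
        if get_power() == 'off':
            return 'graceful'
        sleep(poll_interval)

    if force_on_timeout:
        # Opt-in fallback: the big hammer, only after graceful failed.
        set_power('off')
        return 'forced'

    # Default: never dump the power; surface an error for a human.
    raise GracefulShutdownTimeout(
        'node still powered on after %s seconds' % timeout)
```

With force_on_timeout left at False, a box that never finishes shutting
down ends up as an error for the operator to inspect rather than having
its power cut mid-write.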