[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?

Jay Pipes jaypipes at gmail.com
Fri Aug 22 18:16:05 UTC 2014


On 08/22/2014 01:48 PM, Clint Byrum wrote:
> It has been brought to my attention that Ironic uses the biggest hammer
> in the IPMI toolbox to control chassis power:
>
> https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
>
> Which is
>
>      ret = ipmicmd.set_power('off', wait)
>
> This is the most abrupt form, where the system power should be flipped
> off at a hardware level. The "short press" on the power button would be
> 'shutdown' instead of 'off'.
>
> I also understand that this has been brought up before, and that the
> answer given was "SSH in and shut it down yourself." I can respect that
> position, but I have run into a bit of a pickle using it. Observe:
>
> - ssh box.ip "poweroff"
> - poll Ironic until the power state is off.
>    - This is a race. Ironic is asserting the power. As soon as it sees
>      that the power is off, it will turn it back on.
>
> - ssh box.ip "halt"
>    - NO way to know that this has worked. Once SSH is off and the network
>      stack is gone, I cannot actually verify that the disks were
>      unmounted properly, which is the primary area of concern that I
>      have.
>
> This is particularly important if I'm issuing a rebuild + preserve
> ephemeral, as it is likely I will have lots of I/O going on, and I want
> to make sure that it is all quiesced before I reboot to replace the
> software and then reboot again.
>
> Perhaps I missed something. If so, please do educate me on how I can
> achieve this without hacking around it. Currently my workaround is to
> manually unmount the state partition, which is something system shutdown
> is supposed to do and may become problematic if system processes are
> holding it open.
>
> It seems to me that Ironic should at least try to use the graceful
> shutdown. There can be a timeout, but it would need to be something a user
> can disable, so that if graceful shutdown never works we never just dump the
> power on the box. Even a journaled filesystem will take quite a while to do
> a full fsck.
>
> The inability to gracefully shut down in a reasonable amount of time
> is an error state really, and I need to go to the box and inspect it,
> which is precisely the reason we have ERROR states.
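For reference, the 'off' versus 'shutdown' distinction above corresponds to the IPMI Chassis Control command (NetFn 0x00, command 0x02). That command takes a single control byte. 0x00 powers the chassis down immediately. 0x05 asks the OS to perform a soft shutdown via ACPI. The sketch below assumes pyghmi passes those two strings through as these codes. The helper `chassis_control_request` and the extra state names are hypothetical and belong to neither pyghmi nor Ironic.

```python
# Chassis Control (NetFn 0x00, cmd 0x02) control-byte values from the IPMI spec.
CHASSIS_CONTROL = {
    "off": 0x00,       # power down: a hard cut at the hardware level
    "on": 0x01,        # power up
    "cycle": 0x02,     # power cycle
    "reset": 0x03,     # hard reset
    "diag": 0x04,      # pulse diagnostic interrupt
    "shutdown": 0x05,  # soft shutdown via ACPI: the "short press"
}

NETFN_CHASSIS = 0x00
CMD_CHASSIS_CONTROL = 0x02


def chassis_control_request(state: str) -> tuple[int, int, bytes]:
    """Return (netfn, command, data) for a raw Chassis Control request.

    Hypothetical helper for illustration only; it builds roughly what
    `ipmitool raw 0x00 0x02 <byte>` would send to the BMC.
    """
    try:
        code = CHASSIS_CONTROL[state]
    except KeyError:
        raise ValueError(f"unknown power state: {state!r}") from None
    return NETFN_CHASSIS, CMD_CHASSIS_CONTROL, bytes([code])


print(chassis_control_request("off"))       # hard power down
print(chassis_control_request("shutdown"))  # graceful ACPI request
```

With ipmitool, the same two requests are `ipmitool chassis power off` and `ipmitool chassis power soft`.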
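The policy proposed above can be sketched roughly as follows. This is a hypothetical illustration, not Ironic code. `power` stands for any object with pyghmi-style `set_power(state)` and `get_power()` methods. `hard_fallback` is the user-disableable switch. When the graceful request times out and the fallback is off, the function raises. The caller would then move the node to ERROR instead of dumping its power.

```python
import time
from typing import Callable, Protocol


class PowerControl(Protocol):
    """Hypothetical BMC wrapper with pyghmi-style state names."""

    def set_power(self, state: str) -> None: ...

    def get_power(self) -> str: ...


class ShutdownTimeout(Exception):
    """Graceful shutdown did not finish in time; the node belongs in ERROR."""


def power_off(
    power: PowerControl,
    timeout: float = 300.0,
    hard_fallback: bool = False,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request an ACPI soft shutdown; return "soft" or "hard" on success."""
    power.set_power("shutdown")  # the "short press"
    deadline = clock() + timeout
    while True:
        # Check state after every sleep, including the last one before the deadline.
        if power.get_power() == "off":
            return "soft"
        if clock() >= deadline:
            break
        sleep(poll_interval)
    if hard_fallback:
        power.set_power("off")  # the biggest hammer, only when explicitly allowed
        return "hard"
    raise ShutdownTimeout(f"still powered on after {timeout}s; inspect the box")
```

Because the caller issues the shutdown itself, it knows that the "off" it then observes is intended. Presumably this sidesteps the race in which Ironic sees an out-of-band poweroff and turns the box back on. `clock` and `sleep` are parameters only so the timeout can be tested without waiting.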

What about placing a runlevel script in /etc/init.d/ and symlinking it 
to run on shutdown -- i.e. /etc/rc0.d/? You could run sync or unmount 
the state partition in that script, which would ensure disk state was 
quiesced, no?
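A minimal sketch of such a script follows. It is written in Python to match the snippet quoted above and assumes a SysV-style init. The mountpoint `/mnt/state`, the script name `quiesce-state` and the link name `K20quiesce-state` are hypothetical. The K number should be chosen so the script runs after the services using the partition have stopped. SysV `rc` invokes K scripts in `/etc/rc0.d/` with the argument `stop` when entering runlevel 0. The script flushes with `os.sync()`, unmounts, and then re-reads `/proc/mounts`. A busy partition therefore shows up as a failure instead of passing silently. `run` and `read_mounts` are parameters only so the logic can be exercised without root.

```python
#!/usr/bin/env python3
"""Hypothetical /etc/init.d/quiesce-state: flush and unmount the state partition."""
import os
import re
import subprocess
import sys
from typing import Callable

STATE_MOUNTPOINT = "/mnt/state"  # hypothetical; adjust to the real partition
_OCTAL = re.compile(r"\\([0-7]{3})")


def mountpoints(mounts_text: str) -> set[str]:
    """Return the mountpoints (second field) from /proc/mounts-style text."""
    points = set()
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # /proc/mounts escapes spaces and similar characters as octal, e.g. "\040".
            points.add(_OCTAL.sub(lambda m: chr(int(m.group(1), 8)), fields[1]))
    return points


def _read_proc_mounts() -> str:
    with open("/proc/mounts") as f:
        return f.read()


def quiesce(
    mountpoint: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    read_mounts: Callable[[], str] = _read_proc_mounts,
) -> None:
    """Sync, unmount and verify; raise RuntimeError if it is still mounted."""
    mountpoint = os.path.normpath(mountpoint)
    os.sync()  # flush dirty buffers on all filesystems
    result = run(["umount", mountpoint], capture_output=True, text=True)
    if result.returncode != 0:
        # Typically "target is busy" when a process still holds files open.
        raise RuntimeError(f"umount {mountpoint} failed: {result.stderr.strip()}")
    if mountpoint in mountpoints(read_mounts()):
        raise RuntimeError(f"{mountpoint} is still listed in /proc/mounts")


if __name__ == "__main__":
    # SysV rc calls K scripts with "stop" when the runlevel is entered.
    if sys.argv[1:2] == ["stop"]:
        try:
            quiesce(STATE_MOUNTPOINT)
        except RuntimeError as exc:
            print(f"quiesce-state: {exc}", file=sys.stderr)
            sys.exit(1)
```

The file would be installed as an executable `/etc/init.d/quiesce-state` and linked with `ln -s ../init.d/quiesce-state /etc/rc0.d/K20quiesce-state`. On Debian-style systems, `update-rc.d` normally manages those links. An OS-initiated reboot enters runlevel 6 instead of 0, so it would need the same link in `/etc/rc6.d/`.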

Best,
-jay



More information about the OpenStack-dev mailing list