[openstack-dev] [Ironic] [TripleO] How to gracefully quiesce a box?
Robert Collins
robertc at robertcollins.net
Mon Aug 25 07:55:26 UTC 2014
This patch https://review.openstack.org/#/c/116093/3/ironic/nova/virt/ironic/driver.py
seems to have the right parameters to enable Ironic to DTRT (do the right
thing, with associated internal changes); that's when Nova learnt to
soft-shutdown machines.
-Rob
On 23 August 2014 05:48, Clint Byrum <clint at fewbar.com> wrote:
> It has been brought to my attention that Ironic uses the biggest hammer
> in the IPMI toolbox to control chassis power:
>
> https://git.openstack.org/cgit/openstack/ironic/tree/ironic/drivers/modules/ipminative.py#n142
>
> Which is
>
> ret = ipmicmd.set_power('off', wait)
>
> This is the most abrupt form, where the system power should be flipped
> off at a hardware level. The "short press" on the power button would be
> 'shutdown' instead of 'off'.
>
> I also understand that this has been brought up before, and that the
> answer given was "SSH in and shut it down yourself." I can respect that
> position, but I have run into a bit of a pickle using it. Observe:
>
> - ssh box.ip "poweroff"
> - poll Ironic until the power state is off.
> - This is a race. Ironic is asserting the node's power state: as soon as
> it sees that the power is off, it will turn it back on.
>
> - ssh box.ip "halt"
> - NO way to know that this has worked. Once SSH is off and the network
> stack is gone, I cannot actually verify that the disks were
> unmounted properly, which is the primary area of concern that I
> have.
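A toy model can make the timing of the first race concrete. This is a hypothetical sketch, not Ironic's actual conductor logic. It assumes that the guest's `poweroff` takes effect instantly, that a periodic sync re-applies the last requested state ("on") on every tick that is a multiple of `sync_period`, and that the client polls every `poll_period` ticks after issuing the SSH command. The function name and its parameters are invented for illustration.

```python
from typing import Optional


def first_poll_seeing_off(shutdown_at: int, sync_period: int,
                          poll_period: int, horizon: int) -> Optional[int]:
    """Return the tick at which the polling client first sees "off",
    or None if the (hypothetical) periodic sync always wins the race."""
    expected = "on"  # what the controller believes the node should be
    power = "on"     # actual chassis power in this toy model
    for t in range(horizon):
        if t == shutdown_at:
            power = "off"  # guest ran `poweroff` over SSH
        if t > 0 and t % sync_period == 0 and power != expected:
            power = expected  # sync "corrects" the unexpected power-off
        if t > shutdown_at and (t - shutdown_at) % poll_period == 0:
            if power == "off":
                return t  # the client got lucky and saw the box off
    return None
```

With `first_poll_seeing_off(3, 5, 1, 20)` the client sees "off" at tick 4. With `first_poll_seeing_off(3, 5, 4, 20)` the sync at tick 5 turns the power back on before the first poll, and the function returns None. Even when the client wins, the model's next sync would still power the box back on.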
>
> This is particularly important if I'm issuing a rebuild + preserve
> ephemeral, as it is likely I will have lots of I/O going on, and I want
> to make sure that it is all quiesced before I reboot, replace the
> software, and reboot again.
>
> Perhaps I missed something. If so, please do educate me on how I can
> achieve this without hacking around it. Currently my workaround is to
> manually unmount the state partition, which is something system shutdown
> is supposed to do, and which may fail if system processes are holding it
> open.
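For the manual-unmount workaround, one way to confirm that the state partition is really detached before halting is to check /proc/mounts on the box itself, while SSH is still up. The following is a minimal sketch that assumes a Linux guest. `is_mounted` is a hypothetical helper, and `/mnt/state` below is an assumed mount point for the state partition.

```python
def is_mounted(mount_point: str, mounts_text: str) -> bool:
    """Return True if mount_point is a mount target in /proc/mounts-style text.

    Each line of /proc/mounts is: device, mount point, fs type, options,
    dump, pass. Spaces inside paths are escaped as \\040 in that file.
    """
    target = mount_point.rstrip("/") or "/"  # treat "/mnt/state/" like "/mnt/state"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        mounted_at = fields[1].replace("\\040", " ")  # undo the space escaping
        if mounted_at == target:
            return True
    return False
```

For example, after `umount` succeeds: `with open("/proc/mounts") as f: assert not is_mounted("/mnt/state", f.read())`. This only proves that the filesystem was detached. It says nothing about whether a later `halt` completes.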
>
> It seems to me that Ironic should at least try to use the graceful
> shutdown. There can be a timeout, but it would need to be something a user
> can disable, so that if a graceful shutdown never works we never just dump
> the power on the box. Even a journaled filesystem will take quite a while
> to do a full fsck.
>
> The inability to shut down gracefully in a reasonable amount of time
> is really an error state, and I need to go to the box and inspect it,
> which is precisely the reason we have ERROR states.
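The behaviour proposed above can be sketched as a single helper. It tries the soft shutdown first and gives up after a timeout. It cuts the power only if the operator has opted in, and otherwise raises an error that maps to ERROR. This is a hypothetical helper, not Ironic's or Nova's code. `set_power` and `get_power` are injected callables. The `timeout` and `retry_interval` names are illustrative and are only assumed to resemble the soft-shutdown parameters referred to in the Nova patch above, so check that review for the real interface. The clock and sleep are also injectable, which lets you test the logic without hardware.

```python
import time
from typing import Callable


class PowerOffError(Exception):
    """The guest did not shut down in time; the node should go to ERROR."""


def graceful_power_off(
    set_power: Callable[[str], None],
    get_power: Callable[[], str],
    timeout: float,
    retry_interval: float,
    hard_off_fallback: bool = False,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Request a soft shutdown and wait for the chassis power to drop.

    Returns "soft" if the guest powered itself off, or "hard" if the
    opt-in hard power-off fallback was used. Raises PowerOffError when
    the timeout expires and the fallback is disabled.
    """
    if retry_interval <= 0 or poll_interval <= 0:
        raise ValueError("retry_interval and poll_interval must be positive")
    deadline = clock() + timeout
    while True:
        # The "short press": ask the OS to shut down cleanly. It is re-sent
        # every retry_interval in case the guest missed the earlier request.
        set_power("shutdown")
        wait_until = min(clock() + retry_interval, deadline)
        while clock() < wait_until:
            if get_power() == "off":
                return "soft"
            sleep(poll_interval)
        if get_power() == "off":
            return "soft"
        if clock() >= deadline:
            break
    if hard_off_fallback:
        set_power("off")  # the biggest hammer, only if the user opted in
        return "hard"
    raise PowerOffError(f"node still powered on after {timeout} seconds")
```

With pyghmi, `set_power` might wrap the `ipmicmd.set_power(state, wait)` call quoted above, passing 'shutdown' for the soft request and 'off' for the hard one. How `get_power` maps onto pyghmi's return value is an assumption you should verify. With `hard_off_fallback=False` (the default here), a guest that never shuts down ends in PowerOffError instead of having its power dumped.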
>
> Thanks for your time. :)
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud