[openstack-dev] [Fuel] HA cluster disk monitoring, failover and recovery

Alex Schultz aschultz at mirantis.com
Tue Nov 17 14:41:32 UTC 2015


Hey Kyrylo,


On Tue, Nov 17, 2015 at 8:28 AM, Kyrylo Galanov <kgalanov at mirantis.com> wrote:
> Hi Team,
>
> I have been testing failover when free disk space drops below 512 MB.
> (https://review.openstack.org/#/c/240951/)
> Affected node is stopped correctly and services migrate to a healthy node.
>
> However, after free disk space rises above 512 MB again, the node does not
> recover its operating state. Moreover, starting the resources manually
> is likely to fail. In a nutshell, the pacemaker service / node needs to be
> restarted. Detailed information is available here:
> https://www.suse.com/documentation/sle_ha/book_sleha/data/sec_ha_configuration_basics_monitor_health.html
>
> How do we address this issue?
>

The original change for this was
https://review.openstack.org/#/c/226062/. As indicated by the commit
message, pacemaker will only recover after the operator runs a
pacemaker command to clear the disk alert:

crm node status-attr <hostname> delete "#health_disk"

Once the operator has freed up disk space and run the above
command, pacemaker will rejoin the cluster and start services again.
The documentation bug for this is
https://bugs.launchpad.net/fuel/+bug/1500422.
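
As a rough illustration of the recovery steps above, here is a minimal
shell sketch that checks free space before suggesting the clear command.
The 512 MB threshold comes from this thread; the "/" mount point, the
function names and the example hostname are assumptions made for
illustration, not part of the Fuel change. The script only prints the crm
command rather than running it, so the operator stays in control.

```shell
#!/bin/sh
# Sketch only: decide whether the "#health_disk" attribute can be cleared.
# THRESHOLD_MB matches the 512 MB limit discussed in this thread;
# the monitored mount point ("/") is an assumption.
THRESHOLD_MB=512

# Free space in MB on the given mount point.
# Uses POSIX "df -Pk": the 4th column of the 2nd line is available KB.
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Succeeds when the given number of free MB is above the threshold.
enough_space() {
    [ "$1" -gt "$THRESHOLD_MB" ]
}

# Print the command from this thread for the given hostname.
clear_cmd() {
    printf 'crm node status-attr %s delete "#health_disk"\n' "$1"
}

# Check free space, then print the command for the operator to run.
main() {
    node=${1:-$(hostname)}
    avail=$(free_mb /)
    if enough_space "$avail"; then
        echo "Free space is ${avail} MB; the disk alert can be cleared with:"
        clear_cmd "$node"
    else
        echo "Only ${avail} MB free; free up disk space first." >&2
        return 1
    fi
}

# main "$@"   # uncomment to run the check when executing this file
```

For example, calling `main node-1` on the affected controller (node-1
being a placeholder hostname) prints the crm command once more than
512 MB is free.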

Thanks,
-Alex

>
> Best regards,
> Kyrylo