I am testing Masakari host failure recovery on OpenStack 2023.1 (Antelope) deployed using Kolla-Ansible.

Environment:
- 3 nodes
- Each node runs controller + compute
- Pacemaker + Corosync deployed on all 3 nodes
- Masakari enabled (hostmonitor, instancemonitor, engine, api)
- pacemaker-remote not used (all nodes are full Pacemaker members)

Masakari segment:
- service_type = COMPUTE
- recovery_method = auto
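
For reference, the segment settings above correspond to a segment-creation request body roughly like the sketch below. The segment name `segment1` and the helper `build_segment_request` are placeholders for illustration only, not the exact values or tooling used in my deployment.

```python
import json

# Recovery methods accepted for a Masakari failover segment.
VALID_RECOVERY_METHODS = {"auto", "reserved_host", "auto_priority", "rh_priority"}


def build_segment_request(name, recovery_method="auto", service_type="COMPUTE"):
    """Build the JSON body for creating a failover segment.

    `name` is a hypothetical placeholder; the other two defaults mirror
    the segment settings listed in this post.
    """
    if recovery_method not in VALID_RECOVERY_METHODS:
        raise ValueError(f"unsupported recovery_method: {recovery_method!r}")
    return {
        "segment": {
            "name": name,
            "recovery_method": recovery_method,
            "service_type": service_type,
        }
    }


if __name__ == "__main__":
    # Print the body that would be sent when creating the segment.
    print(json.dumps(build_segment_request("segment1"), indent=2))
```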

Tests performed:
- Process-level failures (qemu, libvirt) → detected correctly
- nova-compute stop → detected (no evacuation, expected)

Host failure test:
- Powered off one compute/controller node
- Also tested systemctl stop corosync

Issue:
Pacemaker shows the node as OFFLINE. However:
- No Masakari HOST_DOWN notification is generated
- No evacuation is triggered
- masakari-hostmonitor logs do not show host failure handling

Are there any additional requirements for Masakari host failure detection in this topology, where every node is both a controller and a compute host and a full Pacemaker member?
Any guidance or references would be greatly appreciated.