<div dir="ltr"><div><div><div><div><div><div><div>Take this with a grain of salt, because we're using the original version from before the project moved under the Big Tent, and I'm not sure how much it has evolved since then. I assume the basic functions are the same, though.<br><br></div>You're correct: Corosync and Pacemaker are used to determine whether a compute node has gone down. The masakari-host-monitor process runs on each compute node, checks the cluster status, and sends a notification to masakari-controller when a node goes down. The controller process keeps a list of reserved hosts in its database and calls nova host-evacuate to move the instances to one of the reserved hosts.<br><br></div><div>In our environment I also configured STONITH, and I'd highly recommend it. With STONITH, Pacemaker sends a shutdown command to the out-of-band management card of the unreachable node to make sure it can't come back and cause a conflict.<br><br></div><div>There are two other components, masakari-process-monitor and masakari-instance-monitor, which also run on your compute nodes. The former watches the nova-compute service, and the latter monitors running instances and restarts them if necessary.<br><br></div><div>Looking here, it seems they've split Masakari into three different repos: <a href="https://github.com/openstack?utf8=%E2%9C%93&q=masakari&type=&language=" target="_blank">https://github.com/openstack?<wbr>utf8=%E2%9C%93&q=masakari&<wbr>type=&language=</a><br><br></div><div>masakari - The controller service and API<br></div><div>masakari-monitors - Compute node monitoring services<br></div><div>python-masakari-client - The CLI tools<br></div><div><br><br></div></div></div></div></div></div></div>
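<div>To make the host-failure flow a bit more concrete, here is a rough Python sketch of the monitor-to-controller hand-off described above. Everything in it is hypothetical and only meant to illustrate the idea: the Notification and Controller classes, the "STOPPED" event name, and the dict standing in for the Pacemaker cluster status are not Masakari's actual API. The real controller keeps its reserved hosts in a database and calls nova host-evacuate rather than updating an in-memory dict.<br><br></div>

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Notification:
    # Hypothetical payload a host monitor might send when a node goes down.
    hostname: str
    event: str  # e.g. "STOPPED" (illustrative event name, not Masakari's)


@dataclass
class Controller:
    # Hypothetical stand-in for masakari-controller.
    reserved_hosts: List[str]                 # mimics the reserved-host list in its DB
    instances_by_host: Dict[str, List[str]]   # which instances live on which host
    evacuations: List[tuple] = field(default_factory=list)

    def handle(self, note: Notification) -> Optional[str]:
        """Move every instance off the failed host onto the next reserved host."""
        if note.event != "STOPPED":
            return None
        if not self.reserved_hosts:
            raise RuntimeError("no reserved hosts left to evacuate to")
        target = self.reserved_hosts.pop(0)
        # A real deployment would invoke nova host-evacuate here instead.
        for instance in self.instances_by_host.pop(note.hostname, []):
            self.evacuations.append((instance, note.hostname, target))
            self.instances_by_host.setdefault(target, []).append(instance)
        return target


def check_cluster(node_status: Dict[str, bool]) -> List[Notification]:
    # Hypothetical host-monitor step: node_status maps hostname -> True if online.
    return [Notification(h, "STOPPED") for h, online in node_status.items() if not online]


if __name__ == "__main__":
    ctrl = Controller(reserved_hosts=["spare-1"],
                      instances_by_host={"compute-2": ["vm-a", "vm-b"]})
    for note in check_cluster({"compute-1": True, "compute-2": False}):
        target = ctrl.handle(note)
        print(f"{note.hostname} -> {target}: {ctrl.evacuations}")
```

<div>Popping a reserved host per failure mirrors the idea that a reserved host is consumed once it takes over a failed node's workload; with STONITH in place you can be reasonably sure the failed node won't come back and run the same instances in parallel.<br><br></div>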