<div dir="ltr"><div>Hi guys,</div><div><br></div><div>I want to continue this discussion so we can reach some agreement on it. Three kinds of HA solutions have been mentioned in this thread.</div><div><br></div><div>1. Hypervisor level HA. Based on the previous discussions, pacemaker seems to be the best candidate, but it still lacks success stories and documentation. I am wondering about its response time on failure and whether it could fence a healthy node by mistake. I will investigate it and give you an update if I make any progress.</div><div><br></div><div>2. Instance OS level HA. The libvirt watchdog can be used to detect an instance OS failure (kernel panic or hang). As Daniel said, nova currently doesn't send out any notification when the watchdog is triggered. If it did, ceilometer could raise an alarm when it receives this kind of message, and the instance could probably be recovered by a pre-defined reboot action. I don't know if this proposal would be welcomed by the ceilometer guys.</div><div><br></div><div>According to <a href="http://www.vmware.com/files/pdf/VMware-High-Availability-DS-EN.pdf">http://www.vmware.com/files/pdf/VMware-High-Availability-DS-EN.pdf</a>,</div><div>the above two cases are what VMware HA provides.</div><div><br></div><div>3. Instance app level HA. I am not talking about an application-specific HA solution, but a generic one, like keepalived. It can provide more fine-grained checks and recovery than the instance OS level. The following article introduces how to do it: <a href="http://blog.aaronorosen.com/implementing-high-availability-instances-with-neutron-using-vrrp/">http://blog.aaronorosen.com/implementing-high-availability-instances-with-neutron-using-vrrp/</a>   We could use heat to orchestrate the resources as the article describes. But it can't work with neutron l2 population, because neutron is not aware of the VIP switching inside the instance, and therefore the pre-populated ARP table goes stale after the switch. 
Maybe sending a gratuitous ARP packet from the keepalived script could work around it. An alternative is to let keepalived switch the instance's floating IP address instead of the virtual IP address. The other potential issue with this solution is that OpenStack still lacks support for attaching one volume to multiple instances. As with switching the floating IP address, the data volume could be detached and re-attached in the keepalived script.</div><div><br></div><div><br></div></div>
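<div>To make the gratuitous ARP idea concrete, here is a minimal sketch of what a keepalived notify script could call when an instance becomes master. It only builds and sends a standard ARP announcement for the VIP on Linux. The function names, the MAC address, the VIP and the interface name are all hypothetical, and I have not verified that this actually refreshes the entries pre-populated by neutron l2 population.</div>

```python
import socket
import struct

ETH_P_ARP = 0x0806            # EtherType for ARP
BROADCAST_MAC = b"\xff" * 6   # Ethernet broadcast address


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'fa:16:3e:00:00:01' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_gratuitous_arp(mac: str, vip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP announcement for `vip`.

    Sender IP and target IP are both the VIP, which is what makes the
    ARP request "gratuitous".
    """
    hw = mac_to_bytes(mac)
    ip = socket.inet_aton(vip)
    eth_header = BROADCAST_MAC + hw + struct.pack("!H", ETH_P_ARP)
    arp_payload = (
        # htype=Ethernet, ptype=IPv4, hlen=6, plen=4, oper=1 (request)
        struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
        + hw + ip            # sender MAC / sender IP
        + b"\x00" * 6 + ip   # target MAC (unused) / target IP = VIP
    )
    return eth_header + arp_payload


def send_gratuitous_arp(ifname: str, mac: str, vip: str) -> None:
    """Send the frame on `ifname`. Linux only, needs root (raw socket)."""
    frame = build_gratuitous_arp(mac, vip)
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as sock:
        sock.bind((ifname, 0))
        sock.send(frame)
```

<div>A notify script on the new master would then call something like send_gratuitous_arp("eth0", "fa:16:3e:00:00:01", "10.0.0.100"), with the instance's real interface, port MAC and VIP substituted for these placeholder values.</div>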