Hi all,

The HA session was really well attended, and I'd like to give some feedback from it.

First, there is some really good content here:
https://etherpad.openstack.org/p/kilo-crossproject-ha-integration

1. We SHOULD provide better health checks for OCF resources (http://linux-ha.org/wiki/OCF_Resource_Agents). These should be fast and reliable. We should probably bike-shed on a convention like "<project>-manage healthcheck" and then roll it out to each project.

2. We should really move https://github.com/madkiss/openstack-resource-agents to stackforge or openstack if the author is agreeable (it's referred to in our official docs).

3. All services SHOULD support Active/Active configurations (better scaling, and it's always tested).

4. We should be testing HA (there are a number of ideas on the etherpad about this).

5. Many services do not recover if they fail mid-task. This seems like a big problem to me (some leave the DB in a mess). Someone linked to an interesting article on crash-only software (http://lwn.net/Articles/191059/), which suggests that if we do this correctly, we should not need the concept of a clean shutdown (https://github.com/openstack/oslo-incubator/blob/master/openstack/common/service.py#L459-L471). I'd be interested in how people think this should be approached (just raise bugs for each?).

Regards
Angus
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20141111/1eff41f5/attachment-0001.html>
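For point 1 (better OCF health checks), here is a minimal sketch of what a "<project>-manage healthcheck" command could look like. It assumes the command runs a few fast, time-bounded checks and exits with an OCF return code that a resource agent's monitor action can pass straight through. The helpers (tcp_port_open, run_healthcheck, main), the check names and the mapping (liveness check failing gives OCF_NOT_RUNNING, any other failing check gives OCF_ERR_GENERIC) are hypothetical; only the numeric values 0, 1 and 7 are the standard OCF_SUCCESS, OCF_ERR_GENERIC and OCF_NOT_RUNNING codes.

```python
"""Hypothetical sketch of a "<project>-manage healthcheck" command.

Runs a set of fast, bounded checks and maps the outcome onto OCF
resource-agent return codes so a monitor action could use it directly.
"""
import socket
from typing import Callable, Dict

# Standard OCF return codes relevant to a monitor action.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def tcp_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_healthcheck(checks: Dict[str, Callable[[], bool]],
                    liveness: str) -> int:
    """Run every check and return an OCF exit code.

    The check named by ``liveness`` decides "running vs not running";
    any other failing check means the service is up but unhealthy.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False  # a check that raises counts as a failure
    if not results.get(liveness, False):
        return OCF_NOT_RUNNING
    if not all(results.values()):
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


def main() -> int:
    """Hypothetical entry point; the checks here are placeholders."""
    checks = {
        # Example only: 8774 is nova-api's default port; use your service's.
        "api_port": lambda: tcp_port_open("127.0.0.1", 8774),
        # Placeholder: swap in a real, short-timeout database ping.
        "database": lambda: True,
    }
    return run_healthcheck(checks, liveness="api_port")
```

A project could expose main() as a console entry point and call sys.exit(main()). Keeping every check on a short timeout is what makes it reasonable to run from a frequent monitor interval.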
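For point 5 (recovering from failure mid-task), one way to read the crash-only idea is sketched below. It assumes a worker that splits each task into idempotent steps and durably records the last completed step before moving on; on restart it simply resumes from that record, so a kill -9 and a clean stop go down the same path. The Journal class and run_task function are hypothetical; in a real service the record would more likely live in the service's own database, ideally updated together with the step's effects.

```python
"""Hypothetical crash-only task runner.

Progress is journaled after every step, so a restarted worker resumes
where the crashed one stopped. There is no separate clean-shutdown path.
"""
import json
import os
from typing import Callable, List


class Journal:
    """Durable record of the last completed step index per task."""

    def __init__(self, path: str):
        self.path = path

    def load(self) -> dict:
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

    def mark_done(self, task_id: str, step: int) -> None:
        state = self.load()
        state[task_id] = step
        # Write a temp file, fsync it, then atomically rename it over the
        # journal, so a crash never leaves a half-written journal behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)


def run_task(journal: Journal, task_id: str,
             steps: List[Callable[[], None]]) -> None:
    """Run steps in order, skipping any already recorded as complete."""
    done = journal.load().get(task_id, -1)
    for i, step in enumerate(steps):
        if i <= done:
            continue  # completed before a previous crash
        step()  # must be idempotent: it re-runs if we crash before recording it
        journal.mark_done(task_id, i)
```

If the process dies between step() and mark_done(), that step runs again on restart, which is why each step has to be safe to repeat.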