[openstack-dev] [heat] Confused about the future of health maintenance and OS::Heat::HARestarter

Mike Spreitzer mspreitz at us.ibm.com
Wed Sep 17 13:57:24 UTC 2014


Background: Health maintenance is very important to users, and I have 
users who want to do it now and into the future.  Today a Heat user can 
write a template that maintains the health of a resource R.  The detection 
of a health problem can be done by anything that hits a webhook.  That 
generality is important: it is not sufficient to determine health by 
looking at what physical and/or virtual resources exist; it is also highly 
desirable to test whether these things are functioning well (e.g., via the 
URL-based health checking possible through an OS::Neutron::Pool, or via 
the user's own external system that detects health problems).  The 
webhook is provided by an OS::Heat::HARestarter (note the name bug: such a 
thing does not restart anything; rather, it deletes and re-creates a given 
resource and all its dependents), which here deletes and re-creates R and 
its health detection/recovery wiring.  For a more specific example, consider 
the case of detection using the services of an OS::Neutron::Pool.  Note 
that it is not even necessary for there to be workload traffic through the 
associated OS::Neutron::LoadBalancer; all we are using here is the 
monitoring prescribed by the Pool's OS::Neutron::HealthMonitor.  The 
user's template has, in addition to R, three things: (1) an 
OS::Neutron::PoolMember that puts R in the Pool, (2) an 
OS::Heat::HARestarter that deletes and re-creates R and all its 
dependents, and (3) a Ceilometer alarm that detects when Neutron is 
reporting that the PoolMember is unhealthy and responds by hitting the 
HARestarter's webhook.  Note that all three of those are dependent on R, 
and thus are deleted and re-created when the HARestarter's webhook is hit; 
this avoids most of the noted issues with HARestarter.  R can be a stack 
that includes both a Nova server and an OS::Neutron::Port, to work around 
a Nova bug with implicit ports.
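
To make the dependency argument concrete, here is a minimal Python sketch 
(a toy model, not Heat code) of which resources get deleted and re-created 
when the webhook for R is hit.  The resource names and the exact edges are 
assumptions for illustration; in particular, the alarm is modeled as 
depending on R indirectly, through the PoolMember and the HARestarter.

```python
from collections import deque

# Hypothetical names standing in for the template described above.
# Each entry maps a resource to the resources it depends on.
DEPENDS_ON = {
    "R": [],
    "pool": [],                             # shared Pool, independent of R
    "pool_member": ["R", "pool"],           # (1) puts R in the Pool
    "restarter": ["R"],                     # (2) HARestarter targeting R
    "alarm": ["pool_member", "restarter"],  # (3) alarm that hits the webhook
}


def recreated_on_webhook(target, depends_on):
    """Return the target plus everything that transitively depends on it."""
    # Invert the edges: resource -> resources that depend on it.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first walk over the inverted edges, starting at the target.
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


if __name__ == "__main__":
    print(sorted(recreated_on_webhook("R", DEPENDS_ON)))
```

In this model the PoolMember, the HARestarter, and the alarm are all in the 
re-created set, while the shared Pool is not.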

There is a movement afoot to remove HARestarter.  My concern is what users 
can do, now and in the future.  The first and most basic issue is 
this: at every step in the roadmap, it must be possible for users to 
accomplish health maintenance.  The second issue is easing the impact on 
what users write.  It would be pretty bad if the roadmap looked like this: 
before point X, users can only accomplish health maintenance as I outlined 
above, and from point X onward the user has to do something different. 
That is, there should be a transition period during which users can do 
things either the old way or the new way.  It would be even better if we, 
or a cloud provider, could provide an abstraction that will be usable 
throughout the roadmap (once that abstraction becomes available).  For 
example, suppose there were a resource type OS::Heat::ReliableAutoScalingGroup 
that adds health maintenance functionality (with detection by an 
OS::Neutron::Pool and exposure of per-member webhooks usable by anything) 
to OS::Heat::AutoScalingGroup.  Once some other way to do that maintenance 
becomes available, the implementation of 
OS::Heat::ReliableAutoScalingGroup could switch to that without requiring 
any changes to users' templates.  If at some point in the future 
OS::Heat::ReliableAutoScalingGroup becomes exactly equivalent to 
OS::Heat::AutoScalingGroup then we could deprecate 
OS::Heat::ReliableAutoScalingGroup and, at a later time, remove it.  Even 
better: since health maintenance is not logically connected to scaling 
group membership, make the abstraction be simply OS::Heat::HealthyResource 
(i.e., it is about a single resource regardless of whether it is a member 
of a scaling group) rather than OS::Heat::ReliableAutoScalingGroup. 
Question: would that abstraction (including the higher level detection and 
exposure of re-creation webhook) be implementable (or a no-op) in the 
planned future?
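
The property being asked for can be sketched in a few lines of Python.  This 
is only a toy model under stated assumptions: the class and method names are 
hypothetical, "FutureMechanism" is a placeholder for whatever replaces 
HARestarter (its behavior is not known), and nothing here is Heat code.  The 
point is just that the user-facing surface, a per-resource webhook, stays 
the same while the recovery implementation behind it is swapped.

```python
class DeleteAndRecreate:
    """Stand-in for today's approach: delete and re-create the resource."""

    def recover(self, resource_name):
        return ["delete " + resource_name, "create " + resource_name]


class FutureMechanism:
    """Placeholder for a later recovery mechanism; details are unknown."""

    def recover(self, resource_name):
        return ["recover " + resource_name]


class HealthyResource:
    """Hypothetical abstraction: one resource plus a recovery webhook."""

    def __init__(self, name, strategy):
        self.name = name
        self._strategy = strategy  # hidden from whoever hits the webhook

    def on_webhook(self):
        # Anything may call this: an alarm, a Pool monitor, an external system.
        return self._strategy.recover(self.name)


if __name__ == "__main__":
    for strategy in (DeleteAndRecreate(), FutureMechanism()):
        print(HealthyResource("R", strategy).on_webhook())
```

The caller's side, on_webhook(), is identical in both cases; only the 
strategy object behind it changes.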

To aid in understanding: while it may be distasteful for a resource like 
HARestarter to tweak its containing stack, the critical question is 
whether it will remain *possible* throughout a transition period.  Is 
there an issue with such hacks being *possible* throughout a reasonable 
transition period?

Thanks,
Mike