[openstack-dev] [heat] Confused about the future of health maintenance and OS::Heat::HARestarter

Angus Salkeld asalkeld at mirantis.com
Wed Sep 17 23:55:17 UTC 2014


On Wed, Sep 17, 2014 at 11:57 PM, Mike Spreitzer <mspreitz at us.ibm.com>
wrote:

> Background: Health maintenance is very important to users, and I have
> users who want to do it now and into the future.  Today a Heat user can
> write a template that maintains the health of a resource R.  The detection
> of a health problem can be done by anything that hits a webhook.  That
> generality is important; it is not sufficient to determine health by
> looking at what physical and/or virtual resources exist, it is also highly
> desirable to test whether these things are functioning well (e.g., the URL
> based health checking possible through an OS::Neutron::Pool; e.g., the user
> has his own external system that detects health problems).  The webhook is
> provided by an OS::Heat::HARestarter (note the name bug: such a thing does
> not restart anything, rather it deletes and re-creates a given resource and
> all its dependents) that deletes and re-creates R and its health
> detection/recovery wiring.  For a more specific example, consider the case
> of detection using the services of an OS::Neutron::Pool.  Note that it is
> not even necessary for there to be workload traffic through the associated
> OS::Neutron::LoadBalancer; all we are using here is the monitoring
> prescribed by the Pool's OS::Neutron::HealthMonitor.  The user's template
> has, in addition to R, three things: (1) an OS::Neutron::PoolMember that
> puts R in the Pool, (2) an OS::Heat::HARestarter that deletes and
> re-creates R and all its dependents, and (3) a Ceilometer alarm that
> detects when Neutron is reporting that the PoolMember is unhealthy and
> responds by hitting the HARestarter's webhook.  Note that all three of
> those are dependent on R, and thus are deleted and re-created when the
> HARestarter's webhook is hit; this avoids most of the noted issues with
> HARestarter.  R can be a stack that includes both a Nova server and an
> OS::Neutron::Port, to work around a Nova bug with implicit ports.
>
> There is a movement afoot to remove HARestarter.  My concern is what users
> can do, now and into the future.  The first and most basic issue is this:
> at every step in the roadmap, it must be possible for users to accomplish
> health maintenance.  The second issue is easing the impact on what users
> write.  It would be pretty bad if the roadmap looks like this: before point
> X, users can only accomplish health maintenance as I outlined above, and
> from point X onward the user has to do something different.  That is, there
> should be a transition period during which users can do things either the
> old way or the new way.  It would be even better if we, or a cloud
> provider, could provide an abstraction that will be usable throughout the
> roadmap (once that abstraction becomes available).  For example, there
> could be a resource type OS::Heat::ReliableAutoScalingGroup that adds health
> maintenance functionality (with detection by an OS::Neutron::Pool and
> exposure of per-member webhooks usable by anything) to
> OS::Heat::AutoScalingGroup.  Once some other way to do that maintenance
> becomes available, the implementation of OS::Heat::ReliableAutoScalingGroup
> could switch to that without requiring any changes to users' templates.  If
> at some point in the future OS::Heat::ReliableAutoScalingGroup becomes
> exactly equivalent to OS::Heat::AutoScalingGroup then we could deprecate
> OS::Heat::ReliableAutoScalingGroup and, at a later time, remove it.  Even
> better: since health maintenance is not logically connected to scaling
> group membership, make the abstraction be simply OS::Heat::HealthyResource
> (i.e., it is about a single resource regardless of whether it is a member
> of a scaling group) rather than OS::Heat::ReliableAutoScalingGroup.
> Question: would that abstraction (including the higher level detection and
> exposure of re-creation webhook) be implementable (or a no-op) in the
> planned future?
>
> To aid in understanding: while it may be distasteful for a resource like
> HARestarter to tweak its containing stack, the critical question is whether
> it will remain *possible* throughout a transition period.  Is there an
> issue with such hacks being *possible* throughout a reasonable transition
> period?
>
>
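
As an illustration of the delete-and-re-create behavior described in the
quoted text above, here is a minimal Python sketch. It is not Heat code: the
resource names, the dependency edges, and both helper functions are
assumptions made up for this example. It only shows why the PoolMember, the
HARestarter, and the alarm all get replaced together with R when the webhook
is hit, while a resource that does not depend on R is left alone.

```python
from collections import deque

# Hypothetical dependency edges: each key depends on the resources in its list.
# The names mirror the quoted example; they are not real template identifiers.
DEPENDS_ON = {
    "R": [],
    "pool_member": ["R"],
    "restarter": ["R"],
    "health_alarm": ["pool_member", "restarter"],
    "unrelated_volume": [],
}


def dependents_of(target, depends_on):
    """Return target plus everything that depends on it, directly or not."""
    # Invert the edges so we can walk from a resource to its dependents.
    required_by = {name: [] for name in depends_on}
    for name, requirements in depends_on.items():
        for requirement in requirements:
            required_by[requirement].append(name)

    affected = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in required_by[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


def recreate_order(target, depends_on):
    """Order the affected resources so each follows what it depends on."""
    affected = dependents_of(target, depends_on)
    ordered = []
    placed = set()
    while len(ordered) < len(affected):
        for name in sorted(affected - placed):
            # A resource is ready once all of its affected requirements exist.
            if all(req in placed or req not in affected
                   for req in depends_on[name]):
                ordered.append(name)
                placed.add(name)
                break
        else:
            raise ValueError("dependency cycle among affected resources")
    return ordered
```

Calling recreate_order("R", DEPENDS_ON) gives one valid creation order for
the replaced resources; deletion would run in the reverse order. The
"unrelated_volume" entry never appears in the result because nothing links
it to R.
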
Mike, I don't think we want to rashly rip this resource out; we just want to
make it clear to users what its current limitations are.
Guest HA is an important feature, and we really need to put more work into it
so that it works much better.
Once we have the convergence observer, everything should have "HA" as long
as there is a notification to say the condition of the
resource has changed. So once the observer is in place, we will just need
to spend more time on monitoring agents/services.

-Angus

> Thanks,
> Mike