<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Wed, Sep 17, 2014 at 11:57 PM, Mike Spreitzer <span dir="ltr">&lt;<a href="mailto:mspreitz@us.ibm.com" target="_blank">mspreitz@us.ibm.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><font face="sans-serif">Background: Health maintenance is very

important to users, and I have users who want to do it now and into the

future.  Today a Heat user can write a template that maintains the

health of a resource R.  The detection of a health problem can be

done by anything that hits a webhook.  That generality is important:

it is not sufficient to determine health by looking at what physical and/or

virtual resources exist; it is also highly desirable to test whether these

things are functioning well (e.g., the URL-based health checking possible

through an OS::Neutron::Pool, or the user's own external system

that detects health problems).  The webhook is provided by an OS::Heat::HARestarter

(note the name bug: such a thing does not restart anything; rather, it deletes

and re-creates a given resource and all its dependents) that deletes and

re-creates R and its health detection/recovery wiring.  For a more

specific example, consider the case of detection using the services of

an OS::Neutron::Pool.  Note that it is not even necessary for there

to be workload traffic through the associated OS::Neutron::LoadBalancer;

all we are using here is the monitoring prescribed by the Pool's OS::Neutron::HealthMonitor.

 The user's template has, in addition to R, three things: (1) an OS::Neutron::PoolMember

that puts R in the Pool, (2) an OS::Heat::HARestarter that deletes and

re-creates R and all its dependents, and (3) a Ceilometer alarm that detects

when Neutron is reporting that the PoolMember is unhealthy and responds

by hitting the HARestarter's webhook.  Note that all three of those

are dependent on R, and thus are deleted and re-created when the HARestarter's

webhook is hit; this avoids most of the noted issues with HARestarter.

 R can be a stack that includes both a Nova server and an OS::Neutron::Port,

to work around a Nova bug with implicit ports.</font>
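
<br>

<br><font face="sans-serif">To illustrate the delete-and-re-create behavior described above, here is a minimal Python sketch, not Heat's actual implementation. It models the template as a dependency graph and computes which resources would be replaced when the webhook is hit for R. The resource names and the exact dependency edges (for example, the alarm depending on both the PoolMember and the HARestarter) are assumptions made for illustration only.</font>

<pre>
```python
from collections import deque

# Hypothetical resource names; each maps to the resources it depends on.
# The edges are assumed for illustration, not taken from a real template.
DEPENDS_ON = {
    "R": set(),
    "pool_member": {"R"},
    "restarter": {"R"},
    "alarm": {"pool_member", "restarter"},
    "unrelated_network": set(),
}


def replacement_set(depends_on, target):
    """Return target plus every resource that transitively depends on it."""
    # Invert the edges: for each resource, record its direct dependents.
    dependents = {name: set() for name in depends_on}
    for name, requirements in depends_on.items():
        for requirement in requirements:
            dependents[requirement].add(name)

    # Breadth-first walk outward from the target along the inverted edges.
    seen = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for dependent in sorted(dependents[current]):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


if __name__ == "__main__":
    # Resources that would be deleted and re-created along with R.
    print(sorted(replacement_set(DEPENDS_ON, "R")))
```
</pre>

<font face="sans-serif">Under these assumed edges, the PoolMember, the HARestarter, and the alarm all fall in the replacement set for R, while a resource that does not depend on R is left alone.</font>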

<br>

<br><font face="sans-serif">There is a movement afoot to remove

HARestarter.  My concern is what users can do, now and into the future.

 The first and most basic issue is this: at every step in the roadmap,

it must be possible for users to accomplish health maintenance.  The

second issue is easing the impact on what users write.  It would be

pretty bad if the roadmap looked like this: before point X, users can only

accomplish health maintenance as I outlined above, and from point X onward

the user has to do something different.  That is, there should be

a transition period during which users can do things either the old way

or the new way.  It would be even better if we, or a cloud provider,

could provide an abstraction that will be usable throughout the roadmap

(once that abstraction becomes available).  For example, there could

be a resource type OS::Heat::ReliableAutoScalingGroup that adds health

maintenance functionality (with detection by an OS::Neutron::Pool and exposure

of per-member webhooks usable by anything) to OS::Heat::AutoScalingGroup.

 Once some other way to do that maintenance becomes available, the

implementation of OS::Heat::ReliableAutoScalingGroup could switch to that

without requiring any changes to users' templates.  If at some point

in the future OS::Heat::ReliableAutoScalingGroup becomes exactly equivalent

to OS::Heat::AutoScalingGroup then we could deprecate OS::Heat::ReliableAutoScalingGroup

and, at a later time, remove it.  Even better: since health maintenance

is not logically connected to scaling group membership, make the abstraction

be simply OS::Heat::HealthyResource (i.e., it is about a single resource

regardless of whether it is a member of a scaling group) rather than OS::Heat::ReliableAutoScalingGroup.

 Question: would that abstraction (including the higher level detection

and exposure of re-creation webhook) be implementable (or a no-op) in the

planned future?</font>

<br>

<br><font face="sans-serif">To aid in understanding: while it may

be distasteful for a resource like HARestarter to tweak its containing

stack, the critical question is whether it will remain *possible* throughout

a transition period.  Is there an issue with such hacks being *possible*

throughout a reasonable transition period?</font>

<br>

<br></blockquote><div><br></div><div>Mike, I don't think we want to rashly rip this resource out; we just want to make it clear to users what its current limitations are.<br></div><div>Guest HA is an important feature, and we really need to put more work into it so that it works much better.<br></div><div>Once we have the convergence observer, everything should have "HA" as long as there is a notification to say that the condition of the<br></div><div>resource has changed. So once the observer is in place, we will just need to spend more time on monitoring agents/services.<br></div><div> <br></div><div>-Angus<br></div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><font face="sans-serif">Thanks,<br>

Mike</font><br>

<br></blockquote></div><br></div></div>