[openstack-dev] [heat] health maintenance in autoscaling groups

Mike Spreitzer mspreitz at us.ibm.com
Fri Jul 18 16:12:21 UTC 2014


Thomas Herve <thomas.herve at enovance.com> wrote on 07/17/2014 02:06:13 AM:

> There are 4 resources related to neutron load balancing. 
> OS::Neutron::LoadBalancer is probably the least useful and the one 
> you can *not* use, as it's only there for compatibility with 
> AWS::AutoScaling::AutoScalingGroup. OS::Neutron::HealthMonitor does 
> the health checking part, although maybe not in the way you want it.

OK, let's work with these.  My current view is this: supposing the 
Convergence work delivers monitoring of each member's health according to 
its status in its hosting service, and reacts accordingly, the remaining 
gaps (compared to AWS functionality) are the abilities to (1) derive 
member health from "application level pings" (e.g., URL polling) and (2) 
accept member health declarations from an external system, with a 
consistent reaction to health information from all sources.

Source (1) is what an OS::Neutron::HealthMonitor specifies, and an 
OS::Neutron::Pool is the thing that takes such a spec.  So we could 
complete the (1) part if there were a way to tell a scaling group to poll 
the member health information developed by an OS::Neutron::Pool.  Does 
that look like the right approach?
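
To make that concrete, here is a minimal template sketch.  The 
HealthMonitor and Pool resources and their properties are real; the 
health_pool property on the scaling group is purely hypothetical, 
standing in for whatever mechanism we would invent to tell the group to 
poll the pool's member health:

    resources:
      monitor:
        type: OS::Neutron::HealthMonitor
        properties:
          type: HTTP              # "application level ping" via URL polling
          url_path: /healthz      # assumed application health URL
          expected_codes: "200"
          delay: 5
          timeout: 3
          max_retries: 2

      pool:
        type: OS::Neutron::Pool
        properties:
          protocol: HTTP
          subnet_id: { get_param: subnet }
          lb_method: ROUND_ROBIN
          vip: { protocol_port: 80 }
          monitors: [ { get_resource: monitor } ]

      group:
        type: OS::Heat::AutoScalingGroup
        properties:
          min_size: 2
          max_size: 5
          resource:
            type: member.yaml   # stand-in for the member definition
          # hypothetical property: poll member health developed by the pool
          health_pool: { get_resource: pool }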

For (2), this would amount to having an API that an external system (with 
proper authorization) can use to declare member health.  In the grand and 
glorious future when scaling groups have true APIs rather than being Heat 
hacks, such a thing would be part of those APIs.  In the immediate future 
we could simply add this to the Heat API.  Such an operation would take 
something like a stack name or UUID, the name or UUID of a resource that 
is a scaling group, the name or UUID of the member resource whose health 
is being declared, and a declaration like "health_status=unhealthy". 
Does that look about right?
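
Purely as a sketch (no such endpoint exists today, and the path and body 
here are invented for illustration), the Heat API addition might look 
something like:

    # hypothetical REST call; every name in the path is invented
    POST /v1/{tenant_id}/stacks/{stack_name}/{stack_id}/resources/{group_name}/members/{member_name}/health

    { "health_status": "unhealthy" }

The caller would need authorization on the stack, and Heat would record 
the declaration for the scaling group to act on.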

For both of these new sources, the remaining question is how to get the 
right reaction.  In the case that the member has actually been deleted 
already, life is easy.  Let's talk about the other cases.  Note that AWS 
admits that unhealth may be falsely detected while a member's contents 
are still settling into regular operation; AWS handles this by saying 
that the right reaction is to act only after unhealth has been 
consistently detected for a configured amount of time.  The simplest 
thing for a scaling group to do might be to include that hysteresis and 
eventually effect removal of a member by generating a new template that 
excludes the to-be-deleted member and doing an UPDATE on itself (qua 
stack) with that new template.  Does that look about right?
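
To illustrate the removal mechanics (the member names below are invented, 
and member.yaml is a stand-in for the member definition), the group's 
internally generated template before and after the hysteresis expires 
might look like:

    # current internal template: three members
    resources:
      member-1: { type: member.yaml }
      member-2: { type: member.yaml }   # consistently detected unhealthy
      member-3: { type: member.yaml }

    # template generated for the self-UPDATE: member-2 is excluded,
    # so the stack UPDATE deletes it
    resources:
      member-1: { type: member.yaml }
      member-3: { type: member.yaml }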

Thanks,
Mike
