[openstack-dev] [heat] health maintenance in autoscaling groups
Mike Spreitzer
mspreitz at us.ibm.com
Fri Jul 18 16:12:21 UTC 2014
Thomas Herve <thomas.herve at enovance.com> wrote on 07/17/2014 02:06:13 AM:
> There are 4 resources related to neutron load balancing.
> OS::Neutron::LoadBalancer is probably the least useful and the one
> you can *not* use, as it's only there for compatibility with
> AWS::AutoScaling::AutoScalingGroup. OS::Neutron::HealthMonitor does
> the health checking part, although maybe not in the way you want it.
OK, let's work with these. My current view is this: supposing the
Convergence work delivers monitoring of a member's health according to its
status in its service, and reacts accordingly, the gaps (compared to AWS
functionality) are the abilities to (1) derive member health from
"application-level pings" (e.g., URL polling) and (2) accept member-health
declarations from an external system, with a consistent reaction to health
information from all sources.
Source (1) is what an OS::Neutron::HealthMonitor specifies, and an
OS::Neutron::Pool is the thing that takes such a spec. So we could
complete the (1) part if there were a way for a scaling group to poll
the member health information maintained by an OS::Neutron::Pool. Does
that look like the right approach?
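
To make (1) concrete, here is a minimal sketch of the sort of polling I
have in mind, using the LBaaS v1 calls in python-neutronclient. The pool
ID, the credentials, and the rule that any non-ACTIVE member status counts
as unhealthy are my assumptions for illustration, not settled design:

    # Sketch: poll the member health that an OS::Neutron::Pool (with an
    # attached OS::Neutron::HealthMonitor) develops.  POOL_ID and the
    # "anything not ACTIVE is unhealthy" rule are assumptions.
    from neutronclient.v2_0 import client as neutron_client

    def unhealthy_member_addresses(neutron, pool_id):
        """Return addresses of pool members not currently ACTIVE."""
        members = neutron.list_members(pool_id=pool_id)['members']
        return [m['address'] for m in members if m['status'] != 'ACTIVE']

    neutron = neutron_client.Client(username='...', password='...',
                                    tenant_name='...', auth_url='...')
    # POOL_ID would come from the OS::Neutron::Pool resource in the stack.
    for addr in unhealthy_member_addresses(neutron, 'POOL_ID'):
        print('member at %s looks unhealthy' % addr)

A scaling group that knew its pool could run something like this on a
timer and feed the result into whatever reaction logic we settle on.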
For (2), this would amount to having an API that an external system (with
proper authorization) can use to declare member health. In the grand and
glorious future when scaling groups have true APIs rather than being Heat
hacks, such a thing would be part of those APIs. In the immediate future
we could simply add this to the Heat API. Such an operation would take
something like a stack name or UUID, the name or UUID of a resource that
is a scaling group, the name or UUID of the member Resource whose
health is being declared, and "health_status=unhealthy". Does that look
about right?
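
As a strawman, the external caller's side of such an operation might look
like the following. The URL layout and the "health_status" body are
entirely made up (no such Heat API exists today); only the token header is
standard:

    # Strawman for the proposed member-health declaration.  The path and
    # payload are hypothetical, not an existing Heat API.
    import json
    import requests

    HEAT_ENDPOINT = 'http://heat.example.com:8004/v1/TENANT_ID'

    def declare_unhealthy(token, stack_id, group_name, member_id):
        """Tell Heat that an external system deems a member unhealthy."""
        url = '%s/stacks/%s/resources/%s/members/%s' % (
            HEAT_ENDPOINT, stack_id, group_name, member_id)
        resp = requests.patch(
            url,
            headers={'X-Auth-Token': token,
                     'Content-Type': 'application/json'},
            data=json.dumps({'health_status': 'unhealthy'}))
        resp.raise_for_status()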
For both of these new sources, the remaining question is how to get the
right reaction. In the case that the member has actually been deleted
already, life is easy. Let's talk about the other cases. Note that AWS
admits that unhealthiness may be falsely detected while a member is still
coming into regular operation; AWS handles this by saying that the
right reaction is to act only after unhealthiness has been consistently
detected for a configured amount of time. The simplest thing for a
scaling group to do might be to include that hysteresis and eventually
effect removal of a member by generating a new template that excludes the
to-be-deleted member and doing an UPDATE on itself (qua stack) with that
new template. Does that look about right?
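
In rough Python, the hysteresis plus self-update could look like the
sketch below. GRACE_PERIOD, the bookkeeping dict, and the shape of the
group's generated template are all assumptions for illustration:

    # Sketch of the hysteresis: act only after a member has been seen
    # unhealthy continuously for GRACE_PERIOD seconds, then regenerate
    # the group's template without it.  All names here are illustrative.
    import copy
    import time

    GRACE_PERIOD = 300         # configured detection window, in seconds
    first_seen_unhealthy = {}  # member id -> time first seen unhealthy

    def members_due_for_removal(unhealthy_ids, now=None):
        """Apply the grace period; return members to remove now."""
        now = time.time() if now is None else now
        for mid in list(first_seen_unhealthy):
            if mid not in unhealthy_ids:   # recovered: reset its clock
                del first_seen_unhealthy[mid]
        for mid in unhealthy_ids:
            first_seen_unhealthy.setdefault(mid, now)
        return [mid for mid, t0 in first_seen_unhealthy.items()
                if now - t0 >= GRACE_PERIOD]

    def template_without(template, doomed_ids):
        """Copy the group's generated template, minus doomed members."""
        new = copy.deepcopy(template)
        for mid in doomed_ids:
            new['resources'].pop(mid, None)
        return new

The group would then do the equivalent of python-heatclient's
stacks.update() on itself with the shrunken template, and the UPDATE
would make Heat delete the excluded members.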
Thanks,
Mike