[openstack-dev] [heat] health maintenance in autoscaling groups

Mike Spreitzer mspreitz at us.ibm.com
Tue Jul 1 19:47:03 UTC 2014

In AWS, an autoscaling group includes health maintenance functionality --- 
both an ability to detect basic forms of failures and an ability to react 
properly to failures detected by itself or by a load balancer.  What is 
the thinking about how to get this functionality in OpenStack?  Since 
OpenStack's OS::Heat::AutoScalingGroup has a more general member type, 
what is the thinking about what failure detection means (and how it would 
be accomplished, communicated)?

I have not found design discussion of this; have I missed something?

I suppose the natural answer for OpenStack would be centered around 
webhooks.  An OpenStack scaling group (OS SG = OS::Heat::AutoScalingGroup 
or AWS::AutoScaling::AutoScalingGroup or OS::Heat::ResourceGroup or 
OS::Heat::InstanceGroup) could generate a webhook per member, with the 
meaning of the webhook being that the member has been detected as dead and 
should be deleted and removed from the group --- and a replacement member 
created if needed to respect the group's minimum size.  When the member is 
a Compute instance and Ceilometer exists, the OS SG could define a 
Ceilometer alarm for each member (by including these alarms in the 
template generated for the nested stack that is the SG), programmed to hit 
the member's deletion webhook when death is detected (I imagine there are 
a few ways to write a Ceilometer condition that detects instance death). 
When the member is a nested stack and Ceilometer exists, it could be the 
member stack's responsibility to include a Ceilometer alarm that detects 
the member stack's death and hits the member stack's deletion webhook. 
There is a small matter of how the author of the template used to create 
the member stack writes some template snippet that creates a Ceilometer 
alarm that is specific to a member stack that does not exist yet.  I 
suppose we could stipulate that if the member template includes a 
parameter with name "member_name" and type "string" then the OS SG takes 
care of supplying the correct value of that parameter; as illustrated in 
the asg_of_stacks.yaml of https://review.openstack.org/#/c/97366/ , a 
member template can use a template parameter to tag Ceilometer data for 
querying.  The URL of the member stack's deletion webhook could be passed 
to the member template via the same sort of convention.  When Ceilometer 
does not exist, it is less obvious to me what could usefully be done.  Are 
there any useful SG member types besides Compute instances and nested 
stacks?  Note that a nested stack could also pass its member deletion 
webhook to a load balancer (that is willing to accept such a thing, of 
course), so we get a lot of unity of mechanism between the case of 
detection by infrastructure vs. application level detection.
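
To make the proposal concrete, here is a minimal sketch of the per-member 
webhook idea as a toy Python model. It is not Heat code and does not use any 
Heat or Ceilometer API. The class ScalingGroup, the base URL, the token 
scheme, and the parameter name "member_death_webhook" are all hypothetical; 
only "member_name" comes from the convention suggested above.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ScalingGroup:
    """Toy stand-in for an OS SG (hypothetical; not the Heat implementation)."""
    min_size: int
    base_url: str = "http://heat.example/member-death"  # hypothetical URL
    members: dict = field(default_factory=dict)  # member name -> webhook token
    _counter: int = 0

    def add_member(self) -> str:
        """Create a member together with its own deletion webhook token."""
        self._counter += 1
        name = f"member-{self._counter}"
        self.members[name] = secrets.token_hex(8)
        return name

    def webhook_url(self, name: str) -> str:
        """URL that an alarm or load balancer would hit to report death."""
        return f"{self.base_url}/{name}?token={self.members[name]}"

    def member_params(self, name: str, declared: set) -> dict:
        """Supply only the conventional parameters the member template declares."""
        offered = {
            "member_name": name,
            # Hypothetical parameter name for passing the webhook URL.
            "member_death_webhook": self.webhook_url(name),
        }
        return {k: v for k, v in offered.items() if k in declared}

    def handle_webhook(self, name: str, token: str) -> bool:
        """Delete the member reported dead, then refill up to min_size."""
        if self.members.get(name) != token:
            return False  # unknown member or wrong token: ignore the call
        del self.members[name]
        while len(self.members) < self.min_size:
            self.add_member()
        return True


# Usage: a group of two members; one is reported dead and gets replaced.
group = ScalingGroup(min_size=2)
first = group.add_member()
second = group.add_member()
print(group.member_params(first, {"member_name", "flavor"}))
group.handle_webhook(first, group.members[first])
print(sorted(group.members))
```

The same handle_webhook entry point serves both cases discussed above: a 
Ceilometer alarm and a load balancer would each simply be handed the URL 
returned by webhook_url.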

I am not entirely happy with the idea of a webhook per member.  If I 
understand correctly, generating webhooks is a somewhat expensive and 
problematic process.  What would be the alternative?
