[openstack-dev] [heat] health maintenance in autoscaling groups
mspreitz at us.ibm.com
Tue Jul 1 19:47:03 UTC 2014
In AWS, an autoscaling group includes health maintenance functionality:
both an ability to detect basic forms of failure and an ability to react
properly to failures detected by itself or by a load balancer. What is
the thinking about how to get this functionality in OpenStack? Since
OpenStack's OS::Heat::AutoScalingGroup has a more general member type,
what is the thinking about what failure detection means (and how it would
be accomplished and communicated)?
I have not found any design discussion of this; have I missed something?
I suppose the natural answer for OpenStack would be centered around
webhooks. An OpenStack scaling group (OS SG = OS::Heat::AutoScalingGroup
or AWS::AutoScaling::AutoScalingGroup or OS::Heat::ResourceGroup or
OS::Heat::InstanceGroup) could generate a webhook per member. Hitting a
member's webhook would mean that the member has been detected as dead and
should be deleted and removed from the group, with a replacement member
created if needed to respect the group's minimum size. When the member is
a Compute instance and Ceilometer exists, the OS SG could define a
Ceilometer alarm for each member (by including these alarms in the
template generated for the nested stack that is the SG), programmed to hit
the member's deletion webhook when death is detected (I imagine there are
a few ways to write a Ceilometer condition that detects instance death).
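To make the per-member webhook idea concrete, here is a minimal Python
sketch of the group-side bookkeeping, assuming each webhook is a URL that
ends in an unguessable per-member token. Everything in it (the
ScalingGroup class, the heat.example.com endpoint, the member-N naming) is
hypothetical and is not Heat's API; it only shows that a hit deletes the
dead member and then tops the group back up to its minimum size.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ScalingGroup:
    """Toy model of an OS SG that issues one deletion webhook per member.

    All names here are hypothetical; this is not Heat's actual API.
    """
    min_size: int
    base_url: str = "https://heat.example.com/hooks"  # hypothetical endpoint
    members: dict = field(default_factory=dict)  # member name -> webhook URL
    _hooks: dict = field(default_factory=dict)   # webhook token -> member name
    _next_id: int = 0

    def add_member(self) -> str:
        """Create a member and a fresh deletion webhook dedicated to it."""
        name = f"member-{self._next_id}"
        self._next_id += 1
        token = secrets.token_urlsafe(16)  # unguessable per-member secret
        self._hooks[token] = name
        self.members[name] = f"{self.base_url}/{token}"
        return name

    def scale_to_min(self) -> None:
        """Create replacement members until the minimum size is respected."""
        while len(self.members) < self.min_size:
            self.add_member()

    def handle_webhook(self, url: str) -> bool:
        """Called when an alarm (or load balancer) hits a deletion webhook."""
        token = url.rsplit("/", 1)[-1]
        name = self._hooks.pop(token, None)
        if name is None:
            return False  # unknown or already-used webhook: ignore it
        del self.members[name]  # delete the dead member ...
        self.scale_to_min()     # ... and replace it if below the minimum
        return True


if __name__ == "__main__":
    group = ScalingGroup(min_size=2)
    group.scale_to_min()
    dead_url = group.members["member-0"]
    group.handle_webhook(dead_url)
    print(sorted(group.members))  # ['member-1', 'member-2']
```

Because a used token is discarded, a second report for the same dead
member is simply ignored; that idempotence would matter if both Ceilometer
and a load balancer detect the same failure.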
When the member is a nested stack and Ceilometer exists, it could be the
member stack's responsibility to include a Ceilometer alarm that detects
the member stack's death and hits the member stack's deletion webhook.
There is a small matter of how the author of the member template writes
a snippet that creates a Ceilometer alarm specific to a member stack that
does not exist yet. I suppose we could stipulate that if the member
template includes a parameter with name "member_name" and type "string",
then the OS SG takes care of supplying the correct value of that
parameter; as illustrated in the asg_of_stacks.yaml of
https://review.openstack.org/#/c/97366/, a member template can use a
template parameter to tag Ceilometer data for querying. The URL of the
member stack's deletion webhook could be passed to the member template
via the same sort of convention. When Ceilometer does not exist, it is
less obvious to me what could usefully be done. Are there any useful SG
member types besides Compute instances and nested stacks? Note that a
nested stack could also pass its member deletion webhook to a load
balancer (one that is willing to accept such a thing, of course), so the
same mechanism serves both detection by the infrastructure and detection
at the application level.
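The parameter convention suggested above could be sketched as follows,
again in Python and under stated assumptions: the member template's
parameters section is available as a parsed dict, "member_name" is the
name from the proposal, and "member_deletion_webhook" is a hypothetical
name for the webhook URL parameter. The OS SG fills in only those
conventional parameters that the member template declares with type
"string"; the template would then refer to them with get_param (e.g. in
its alarm's alarm_actions).

```python
# Conventional parameter names the OS SG would supply. "member_name" comes
# from the proposal; "member_deletion_webhook" is a hypothetical name.
CONVENTIONAL_PARAMS = ("member_name", "member_deletion_webhook")


def conventional_parameter_values(template: dict, member_name: str,
                                  webhook_url: str) -> dict:
    """Return values for the conventional parameters that the member
    template declares with type "string"; all other parameters are left
    for the template author (or their defaults) to handle."""
    supplied = {
        "member_name": member_name,
        "member_deletion_webhook": webhook_url,
    }
    declared = template.get("parameters") or {}
    return {
        name: supplied[name]
        for name in CONVENTIONAL_PARAMS
        if declared.get(name, {}).get("type") == "string"
    }


if __name__ == "__main__":
    # A parsed member template, reduced to its parameters section.
    member_template = {
        "heat_template_version": "2013-05-23",
        "parameters": {
            "member_name": {"type": "string"},
            "member_deletion_webhook": {"type": "string"},
            "flavor": {"type": "string", "default": "m1.small"},
        },
    }
    print(conventional_parameter_values(
        member_template, "web-3", "https://heat.example.com/hooks/abc"))
    # {'member_name': 'web-3', 'member_deletion_webhook': 'https://heat.example.com/hooks/abc'}
```

Keying the behaviour on declared name and type means existing member
templates that do not opt in are unaffected.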
I am not entirely happy with the idea of a webhook per member. If I
understand correctly, generating webhooks is a somewhat expensive and
problematic process. What would be the alternative?