[openstack-dev] [heat] health maintenance in autoscaling groups
Steven Hardy
shardy at redhat.com
Wed Jul 2 10:02:36 UTC 2014
On Wed, Jul 02, 2014 at 03:02:14PM +0800, Qiming Teng wrote:
> Just some random thoughts below ...
>
> On Tue, Jul 01, 2014 at 03:47:03PM -0400, Mike Spreitzer wrote:
> > In AWS, an autoscaling group includes health maintenance functionality ---
> > both an ability to detect basic forms of failures and an ability to react
> > properly to failures detected by itself or by a load balancer. What is
> > the thinking about how to get this functionality in OpenStack? Since
>
> We are prototyping a solution to this problem at IBM Research - China
> lab. The idea is to leverage oslo.messaging and ceilometer events for
> instance (possibly other resource such as port, securitygroup ...)
> failure detection and handling.
This sounds interesting, are you planning to propose a spec for heat
describing this work and submit your patches to heat?
>
> > OpenStack's OS::Heat::AutoScalingGroup has a more general member type;
> > what is the thinking about what failure detection means (and how it would
> > be accomplished and communicated)?
>
> Now that most OpenStack services make use of oslo.notify, in theory a
> service should be able to send/receive events related to resource
> status. In our current prototype, at least host failure (detected in
> Nova and reported with a patch), VM failure (detected by nova), and some
> lifecycle events of other resources can be detected and then collected
> by Ceilometer. There is certainly a possibility to listen to the
> message queue directly from Heat, but we only implemented the Ceilometer
> centric approach.
It has been pointed out a few times that in large deployments, different
services may not share the same message bus. So while *an* option could be
heat listening to the message bus, I'd prefer that we keep the alarm
notifications via the ReST API as the primary signalling mechanism.
> >
> > I have not found design discussion of this; have I missed something?
> >
> > I suppose the natural answer for OpenStack would be centered around
> > webhooks. An OpenStack scaling group (OS SG = OS::Heat::AutoScalingGroup
> > or AWS::AutoScaling::AutoScalingGroup or OS::Heat::ResourceGroup or
> > OS::Heat::InstanceGroup) could generate a webhook per member, with the
> > meaning of the webhook being that the member has been detected as dead and
> > should be deleted and removed from the group --- and a replacement member
> > created if needed to respect the group's minimum size.
>
> Well, I would suggest we generalize this into an event messaging or
> signaling solution, instead of just 'webhooks'. The reason is that
> webhooks as implemented today do not carry a payload of useful
> information -- I'm referring to the alarms in Ceilometer.
The resource signal interface used by ceilometer can carry whatever data
you like, so the existing solution works fine; we don't need a new one IMO.
For example look at this patch which converts WaitConditions to use the
resource_signal interface:
https://review.openstack.org/#/c/101351/2/heat/engine/resources/wait_condition.py
We pass the data to the WaitCondition via a resource signal, the exact same
transport that is used for alarm notifications from ceilometer.
Note that a "webhook" really just means a pre-signed request. The v2
AWS-style signing scheme (currently the only option for heat pre-signed
requests) does not sign the request body.
This is a security disadvantage (addressed by the v3 AWS signing scheme),
but it does mean you can pass data via the pre-signed URL.
An alternative to pre-signed URLs is simply to make an authenticated call
to the native ReST API, but then whatever is signalling requires either
credentials, a token, or a trust to impersonate the stack owner. Again, you
can pass whatever data you want via this interface.
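For illustration, signalling a resource with a JSON payload through the
native API boils down to an authenticated POST. Here is a minimal sketch;
the endpoint, tenant, stack and resource names are made up, and only the URL
shape follows the heat resource-signal API:

```python
import json
from urllib import request

# Hypothetical endpoint; the path shape follows the native heat API:
# POST /v1/{tenant_id}/stacks/{stack_name}/{stack_id}/resources/{resource}/signal
HEAT_ENDPOINT = "http://heat.example.com:8004/v1"


def build_signal_request(tenant_id, stack_name, stack_id, resource_name,
                         token, payload):
    """Build an authenticated resource-signal request with a JSON payload."""
    url = "%s/%s/stacks/%s/%s/resources/%s/signal" % (
        HEAT_ENDPOINT, tenant_id, stack_name, stack_id, resource_name)
    body = json.dumps(payload).encode("utf-8")
    return request.Request(
        url, data=body, method="POST",
        headers={"X-Auth-Token": token, "Content-Type": "application/json"})


# Example: signal a scaling policy with data describing a failed member.
req = build_signal_request(
    "t123", "asg_stack", "s456", "scale_policy",
    token="TOKEN",
    payload={"alarm": "instance_down", "failed_member": "inst-0007"})
print(req.full_url)
# request.urlopen(req) would actually deliver the signal; omitted here.
```

The same payload could equally be POSTed to a pre-signed URL; only the auth
mechanism changes, not the body.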
> There are other cases as well. A member failure could be caused by a
> temporary communication problem, which means the member may come back
> online while a replacement is already being created. It may mean we have
> to respond to an 'online' event in addition to an 'offline' event?
>
> > When the member is
> > a Compute instance and Ceilometer exists, the OS SG could define a
> > Ceilometer alarm for each member (by including these alarms in the
> > template generated for the nested stack that is the SG), programmed to hit
> > the member's deletion webhook when death is detected (I imagine there are
> > a few ways to write a Ceilometer condition that detects instance death).
>
> Yes. Compute instance failure can be detected with a Ceilometer plugin.
> In our prototype, we developed a Dispatcher plugin that can handle
> events like 'compute.instance.delete.end' and 'compute.instance.create.end'
> after they have been processed, based on an event_definitions.yaml file.
> There could be other ways, I think.
Are you aware of the "Existence of instance" meter in ceilometer?
http://docs.openstack.org/developer/ceilometer/measurements.html
I noticed that recently and wondered if it provides a metric we could use
to set an alarm, so we're notified when an instance in an autoscaling
group is deleted out of band and no longer exists.
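If that meter behaves as it appears to, wiring an alarm on it to a scaling
policy might look something like this HOT sketch (the resource names,
property values and exact meter semantics are my assumptions, not a tested
configuration):

```yaml
member_gone_alarm:
  type: OS::Ceilometer::Alarm
  properties:
    description: Fire when the instance stops reporting existence samples
    meter_name: instance
    statistic: count
    period: 300
    evaluation_periods: 1
    comparison_operator: lt
    threshold: 1
    matching_metadata:
      resource_id: {get_resource: my_server}
    alarm_actions:
      - {get_attr: [replace_policy, alarm_url]}
```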
> The problem here today is about the recovery of SG member. If it is a
> compute instance, we can 'reboot', 'rebuild', 'evacuate', 'migrate' it,
> just to name a few options. The most brutal way to do this is like what
> HARestarter is doing today -- delete followed by a create.
Well it's also the same as you would do in a scaling group - if a metric
showed absence or lack of health for an instance, you could just delete it
and build a replacement.
This is why I think HARestarter should be deprecated in favour of just
using AutoScalingGroups combined with appropriate alarms.
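As a rough illustration of that direction, the group-plus-policy wiring
might look like the HOT sketch below (names and values are illustrative;
today the policy can only adjust capacity, and targeted replacement of a
named member is the part that's still missing):

```yaml
resources:
  group:
    type: OS::Heat::AutoScalingGroup
    properties:
      min_size: 2
      max_size: 4
      resource:
        type: OS::Nova::Server
        properties:
          image: fedora-20
          flavor: m1.small

  replace_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      auto_scaling_group_id: {get_resource: group}
      adjustment_type: change_in_capacity
      scaling_adjustment: 1
      cooldown: 60
```

An alarm that detects a dead member would then hit replace_policy's
alarm_url attribute, rather than HARestarter doing a blind delete/create.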
> > When the member is a nested stack and Ceilometer exists, it could be the
> > member stack's responsibility to include a Ceilometer alarm that detects
> > the member stack's death and hit the member stack's deletion webhook.
>
> This is difficult. A '(nested) stack' is a Heat specific abstraction --
> recall that we have to annotate a nova server resource in its metadata
> to which stack this server belongs. Besides the 'visible' resources
> specified in a template, Heat may create internal data structures and/or
> resources (e.g. users) for a stack. I am not quite sure a stack's death
> can be easily detected from outside Heat. It would be at least
> cumbersome to have Heat notify Ceilometer that a stack is dead, and then
> have Ceilometer send back a signal.
>
> > There is a small matter of how the author of the template used to create
> > the member stack writes some template snippet that creates a Ceilometer
> > alarm that is specific to a member stack that does not exist yet.
>
> How about just one signal responder per ScalingGroup? A SG is supposed
> to be in a better position to make the judgement: do I have to recreate
> a failed member? should I recreate it right now or wait a few seconds?
> maybe I should recreate the member in some specific AZs?
This is what we have already - you have one ScalingPolicy (which is a
SignalResponder), and the ScalingPolicy is the place where you make the
decision about what to do with the data provided from the alarm.
What we're currently missing is a way to pass data in when doing the scale
up/down of the group so the ScalingPolicy could trigger replacement of a
failed instance instead of just building a new one (we'd pass the id of the
failed instance in as a hint, then we'd build a new one and remove the
failed one).
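To make that intended behaviour concrete, here is a toy sketch (with
stand-in objects, not real heat APIs) of a policy handler that uses a
failed-member hint from the signal payload:

```python
# Hypothetical sketch of "replace a failed member" scaling behaviour.
# FakeGroup is an in-memory stand-in for the group; none of these method
# names are real heat engine APIs.

class FakeGroup:
    def __init__(self, members):
        self.members = list(members)
        self.counter = len(members)

    def member_ids(self):
        return self.members

    def adjust(self, delta):
        # Grow the group by delta members with generated ids.
        for _ in range(delta):
            self.counter += 1
            self.members.append("inst-%d" % self.counter)

    def remove_member(self, member_id):
        self.members.remove(member_id)


def handle_signal(group, details, default_adjustment=1):
    """Replace the named failed member if the payload carries a hint,
    otherwise perform the policy's normal capacity adjustment."""
    failed = (details or {}).get("failed_member")
    if failed in group.member_ids():
        group.adjust(1)              # build the replacement first...
        group.remove_member(failed)  # ...so we never dip below min size
    else:
        group.adjust(default_adjustment)


group = FakeGroup(["inst-1", "inst-2"])
handle_signal(group, {"failed_member": "inst-1"})
print(group.member_ids())  # ['inst-2', 'inst-3']
```

With no hint in the payload, the handler falls through to a plain scale-up,
which is exactly what the ScalingPolicy does today.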
> If there is only one signal responder per SG, then the 'webhook' (or
> resource signal, my preference) needs to carry a payload indicating when
> and which member is down.
"webhooks" and resource signals are the same thing; it's just the auth
method (and which API you hit) that differs. Inside the engine/resource
implementation they are exactly the same.
resource signals can already carry a payload, so it's just a case of
getting ceilometer to provide the appropriate data when sending the alarm
signal, and adjusting the ScalingPolicy to use it appropriately.
> > I suppose we could stipulate that if the member template includes a
> > parameter with name "member_name" and type "string" then the OS SG takes
> > care of supplying the correct value of that parameter; as illustrated in
> > the asg_of_stacks.yaml of https://review.openstack.org/#/c/97366/ , a
> > member template can use a template parameter to tag Ceilometer data for
> > querying. The URL of the member stack's deletion webhook could be passed
> > to the member template via the same sort of convention.
>
> I am not in favor of the per-member webhook design. But I vote for an
> additional *implicit* parameter to a nested stack of any group. It
> could be an index or a name.
I agree, we just need appropriate metadata in ceilometer, which can then be
passed back to heat via the resource signal when the alarm happens.
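For example, the convention might look like this member-template sketch.
The parameter name and the metering.* metadata prefix are my assumptions,
based on the existing nova/ceilometer user-metadata convention:

```yaml
# Member template: a "member_name" parameter supplied by the group is
# stamped into the server's metadata, so ceilometer samples (and hence
# alarm payloads) can identify which member they concern.
parameters:
  member_name:
    type: string

resources:
  server:
    type: OS::Nova::Server
    properties:
      image: fedora-20
      flavor: m1.small
      metadata:
        metering.member_name: {get_param: member_name}
```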
> > When Ceilometer
> > does not exist, it is less obvious to me what could usefully be done. Are
> > there any useful SG member types besides Compute instances and nested
> > stacks? Note that a nested stack could also pass its member deletion
> > webhook to a load balancer (that is willing to accept such a thing, of
> > course), so we get a lot of unity of mechanism between the case of
> > detection by infrastructure vs. application level detection.
> >
>
> I'm a little bit concerned about passing the member deletion webhook to
> LB. Maybe we need to rethink about this: do we really want to bring
> application level design considerations down to the infrastructure level?
>
> Some of the detection work might be covered by the observer engine spec
> that is under review. My doubt is about how to make it "listen only to
> what it needs to know while ignoring everything else".
>
> > I am not entirely happy with the idea of a webhook per member. If I
> > understand correctly, generating webhooks is a somewhat expensive and
> > problematic process. What would be the alternative?
>
> My understanding is that the problem with webhooks is not about cost; it
> is more about authentication and flexibility. Steve Hardy and Thomas Herve
> are already looking into the authentication problem.
Well, every SignalResponder resource creates a user in keystone, so it's not
"expensive" as such, but it makes sense IMO to stick to the current model,
where filtering happens in ceilometer and we then get an alarm containing
data sent to the scaling policy resource. Having every group member be a
signal responder definitely does not make sense to me.
The first step is identifying what data ceilometer needs to send us, and
the second step is getting the (native) scaling policy resource to use it.
The current transport and signalling topology should be sufficient AFAICS.
Steve