[openstack-dev] [heat] health maintenance in autoscaling groups

Qiming Teng tengqim at linux.vnet.ibm.com
Thu Jul 3 05:47:47 UTC 2014


On Wed, Jul 02, 2014 at 12:29:31PM -0400, Mike Spreitzer wrote:
> Qiming Teng <tengqim at linux.vnet.ibm.com> wrote on 07/02/2014 03:02:14 AM:
> 
> > Just some random thoughts below ...
> > 
> > On Tue, Jul 01, 2014 at 03:47:03PM -0400, Mike Spreitzer wrote:
> > > ...
> > > I have not found design discussion of this; have I missed something?
> > > 
> > > I suppose the natural answer for OpenStack would be centered around 
> > > webhooks... 
> > 
> > Well, I would suggest we generalize this into an event messaging or
> > signaling solution, instead of just 'webhooks'.  The reason is that
> > webhooks as implemented today do not carry a payload of useful
> > information -- I'm referring to the alarms in Ceilometer.
> 
> OK, this is great (and Steve Hardy provided more details in his reply), I 
> did not know about the existing abilities to have a payload.  However 
> Ceilometer alarms are still deficient in that way, right?  A Ceilometer 
> alarm's action list is simply a list of URLs, right?  I would be happy to 
> say let's generalize Ceilometer alarms to allow a payload in an action.

Yes. Steve kindly pointed out that an alarm could be used to carry a
payload, though this is not yet implemented.  My concern is actually about
'flexibility'.  For different purposes, an alarm may be required to
carry payloads of different formats.  We need a specification/protocol
between Heat and Ceilometer so that Heat can specify in an alarm:
  - tell me when and which instance is down/up when sending me an alarm
    about instance lifecycle;
  - tell me which instances from my group are affected when a host is
    down;
  - (other use cases?)
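
To make this concrete, the following is a rough, purely illustrative
sketch (YAML, since Heat templates already use it) of the kind of payload
Heat might want Ceilometer to attach to an alarm notification.  None of
these field names exist today; they only mirror the use cases above:

    # Hypothetical alarm notification payload -- nothing here is implemented.
    alarm_id: <ceilometer alarm uuid>
    state: alarm
    payload:
      event: compute.instance.offline       # or compute.instance.online
      instance_id: <uuid of the affected group member>
      host: <failed hypervisor, when the trigger is a host-down event>
      affected_instances:                   # members of my group on that host
        - <member uuid>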

> > There are other cases as well.  A member failure could be caused by a
> > temporary communication problem, which means the member may show up again
> > quickly when a replacement member is already being created.  Does it mean
> > that we have to respond to an 'online' event in addition to an 'offline'
> > event?
> > ...
> > The problem here today is about the recovery of an SG member.  If it is a
> > compute instance, we can 'reboot', 'rebuild', 'evacuate', 'migrate' it,
> > just to name a few options.  The most brutal way to do this is what
> > HARestarter does today -- a delete followed by a create.
> 
> We could get into arbitrary subtlety, and maybe eventually will do better, 
> but I think we can start with a simple solution that is widely applicable. 
>  The simple solution is that once the decision has been made to do 
> convergence on a member (note that this is distinct from merely detecting 
> and noting a divergence) then it will be done regardless of whether the 
> doomed member later appears to have recovered, and the convergence action 
> for a scaling group member is to delete the old member and create a 
> replacement (not in that order).

Umh ... For transient errors, it won't be uncommon that some members may
appear unreachable (e.g. from a load balancer), as a result of, say, image
downloading saturating the network bandwidth.  Solving this using
convergence logic?  The observer sees only 2 members running instead of
the desired 3, so the convergence engine starts to create a new member.
Now the previously disappeared member shows up again.  What should the
observer do?  Would it be smart enough to know that this is the old member
coming back to life and thus cancel the creation of the new member?  Would
it be able to recognize that this instance was part of a Resource Group at
all?

> > > When the member is a nested stack and Ceilometer exists, it could be the
> > > member stack's responsibility to include a Ceilometer alarm that detects
> > > the member stack's death and hits the member stack's deletion webhook.
> > 
> > This is difficult.  A '(nested) stack' is a Heat-specific abstraction --
> > recall that we have to annotate a nova server resource in its metadata
> > with the stack to which this server belongs.  Besides the 'visible'
> > resources specified in a template, Heat may create internal data
> > structures and/or resources (e.g. users) for a stack.  I am not quite sure
> > a stack's death can be easily detected from outside Heat.  It would be at
> > least cumbersome to have Heat notify Ceilometer that a stack is dead, and
> > then have Ceilometer send back a signal.
> 
> A (nested) stack is not only a heat-specific abstraction but its semantics 
> and failure modes are specific to the stack (at least, its template).  I 
> think we have no practical choice but to let the template author declare 
> how failure is detected.  It could be as simple as creating a Ceilometer 
> alarms that detect death one or more resources in the nested stack; it 
> could be more complicated Ceilometer stuff; it could be based on something 
> other than, or in addition to, Ceilometer.  If today there are not enough 
> sensors to detect failures of all kinds of resources, I consider that a 
> gap in telemetry (and think it is small enough that we can proceed 
> usefully today, and should plan on filling that gap over time).

My opinion is that we cannot blame Ceilometer for the lack of sensors,
because all sensors come from other individual services.  Heat, in this
case, is responsible for detecting and reporting nested stack failures.
Detecting failures based on a partial evaluation of the resources in that
stack could be a good starting point.
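
As a strawman for 'letting the template author declare how failure is
detected', a member template could carry its own Ceilometer alarm that
watches one of its resources and fires whatever action URL the group hands
in.  The snippet below is only a sketch -- the parameter names and the use
of cpu_util as a crude liveness signal are assumptions, not an agreed
design:

    heat_template_version: 2013-05-23

    parameters:
      member_name:        # supplied by the group, per the convention discussed
        type: string
      on_failure_url:     # e.g. the member deletion webhook (hypothetical)
        type: string

    resources:
      server:
        type: OS::Nova::Server
        properties:
          image: fedora-20              # placeholder
          flavor: m1.small              # placeholder
          metadata: {"metering.member": {get_param: member_name}}

      death_alarm:
        type: OS::Ceilometer::Alarm
        properties:
          meter_name: cpu_util
          statistic: avg
          period: 60
          evaluation_periods: 1
          comparison_operator: lt
          threshold: 1                  # "no activity" as a crude death proxy
          matching_metadata:
            metadata.user_metadata.member: {get_param: member_name}
          alarm_actions:
            - {get_param: on_failure_url}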

> > > There is a small matter of how the author of the template used to create
> > > the member stack writes some template snippet that creates a Ceilometer
> > > alarm that is specific to a member stack that does not exist yet.
> > 
> > How about just one signal responder per ScalingGroup?  An SG is supposed
> > to be in a better position to make the judgement: do I have to recreate
> > a failed member?  Should I recreate it right now or wait a few seconds?
> > Maybe I should recreate the member in some specific AZs?
> 
> That is confusing two issues.  The thing that is new here is making the 
> scaling group recognize member failure; the primary reaction is to update 
> its accounting of members (which, in the current code, must be done by 
> making sure the failed member is deleted); recovery of other scaling group
> aspects is fairly old hat; it is analogous to the problems that the
> scaling group already solves when asked to increase its size.

Okay. I agree with you that we need to 'fence' a member if it is suspected
to be dead.  The point I wanted to make is that there is an opportunity
here to tune the scale-out behaviour, which sounds no worse than a 'policy'
thing, similar to what ScalingPolicy does today.
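
If we went down that road, such knobs might be exposed as a policy-like
resource attached to the group.  The resource type below is purely
hypothetical (nothing like OS::Heat::RecoveryPolicy exists today); it only
shows where the tuning points from the questions above could live:

    resources:
      group:
        type: OS::Heat::AutoScalingGroup
        properties:
          min_size: 3
          max_size: 3
          resource: {type: member.yaml}

      recovery_policy:
        type: OS::Heat::RecoveryPolicy   # hypothetical, for discussion only
        properties:
          group: {get_resource: group}
          action: RECREATE               # or REBOOT / REBUILD / EVACUATE / MIGRATE
          grace_period: 60               # seconds to wait before fencing a suspect member
          availability_zones: [az-2]     # where replacement members may be placed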

> > ...
> > > I suppose we could stipulate that if the member template includes a
> > > parameter with name "member_name" and type "string" then the OS OG takes
> > > care of supplying the correct value of that parameter; as illustrated in
> > > the asg_of_stacks.yaml of https://review.openstack.org/#/c/97366/ , a
> > > member template can use a template parameter to tag Ceilometer data for
> > > querying.  The URL of the member stack's deletion webhook could be passed
> > > to the member template via the same sort of convention.
> > 
> > I am not in favor of the per-member webhook design.  But I vote for an
> > additional *implicit* parameter to a nested stack of any group.  It
> > could be an index or a name.
> 
> Right, I was elaborating on a particular formulation of "implicit 
> parameter".  In particular, I suggested an "implicit parameter value" for 
> an optional explicit parameter.  We could make the parameter declaration 
> implicit, but that (1) is a bit irregular (reminiscent of "modes") if we 
> only do it for stacks that are scaling group members and (2) is equivalent 
> to the existing concept of pseudo-parameters if we do it for all stacks. I 
> would be content with adding a pseudo-parameter for all stacks that is the 
> UUID of the stack.  The index of the member in the group could be 
> problematic, as those are re-used; the UUID is not re-used.  Names also 
> have issues with uniqueness.

I would vote for making 'stack_id' an implicit property for maybe all
resources (including a parameter for nested stacks).  It would help any
'monitor' or 'observer' identify where a resource comes from.  This is
especially useful for logic outside of Heat.

As for the index, it is already used in some places to refer to a group
member.  Uniqueness is important when providing a way to refer to a member.
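
For example, assuming the proposed stack-UUID pseudo-parameter were exposed
under some name, say 'OS::stack_id' (name assumed, not agreed), a member
template could tag its metering data without any explicit parameter
plumbing:

    resources:
      server:
        type: OS::Nova::Server
        properties:
          image: fedora-20          # placeholder
          flavor: m1.small          # placeholder
          # tag samples with the member stack's UUID so an outside observer
          # (or a group-level alarm query) can tell which member they belong to
          metadata:
            metering.stack: {get_param: "OS::stack_id"}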

> > > When Ceilometer does not exist, it is less obvious to me what could
> > > usefully be done.  Are there any useful SG member types besides Compute
> > > instances and nested stacks?  Note that a nested stack could also pass
> > > its member deletion webhook to a load balancer (that is willing to accept
> > > such a thing, of course), so we get a lot of unity of mechanism between
> > > the case of detection by infrastructure vs. application level detection.
> > > 
> > 
> > I'm a little bit concerned about passing the member deletion webhook to
> > the LB.  Maybe we need to rethink this: do we really want to bring
> > application-level design considerations down to the infrastructure level?
> 
> I look at it this way: do we want two completely independent loops of 
> detection and response, or shall we share a common response mechanism with 
> two different levels of detection?  I think both want the same response, 
> and so recommend a shared response mechanism.

IMO, application-level failure detection/recovery is itself a challenge.
One possible solution could be installing an HA stack (e.g. Linux-HA) inside
the guests to monitor applications.  Even with that, we still cannot tell
whether an application is dead or not.  Only the application itself can
give a final answer.  That is where the OpenAIS framework plays an
important role above Pacemaker/Corosync.  An application can actively
participate in the failure detection/recovery process.  In some cases, it
has to.  So application failure detection looks pretty specific to each
application.

> > Some of the detection work might be covered by the observer engine spec
> > that is under review.  My doubt about it is how to make it "listen only
> > to what it needs to know while ignoring everything else".
> 
> I am not sure what you mean by that.  If this is about the case of the 
> group members being nested stacks, I go back to the idea that it must be 
> up to the nested template author to define failure (via declaring how to 
> detect it).

I was trying to express my worry about the convergence work.  If it does
not look into every detailed change, some events might be missed.  If it
does, the overhead could be a problem.  Anyway, that is a different topic.

> > > I am not entirely happy with the idea of a webhook per member.  If I 
> > > understand correctly, generating webhooks is a somewhat expensive and 
> > > problematic process.  What would be the alternative?
> > 
> > My understanding is that the problem with webhooks is not cost; it is
> > more about authentication and flexibility.  Steve Hardy and Thomas Herve
> > are already looking into the authentication problem.
> 
> I was not disagreeing; I was including those in "problematic".
> 
> Thanks,
> Mike
> 




