[openstack-dev] Reply: Re: [Vitrage] About alarms reported by datasource and the alarms generated by vitrage evaluator

Afek, Ifat (Nokia - IL) ifat.afek at nokia.com
Sun Jan 15 13:17:43 UTC 2017


Hi Yinliyin,

There are two use cases:
One is yours, where you have a single monitor that generates “real” alarms, alongside Vitrage, which generates deduced alarms.
The other is where someone has several monitors, and their alarms might collide or be equivalent to each other.

The solution that you suggested might solve the first use case, but I wouldn’t want to ignore the second one, which is also valid.

Regarding some of your specific suggestions:

1. In templates, we only define the alarm entity for the datasource that reports the alarm, such as Nagios.
[Ifat] This will only work for a single monitor.
2. When the evaluator deduces an alarm, it would raise the alarm with its type set to the datasource that would report the alarm, not to vitrage.
[Ifat] I don’t think this is right. In Vitrage Alarm view in the UI, displaying the deduced alarm as “Nagios” is misleading, since Nagios did not report this alarm.

I can think of a solution that is specific to the deduced alarms case, where we replace a Vitrage alarm with a “real” alarm whenever there is a collision. This solution is easier, but we should carefully examine all use cases to make sure there is no ambiguity. For the more general use case, however, I would prefer the option we discussed in a previous mail: having two (or more) alarms connected by an ‘equivalent’ relationship.
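To make the ‘equivalent’ option more concrete, here is a minimal sketch of how it might look, assuming a simplified in-memory graph. EntityGraph, AlarmVertex, mark_equivalent, equivalence_class, effective_severity and SEVERITY_ORDER are all hypothetical names, not the Vitrage API. Every monitor's alarm stays in the graph with its own source, timestamp and severity, and equivalent alarms are linked by an edge, so nothing is overwritten when two monitors (or a monitor and Vitrage) raise the same alarm.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Dict, Set, Tuple

# Hypothetical severity ranking, used only for this sketch.
SEVERITY_ORDER = {"OK": 0, "WARNING": 1, "CRITICAL": 2}


@dataclass(frozen=True)
class AlarmVertex:
    vertex_id: str
    name: str
    source: str      # e.g. "nagios", "zabbix", or "vitrage" for a deduced alarm
    severity: str


class EntityGraph:
    """Hypothetical minimal stand-in for the alarm part of the entity graph."""

    def __init__(self) -> None:
        self.vertices: Dict[str, AlarmVertex] = {}
        # vertex_id -> set of (relationship label, neighbour vertex_id)
        self.edges: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def add_alarm(self, alarm: AlarmVertex) -> None:
        self.vertices[alarm.vertex_id] = alarm

    def mark_equivalent(self, a: str, b: str) -> None:
        # Equivalence is symmetric, so store the edge in both directions.
        self.edges[a].add(("equivalent", b))
        self.edges[b].add(("equivalent", a))

    def equivalence_class(self, vertex_id: str) -> Set[str]:
        # Equivalence is also transitive: follow 'equivalent' edges breadth-first.
        seen = {vertex_id}
        queue = deque([vertex_id])
        while queue:
            current = queue.popleft()
            for label, other in self.edges.get(current, set()):
                if label == "equivalent" and other not in seen:
                    seen.add(other)
                    queue.append(other)
        return seen

    def effective_severity(self, vertex_id: str) -> str:
        # Each alarm keeps its own severity; a view can show the worst one.
        members = self.equivalence_class(vertex_id)
        return max((self.vertices[v].severity for v in members),
                   key=SEVERITY_ORDER.__getitem__)


g = EntityGraph()
g.add_alarm(AlarmVertex("a1", "high_cpu", "nagios", "WARNING"))
g.add_alarm(AlarmVertex("a2", "high_cpu", "zabbix", "CRITICAL"))
g.add_alarm(AlarmVertex("a3", "high_cpu", "vitrage", "WARNING"))
g.mark_equivalent("a1", "a3")
g.mark_equivalent("a2", "a1")
print(sorted(g.equivalence_class("a3")))  # ['a1', 'a2', 'a3']
print(g.effective_severity("a3"))         # CRITICAL
```

Because no alarm replaces another, deleting one alarm simply removes its vertex and edges, while the others remain; the timestamp/severity conflict becomes a presentation question (here, showing the worst severity) rather than a merge decision.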

What do you think?
Ifat.


From: "yinliyin at zte.com.cn" <yinliyin at zte.com.cn>
Date: Saturday, 14 January 2017 at 09:57


·  It won’t solve the general problem of two different monitors that raise the same alarm

·  [yinliyin] Generally, we would deploy only one monitor for a given alarm.

·  It won’t solve possible conflicts of timestamp and severity between different monitors

·  [yinliyin] Please see below.

·  It will make the decision of when to delete the alarm more complex (delete it when the deduced alarm is deleted? When the Nagios alarm is deleted? Both? And how should the timestamp and severity change in these cases?)

·  [yinliyin] Please see below.


The following is the basic idea for solving the problem in this situation:

1. In templates, we only define the alarm entity for the datasource that reports the alarm, such as Nagios.

2. When the evaluator deduces an alarm, it would raise the alarm with its type set to the datasource that would report the alarm, not to vitrage.

3. When entity_graph gets events from the "evaluator_queue" (all the alarms in the "evaluator_queue" are deduced alarms), it queries the graph to find out whether the same alarm has already been reported by a datasource. If so, it discards the deduced alarm.

4. When entity_graph gets events from "queue", it queries the graph to find out whether the same alarm was deduced by the evaluator. If so, it replaces that alarm in the graph with the newly arrived alarm reported by the datasource.

5. When the evaluator deduces that an alarm should be deleted, it deletes the alarm regardless of how it was generated (reported by a datasource or deduced by the evaluator).

6. When a datasource reports a recovery event for an alarm, entity_graph queries the graph to find out whether the alarm exists. If it does not, entity_graph discards the event.
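Steps 3 to 6 can be sketched roughly as follows, assuming the graph's alarm vertices are modelled as a plain dictionary keyed by (alarm name, resource id). Alarm, AlarmStore and the on_* handlers are hypothetical names, not the Vitrage API, and the "deduced" flag is an assumption: step 2 gives deduced alarms the datasource's type, so some other marker is needed to tell them apart. Step 6 only says what happens when the alarm is missing; the sketch assumes a recovery for an existing alarm deletes it.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical identity of an alarm: its name plus the resource it is raised on.
AlarmKey = Tuple[str, str]


@dataclass
class Alarm:
    name: str
    resource_id: str
    alarm_type: str   # datasource type, e.g. "nagios", even when deduced (step 2)
    deduced: bool     # assumed marker: True if raised by the evaluator

    @property
    def key(self) -> AlarmKey:
        return (self.name, self.resource_id)


class AlarmStore:
    """Hypothetical stand-in for the alarm vertices of entity_graph."""

    def __init__(self) -> None:
        self.alarms: Dict[AlarmKey, Alarm] = {}

    def on_evaluator_raise(self, alarm: Alarm) -> bool:
        # Step 3: discard a deduced alarm if a datasource already reported it.
        existing = self.alarms.get(alarm.key)
        if existing is not None and not existing.deduced:
            return False
        self.alarms[alarm.key] = alarm
        return True

    def on_datasource_raise(self, alarm: Alarm) -> None:
        # Step 4: a reported alarm replaces any deduced alarm with the same key.
        self.alarms[alarm.key] = alarm

    def on_evaluator_delete(self, key: AlarmKey) -> None:
        # Step 5: delete the alarm whatever its origin.
        self.alarms.pop(key, None)

    def on_datasource_recover(self, key: AlarmKey) -> bool:
        # Step 6: discard a recovery event for an alarm that is not in the graph.
        if key not in self.alarms:
            return False
        del self.alarms[key]  # assumption: recovery of an existing alarm deletes it
        return True


store = AlarmStore()
key = ("high_cpu", "host-1")
store.on_evaluator_raise(Alarm("high_cpu", "host-1", "nagios", deduced=True))
store.on_datasource_raise(Alarm("high_cpu", "host-1", "nagios", deduced=False))
print(store.alarms[key].deduced)  # False: the reported alarm replaced the deduced one
print(store.on_evaluator_raise(Alarm("high_cpu", "host-1", "nagios", deduced=True)))  # False
store.on_evaluator_delete(key)
print(store.on_datasource_recover(key))  # False: the alarm no longer exists
```

The last three lines show the sequence behind the deletion question raised above: under step 5 the evaluator can remove an alarm that Nagios reported, after which the Nagios recovery event is discarded under step 6.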

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20170115/77fdbe1a/attachment.html>
