[openstack-dev] [Vitrage] About alarms reported by datasource and the alarms generated by vitrage evaluator

Afek, Ifat (Nokia - IL) ifat.afek at nokia.com
Thu Jan 12 09:06:51 UTC 2017


Hi Yujun,

See my comments inline.

Ifat.

From: Yujun Zhang <zhangyujun+zte at gmail.com>
Date: Wednesday, 11 January 2017 at 12:12


I have just realized that "abstract alarm" is not a good term. What I was talking about is fault and alarm.

Fault is what actually happens, and alarm is how it is detected (or deduced).


On Wed, Jan 11, 2017 at 5:13 PM Yujun Zhang <zhangyujun+zte at gmail.com> wrote:

I think YinLiYin's idea is a reasonable requirement from the end user's point of view. End users care more about the real faults in the system than about how they are detected. Though it will pose significant design and engineering challenges, it creates value for customers. I'm quite positive about this evolution.

[Ifat] Of course. I never argued about the need, just tried to figure out how we should implement it.

One possible solution would be to introduce a high-level (abstract) template from the user's point of view, and then convert it to Vitrage scenario templates (or directly to the graph). The more sources (Nagios, Vitrage deduction) that report an abstract alarm, the more confidence we have that it reflects a real fault. The confidence of an alarm could then be included in the scenario condition.

[Ifat] I understand your idea, but I'm not sure yet whether it helps with the use case.
How would you imagine the ‘confidence’ property? As a Boolean or as a counter? One option is ‘deduced’ vs. ‘monitored’. Another option is to count the number of monitors that reported it. Personally, I don’t think this is needed. I think that if Nagios reports an error, then that is reliable enough without confirmation from another monitor.
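
To make the two options concrete, here is a minimal Python sketch. It is not Vitrage code: the `Fault` class, its fields, and the source names "vitrage", "nagios" and "zabbix" are all hypothetical and only illustrate how a fault might record which sources reported it. The counter below assumes that Vitrage's own deduction is not counted as a monitor.

```python
from dataclasses import dataclass, field


@dataclass
class Fault:
    """Hypothetical model: one real fault plus the sources that reported it."""
    resource: str
    name: str
    sources: set = field(default_factory=set)

    def report(self, source: str) -> None:
        # A set keeps each source once, however often it re-reports.
        self.sources.add(source)

    @property
    def monitors(self) -> set:
        # Assumption: "vitrage" marks a deduced alarm, not an external monitor.
        return self.sources - {"vitrage"}

    @property
    def kind(self) -> str:
        # Option 1: two-valued property, 'monitored' vs. 'deduced'.
        return "monitored" if self.monitors else "deduced"

    @property
    def monitor_count(self) -> int:
        # Option 2: counter of distinct monitors that reported the fault.
        return len(self.monitors)


fault = Fault(resource="host-1", name="host_down")
fault.report("vitrage")   # deduced by the evaluator only
print(fault.kind, fault.monitor_count)
fault.report("nagios")    # now confirmed by a monitor
print(fault.kind, fault.monitor_count)
```

With the first option a single Nagios report is already enough to mark the fault as monitored, which matches the point above; the counter only adds information when several monitors cover the same resource.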

