[openstack-dev] [Vitrage] About alarms reported by datasource and the alarms generated by vitrage evaluator

Yujun Zhang zhangyujun+zte at gmail.com
Sat Jan 7 07:27:51 UTC 2017


The two questions raised by YinLiYin are actually one, i.e. *how to enrich
the alarm properties* that can be used as conditions in root cause
deduction.

Both 'suspect' and 'datasource' are additional pieces of information that
may be referred to as conditions in a general fault model, a.k.a. a
scenario in Vitrage.

It seems this could be done as follows:

   1. Introduce a flexible `metadata` dict into the ALARM entity
   2. Allow generating an update event[1] on metadata change
   3. Allow using ALARM metadata in scenario conditions
   4. Allow setting ALARM metadata in scenario actions

This leaves room for continued development through more complex scenario
templates, while keeping the Vitrage evaluator simple and generic.
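The four-step proposal can be illustrated with a minimal, hypothetical Python model (Vitrage itself is written in Python). `Alarm`, `AlarmStore`, `set_metadata`, `condition_matches` and `apply_set_metadata_action` are illustrative names only, not part of Vitrage's API, and real scenario conditions and actions would live in templates rather than Python calls. The sketch shows a metadata dict on the alarm, an update event fired only when a value actually changes, and metadata used both in a condition and in an action.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

# Hypothetical model: these names are illustrative, NOT Vitrage's actual API.


@dataclass
class Alarm:
    name: str
    resource_id: str
    # Step 1: a free-form metadata dict on the ALARM entity.
    metadata: Dict[str, Any] = field(default_factory=dict)


# Listener signature: (alarm, key, old_value, new_value)
UpdateListener = Callable[[Alarm, str, Any, Any], None]


class AlarmStore:
    def __init__(self) -> None:
        self.listeners: List[UpdateListener] = []

    def set_metadata(self, alarm: Alarm, key: str, value: Any) -> bool:
        """Step 2: emit an update event only when a metadata value changes."""
        if key in alarm.metadata and alarm.metadata[key] == value:
            return False  # unchanged: no event, so no re-evaluation
        old = alarm.metadata.get(key)
        alarm.metadata[key] = value
        for listener in self.listeners:
            listener(alarm, key, old, value)
        return True


def condition_matches(alarm: Alarm, name: str, required: Dict[str, Any]) -> bool:
    """Step 3: a scenario condition that also tests metadata values."""
    return alarm.name == name and all(
        alarm.metadata.get(k) == v for k, v in required.items()
    )


def apply_set_metadata_action(store: AlarmStore, alarm: Alarm,
                              updates: Dict[str, Any]) -> int:
    """Step 4: a scenario action that sets metadata; returns events emitted."""
    return sum(store.set_metadata(alarm, k, v) for k, v in updates.items())


if __name__ == "__main__":
    store = AlarmStore()
    events = []
    store.listeners.append(lambda a, k, old, new: events.append((a.name, k, old, new)))
    alarm = Alarm("host_down", "compute-1", {"datasource": "nagios"})
    apply_set_metadata_action(store, alarm, {"suspect": True})
    apply_set_metadata_action(store, alarm, {"suspect": True})  # no change, no event
    print(events)  # [('host_down', 'suspect', None, True)]
    print(condition_matches(alarm, "host_down",
                            {"datasource": "nagios", "suspect": True}))  # True
```

Emitting the event only on a real change is what keeps such a design from looping: an action that sets `suspect` to the value it already has produces no new event, so the evaluator is not re-triggered.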

My two cents.

[1]:
http://docs.openstack.org/developer/vitrage/scenario-evaluator.html#concepts-and-guidelines


On Sat, Jan 7, 2017 at 2:23 AM Afek, Ifat (Nokia - IL) <ifat.afek at nokia.com>
wrote:

> Hi YinLiYin,
>
> This is an interesting question. Let me divide my answer into two parts.
>
> First, the case that you described with Nagios and Vitrage. This problem
> depends on the specific Nagios tests that you configure in your system, as
> well as on the Vitrage templates that you use. For example, you can use
> Nagios/Zabbix to monitor the physical layer, and Vitrage to raise deduced
> alarms on the virtual and application layers. This way you will never have
> duplicated alarms. If you want to use Nagios to monitor the other layers as
> well, you can simply modify Vitrage templates so they don’t raise the
> deduced alarms that Nagios may generate, and use the templates to show RCA
> between different Nagios alarms.
>
> Now let’s talk about the more general case. Vitrage can receive alarms
> from different monitors, including Nagios, Zabbix, collectd and Aodh. If
> you are using more than one monitor, the same alarm may be raised twice
> (possibly under a different name). We need to create a mechanism that
> identifies such cases and creates a single alarm with the properties of
> both monitors. This has not been designed in detail yet, so if you have
> any suggestions we will be happy to hear them.
>
> Best Regards,
>
> Ifat.
>
> *From:* "yinliyin at zte.com.cn" <yinliyin at zte.com.cn>
> *Reply-To:* "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev at lists.openstack.org>
> *Date:* Friday, 6 January 2017 at 03:27
> *To:* "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
> *Cc:* "gong.yahui5 at zte.com.cn" <gong.yahui5 at zte.com.cn>,
> "han.jing28 at zte.com.cn" <han.jing28 at zte.com.cn>,
> "wang.weiya at zte.com.cn" <wang.weiya at zte.com.cn>,
> "jia.peiyuan at zte.com.cn" <jia.peiyuan at zte.com.cn>,
> "zhang.yujunz at zte.com.cn" <zhang.yujunz at zte.com.cn>
> *Subject:* [openstack-dev] [Vitrage] About alarms reported by datasource
> and the alarms generated by vitrage evaluator
>
> Hi all,
>
>    Vitrage generates alarms according to the templates, and all alarms
> raised by Vitrage have the type "vitrage". Suppose Nagios defines an
> alarm A, and alarm A is first raised by the Vitrage evaluator according to
> the action part of a scenario, so its type is "vitrage". If Nagios reports
> alarm A later, a new alarm A with type "Nagios" is generated in the entity
> graph. There are then two vertices for the same alarm in the graph, and we
> have to define two alarm entities, two relationships and two scenarios in
> the template file to make the alarm propagation procedure work.
>
>    This makes it inconvenient to describe the fault model of a system with
> many alarms. How can we solve this problem?
>
> 殷力殷 YinLiYin
>
> D502, ZTE Corporation R&D Center, 889# Bibo Road,
> Zhangjiang Hi-tech Park, Pudong New District, Shanghai, P.R.China, 201203
> T: +86 21 68896229
> M: +86 13641895907
> E: yinliyin at zte.com.cn
> www.zte.com.cn
>
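For the duplicate-alarm problem raised in the quoted messages (a deduced "vitrage" alarm and a Nagios alarm for the same fault becoming two vertices, or two monitors reporting the same fault under different names), one possible direction is to keep a single vertex per fault and record every source that reported it. Since the mechanism has not been designed yet, this is only a minimal, hypothetical sketch: it assumes an alarm is identified by the affected resource plus a canonical name obtained from a static alias table. `NAME_ALIASES`, `MergedAlarm` and `AlarmIndex` are illustrative names, not Vitrage classes, and the example alarm names are made up.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical sketch, NOT Vitrage code: the alias table, the
# (resource_id, canonical_name) key and the class names are assumptions.
NAME_ALIASES: Dict[Tuple[str, str], str] = {
    ("nagios", "HOST_DOWN"): "host_down",
    ("zabbix", "Host is unreachable"): "host_down",
}


class MergedAlarm:
    """One graph vertex per real fault, remembering who reported it."""

    def __init__(self, name: str, resource_id: str) -> None:
        self.name = name
        self.resource_id = resource_id
        # Per-source properties, e.g. {"nagios": {...}, "vitrage": {...}}
        self.properties: Dict[str, Dict[str, object]] = {}

    @property
    def sources(self) -> Set[str]:
        return set(self.properties)


class AlarmIndex:
    def __init__(self) -> None:
        self._alarms: Dict[Tuple[str, str], MergedAlarm] = {}

    @staticmethod
    def _key(source: str, raw_name: str, resource_id: str) -> Tuple[str, str]:
        # Map monitor-specific names to one canonical name.
        return resource_id, NAME_ALIASES.get((source, raw_name), raw_name)

    def report(self, source: str, raw_name: str, resource_id: str,
               props: Optional[Dict[str, object]] = None) -> MergedAlarm:
        """Raise (or refresh) an alarm, reusing the vertex if it exists."""
        key = self._key(source, raw_name, resource_id)
        alarm = self._alarms.get(key)
        if alarm is None:
            alarm = MergedAlarm(key[1], resource_id)
            self._alarms[key] = alarm
        alarm.properties[source] = dict(props or {})
        return alarm

    def clear(self, source: str, raw_name: str, resource_id: str) -> bool:
        """Drop one source; delete the vertex only when no source remains.

        Returns True if the alarm is still active afterwards."""
        key = self._key(source, raw_name, resource_id)
        alarm = self._alarms.get(key)
        if alarm is None:
            return False
        alarm.properties.pop(source, None)
        if not alarm.properties:
            del self._alarms[key]
            return False
        return True

    def __len__(self) -> int:
        return len(self._alarms)


if __name__ == "__main__":
    index = AlarmIndex()
    deduced = index.report("vitrage", "host_down", "compute-1", {"state": "active"})
    reported = index.report("nagios", "HOST_DOWN", "compute-1", {"severity": "CRITICAL"})
    print(deduced is reported, len(index))  # True 1
    print(sorted(deduced.sources))  # ['nagios', 'vitrage']
```

With one vertex per fault, a template would need only one alarm entity, relationship and scenario, and the reporting monitors become a property (much like the 'datasource' metadata suggested above). Clearing is reference-counted per source, so the vertex survives while any monitor still reports the alarm. Matching by a static alias table is the simplest possible rule; a real design might also need time windows or resource mapping between monitors.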


More information about the OpenStack-dev mailing list