[openstack-dev] [ceilometer][aodh][vitrage] Raising custom alarms in AODH

AFEK, Ifat (Ifat) ifat.afek at alcatel-lucent.com
Thu Dec 3 13:57:39 UTC 2015


Hi Ryota,

> From: Ryota Mibu [mailto:r-mibu at cq.jp.nec.com]
> >
> > Let me see if I got this right: are you suggesting that we create
> > on-the-fly alarm definitions with no alarm_actions, for every deduced
> > alarm that we want to raise? And this will spare us the extra alarm
> > evaluation in AODH?
> 
> Yes. But please note that this could be the first step. The next step
> would be to make vitrage send out an alarm event to ceilometer/aodh;
> the pre-configured event alarm will then recognize the alarm and fire
> the alarm notification to another service or an end user. Eventually,
> we should have a relevant alarm type and evaluator to proxy evaluation
> in vitrage, I think.

The next step can happen if and when Aodh supports alarm templates. 
If Vitrage can handle about 30 alarm types, and there are 100 instances, 
we don't want to pre-configure 3000 alarms, which most likely will never 
be triggered.
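
To make the numbers concrete, here is a rough sketch of the two
approaches. The alarm type and instance names are made up for
illustration; only the counts (30 alarm types, 100 instances) come from
the paragraph above.

```python
from itertools import product

# Hypothetical names; only the counts come from the discussion.
alarm_types = [f"alarm_type_{i}" for i in range(30)]
instances = [f"instance_{i}" for i in range(100)]

# Pre-configured approach: one alarm definition per (type, instance) pair,
# whether or not the alarm is ever triggered.
preconfigured = list(product(alarm_types, instances))

# On-the-fly approach: a definition exists only for alarms actually raised.
raised = [("alarm_type_3", "instance_42")]  # example: one deduced alarm

print(len(preconfigured))  # 3000
print(len(raised))         # 1
```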

> > Another question is our need to get alarms from other sources, like
> > Nagios, zabbix, ganglia, etc. We thought that Vitrage would query
> > these Alarms from each source directly, and then create alarms in
> > AODH in the same way as our deduced alarms: for example create
> > nagios_ovs_vswitchd alarm if nagios check_ovs_vswitchd test failed.
> > An alternative could be to integrate nagios directly with AODH.
> > What do you think?
> 
> Hmm, I don't have a clear view on this. If the source can include
> OpenStack IDs and can generate relevant meters/samples, it should be
> useful to integrate with ceilometer. But if you want to do some
> operations (like correlation), then it is reasonable to integrate with
> vitrage.

The source may include alarms on resources that are not defined in
OpenStack, like switches or ports. And the alarms are not necessarily
related to meters; they can be failed Nagios tests, for example.
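
As a rough sketch of what that could look like, a failed Nagios test
could be mapped to an alarm name in the way described in the quoted text
(check_ovs_vswitchd becoming nagios_ovs_vswitchd). The ExternalAlarm
class, its fields and the to_alarm_name helper are hypothetical, not an
existing Vitrage or Aodh API. Note that the resource is free text and
need not be an OpenStack ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalAlarm:
    """Hypothetical record for an alarm fetched from an external source."""
    source: str    # e.g. "nagios"
    check: str     # name of the failed test, e.g. "check_ovs_vswitchd"
    resource: str  # may be a switch or port that OpenStack does not know


def to_alarm_name(alarm: ExternalAlarm) -> str:
    """Build an alarm name such as 'nagios_ovs_vswitchd' from the test name."""
    suffix = alarm.check
    if suffix.startswith("check_"):
        suffix = suffix[len("check_"):]
    return f"{alarm.source}_{suffix}"


failure = ExternalAlarm("nagios", "check_ovs_vswitchd", "switch-1")
print(to_alarm_name(failure))  # nagios_ovs_vswitchd
```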

> > > BTW, is it useful to have on-the-fly evaluation of combination
> > > alarm with event alarms for alarm aggregation or other cases?
> >
> > I'm not sure I understand. Can you give a detailed example?
> 
> OK. The 'combination' alarm type enables you to aggregate multiple
> alarms into one alarm. This can be used when you want to receive an
> alarm when both physical NIC ports are down, to recognize that the
> logical connection is unavailable if the ports are teamed for
> redundancy. Currently, combination alarms are evaluated periodically,
> which means you do not receive the combination alarm on-the-fly, even
> when you use event alarms as its source.

I think I understand your point. It means that certain alarms will
arrive at Vitrage with a delay, due to your evaluation policy. I think
we will have to address this issue at some point, but it won't change
our overall design.

> > In addition, in Vitrage we plan to handle alarm aggregation by
> > creating aggregation rule templates, for example based on the RCA
> > information.
> > The user will be able to see only the root cause alarms, and then
> > drill down to all specific alarms. But I doubt if this will be done
> > for Mitaka.
> 
> I think 'the RCA information' means information for RCA. I mean vitrage
> will use the resource topology or relationships in aggregation, rather
> than the result of RCA. Am I right?

The term "aggregation" is used in different contexts, which may be
confusing. Our plan is to examine the already-computed RCA information,
and see, for example, that a switch failure alarm caused alarms on 100
related instances. Without aggregation, Horizon would show the user all
101 alarms in a flat list.
By "alarm aggregation based on RCA" we mean that we will have
an API to get root cause alarms, which will return only the switch
alarm. The Horizon user will see one alarm, and may then ask to expand
the view and see all the other alarms that were caused by it.
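
A minimal sketch of the idea, assuming the RCA result is available as a
simple mapping from each alarm to the alarm that caused it. The function
names and alarm names below are hypothetical and only illustrate the
behavior; they are not the actual Vitrage API.

```python
# Hypothetical RCA result: each alarm maps to the alarm that caused it,
# or to None if it is a root cause. Names are made up for illustration.
caused_by = {"switch_failure": None}
caused_by.update({f"instance_{i}_down": "switch_failure" for i in range(100)})


def get_root_cause_alarms(caused_by):
    """Return only the alarms that were not caused by another alarm."""
    return [alarm for alarm, cause in caused_by.items() if cause is None]


def expand(caused_by, root):
    """Return the alarms directly caused by the given root cause alarm."""
    return [alarm for alarm, cause in caused_by.items() if cause == root]


print(len(caused_by))                            # 101 (the flat list)
print(get_root_cause_alarms(caused_by))          # ['switch_failure']
print(len(expand(caused_by, "switch_failure")))  # 100
```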

Best Regards,
Ifat.






