[openstack-dev] [ceilometer][aodh][vitrage] Raising custom alarms in AODH

Ryota Mibu r-mibu at cq.jp.nec.com
Thu Dec 3 09:23:05 UTC 2015


Hi Ifat,


> > One approach we can take is that you configure aodh to pass each raw
> > event (e.g. each VM downed) wrapped in alarm notification to vitrage,
> > then do some operation (e.g. deducing, aggregating) and store
> > resource-level alarm without any alarm_actions, so that users can see
> > the alarms in horizon view. This may not require alarm evaluation, so
> > we can forget the problem I raised (cache refresh interval).
> 
> Let me see if I got this right: are you suggesting that we create on-the-fly alarm definitions with no alarm_actions,
> for every deduced alarm that we want to raise? And this will spare us the extra alarm evaluation in AODH?

Yes. But please note that this could be just the first step. The next step would be to make vitrage send out alarm events to ceilometer/aodh, so that a pre-configured event alarm recognizes the event and fires the alarm notification to another service or an end user. Eventually, I think we should have a relevant alarm type and evaluator to proxy evaluation in vitrage.


> My next question is how exactly we should create these resource-level alarms. Can we create an alarm definition with
> no rule, no actions, and initial state set to "alarm"? (I'm not sure it can be done in the current AODH API)

You can. This is not the proper way of using aodh, though. But it is an easy way to create an alarm entry so that it shows up in horizon.
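
As a rough illustration of such a display-only entry, here is a minimal Python sketch that only builds a request body. The field names (name, type, state, alarm_actions, event_rule) are my assumption of how the aodh v2 alarm resource looks, and the event_rule is a hypothetical placeholder, since I assume the API still expects a rule matching the alarm type. The helper name is made up for this example and nothing here calls aodh.

```python
import json


def build_display_only_alarm(name, resource_id):
    """Build a body for an alarm entry that exists only to be listed.

    All field names are assumptions about the aodh v2 alarm resource;
    check them against the deployed API before using this.
    """
    return {
        "name": name,
        "description": "Deduced by vitrage for resource %s" % resource_id,
        # Assumption: an alarm type plus a matching rule is still needed.
        "type": "event",
        # Hypothetical placeholder rule, not meant to match real events.
        "event_rule": {"event_type": "vitrage.placeholder", "query": []},
        # Created directly in the 'alarm' state, as asked in the thread.
        "state": "alarm",
        # No actions, so nothing is notified when the entry is stored.
        "alarm_actions": [],
    }


if __name__ == "__main__":
    body = build_display_only_alarm("nagios_ovs_vswitchd", "host-1")
    print(json.dumps(body, indent=2))
```

The body would then be sent to the alarm creation endpoint by whatever client vitrage uses; that part is left out on purpose.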


> Another question is our need to get alarms from other sources, like Nagios, zabbix, ganglia, etc. We thought that
> Vitrage would query these Alarms from each source directly, and then create alarms in AODH in the same way as our
> deduced alarms: for example create nagios_ovs_vswitchd alarm if nagios check_ovs_vswitchd test failed.
> An alternative could be to integrate nagios directly with AODH.
> What do you think?

Hmm, I don't have a clear view on this. If the source can include OpenStack IDs and can generate relevant meters/samples, it would be useful to integrate it with ceilometer. But if you want to do some operations (like correlation), then it is reasonable to integrate it with vitrage.


> > BTW, is it useful to have on-the-fly evaluation of combination alarm
> > with event alarms for alarm aggregation or other cases?
> 
> I'm not sure I understand. Can you give a detailed example?

OK. The 'combination' type alarm enables you to aggregate multiple alarms into one alarm. This can be used when you want to receive an alarm when both physical NIC ports are down, to recognize that the logical connection is unavailable when the ports are teamed for redundancy. Currently, combination alarms are evaluated periodically, which means that you cannot receive the combination alarm on-the-fly even if you are using event alarms as the sources of the combination alarm.
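
To make the NIC teaming case concrete, here is a minimal Python sketch of what on-the-fly evaluation could look like: the combined state is recomputed as soon as a member alarm changes, instead of waiting for the next periodic run. This is not aodh code. The class and function names are hypothetical, and the logic is simplified (it ignores the 'insufficient data' state).

```python
def evaluate_combination(operator, states):
    """Combine member alarm states with an 'and' or 'or' operator."""
    in_alarm = [state == "alarm" for state in states]
    if operator == "and":
        fired = all(in_alarm)
    elif operator == "or":
        fired = any(in_alarm)
    else:
        raise ValueError("unsupported operator: %s" % operator)
    return "alarm" if fired else "ok"


class OnTheFlyCombination(object):
    """Hypothetical combination alarm re-evaluated on every member change."""

    def __init__(self, operator, member_ids):
        self.operator = operator
        # Assumption: every member alarm starts in the 'ok' state.
        self.member_states = dict((m, "ok") for m in member_ids)
        self.state = "ok"

    def on_member_change(self, member_id, new_state):
        """Record a member transition; return True if the combination changed."""
        self.member_states[member_id] = new_state
        previous = self.state
        self.state = evaluate_combination(
            self.operator, list(self.member_states.values()))
        return self.state != previous


if __name__ == "__main__":
    link = OnTheFlyCombination("and", ["nic-port-1-down", "nic-port-2-down"])
    print(link.on_member_change("nic-port-1-down", "alarm"), link.state)
    print(link.on_member_change("nic-port-2-down", "alarm"), link.state)
```

With the 'and' operator the combination stays 'ok' while only one teamed port is down, and changes to 'alarm' at the moment the second port goes down.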

> > Horizon view is a different topic. Maybe we can reduce the number of
> > alarms listed in user view by creating raw alarms in admin space that
> > are not visible to end users, or using relevant severity or tag so
> > that users can filter out uninteresting alarms.
> 
> Referring to this[1] blueprint, do you have specific concerns regarding the usability/performance of Horizon view
> when there are many alarms?
> I think that your ideas make sense, and we can implement them if there is a need.

Sorry, I'm not familiar with horizon these days... But if you need changes on the aodh side, I can help you.


> In addition, in Vitrage we plan to handle alarm aggregation by creating aggregation rule templates, for example based
> on the RCA information.
> The user will be able to see only the root cause alarms, and then drill down to all specific alarms. But I doubt if
> this will be done for Mitaka.

I take 'the RCA information' to mean the information used for RCA. That is, vitrage will use the resource topology or relationships for aggregation, rather than the results of RCA. Am I right?


Best regards,
Ryota


