[openstack-dev] [ceilometer][aodh][vitrage] Raising custom alarms in AODH
AFEK, Ifat (Ifat)
ifat.afek at alcatel-lucent.com
Thu Dec 3 07:32:53 UTC 2015
Hi Ryota,
Thanks for your response, please see my comments below.
Ifat.
> -----Original Message-----
> From: Ryota Mibu [mailto:r-mibu at cq.jp.nec.com]
>
> Hi,
>
> Sorry for my late response...
>
> It seems to be a fundamental question whether we should have rich
> functionality or intelligence in on-the-fly event alarm evaluation. I
> think we can add simple operations (like aggregating alarms) to the aodh
> evaluator, while other operations (like deduction that refers to some
> external DB) should be done outside the evaluation process, to reduce
> the impact on other evaluations. But if we separate too much, there
> will be many interactions between the two services, which would slow
> down the overall alarm-handling sequence.
>
> One approach we can take is to configure aodh to pass each raw event
> (e.g. each VM that went down) wrapped in an alarm notification to
> vitrage, which then performs some operation (e.g. deducing, aggregating)
> and stores a resource-level alarm without any alarm_actions, so that
> users can see the alarms in the Horizon view. This may not require alarm
> evaluation, so we can set aside the problem I raised (cache refresh
> interval).
Let me see if I got this right: are you suggesting that we create
on-the-fly alarm definitions with no alarm_actions, for every deduced
alarm that we want to raise? And this will spare us the extra alarm
evaluation in AODH?
It does make sense.
My next question is how exactly we should create these resource-level
alarms. Can we create an alarm definition with no rule, no actions,
and the initial state set to "alarm"? (I'm not sure this can be done
with the current AODH API.)
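A minimal Python sketch of the Vitrage side of this flow, assuming each raw event notification is a dict with hypothetical `host` and `instance_id` keys. Raw events are grouped per host, and each host gets one resource-level alarm payload with an empty `alarm_actions` list and its state set to "alarm". The helper names, the `description` field and the naming scheme are illustrative only; the sketch does not call AODH, and whether the AODH API accepts a rule-less alarm is still the open question above.

```python
from collections import defaultdict


def aggregate_raw_events(raw_events):
    """Group raw event notifications (e.g. one per downed VM) by host.

    Each event is assumed to be a dict with hypothetical keys
    'host' and 'instance_id'.
    """
    by_host = defaultdict(list)
    for event in raw_events:
        by_host[event["host"]].append(event["instance_id"])
    return dict(by_host)


def build_resource_alarms(raw_events):
    """Build one hypothetical resource-level alarm payload per affected host.

    No rule and no alarm_actions; the state is set to "alarm" directly,
    so the alarm exists only to be displayed (e.g. in Horizon).
    """
    alarms = []
    for host, instances in sorted(aggregate_raw_events(raw_events).items()):
        alarms.append({
            "name": f"vitrage_host_instances_down_{host}",  # hypothetical naming scheme
            "description": f"{len(instances)} instance(s) down on {host}",
            "state": "alarm",
            "alarm_actions": [],  # nothing is triggered by this alarm
        })
    return alarms
```

With three raw events on two hosts, this yields two alarms instead of three, which is the kind of reduction the deducing/aggregating step is meant to provide.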
Another question concerns our need to get alarms from other sources, such
as Nagios, Zabbix, Ganglia, etc. We thought that Vitrage would query these
alarms from each source directly, and then create alarms in AODH in the
same way as our deduced alarms: for example, create a nagios_ovs_vswitchd
alarm if the Nagios check_ovs_vswitchd test failed.
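A sketch of the mapping from a Nagios check result to an alarm, assuming the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The alarm dict layout and the `to_alarm` helper are hypothetical, and treating every non-OK result (including WARNING) as a failure is a policy assumption, not something decided in this thread.

```python
# Standard Nagios plugin return codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def nagios_alarm_name(check_name):
    """Derive e.g. 'nagios_ovs_vswitchd' from the check name 'check_ovs_vswitchd'."""
    prefix = "check_"
    base = check_name[len(prefix):] if check_name.startswith(prefix) else check_name
    return "nagios_" + base


def to_alarm(host, check_name, return_code):
    """Return a hypothetical alarm dict for a non-OK check result, or None if it passed.

    Return codes outside 0-3 are treated as UNKNOWN here.
    """
    state = NAGIOS_STATES.get(return_code, "UNKNOWN")
    if state == "OK":
        return None
    return {
        "name": nagios_alarm_name(check_name),
        "resource": host,
        "nagios_state": state,
    }
```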
An alternative could be to integrate Nagios directly with AODH.
What do you think?
> BTW, would it be useful to have on-the-fly evaluation of combination
> alarms together with event alarms, for alarm aggregation or other cases?
I'm not sure I understand. Can you give a detailed example?
> The Horizon view is a different topic. Maybe we can reduce the number
> of alarms listed in the user view by creating raw alarms in an admin
> space that is not visible to end users, or by using an appropriate
> severity or tag so that users can filter out alarms they are not
> interested in.
Regarding this blueprint [1], do you have specific concerns about the
usability/performance of the Horizon view when there are many alarms?
I think that your ideas make sense, and we can implement them if there
is a need.
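To illustrate the severity/tag filtering idea quoted above, a sketch assuming three severity levels ordered low < moderate < critical and a hypothetical `tags` list on each alarm dict. Alarms below the chosen severity, or carrying a hidden tag such as `raw`, are dropped from the user view; the function name and defaults are illustrative only.

```python
# Assumed severity ordering: low < moderate < critical.
SEVERITY_ORDER = {"low": 0, "moderate": 1, "critical": 2}


def user_view(alarms, min_severity="moderate", hidden_tags=frozenset({"raw"})):
    """Keep alarms at or above min_severity that carry none of the hidden tags.

    Alarms without a severity are treated as "low"; 'tags' is a hypothetical field.
    """
    threshold = SEVERITY_ORDER[min_severity]
    return [
        alarm for alarm in alarms
        if SEVERITY_ORDER.get(alarm.get("severity", "low"), 0) >= threshold
        and not hidden_tags.intersection(alarm.get("tags", ()))
    ]
```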
In addition, in Vitrage we plan to handle alarm aggregation by creating
aggregation rule templates, for example based on RCA (root cause
analysis) information. The user will be able to see only the root-cause
alarms, and then drill down to all the specific alarms. But I doubt this
will be done for Mitaka.
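A sketch of how such RCA-based aggregation could work, assuming the RCA information is available as a hypothetical mapping from each alarm id to the ids of the alarms it causes. Only active alarms that are not caused by another active alarm are shown at the top level, and drilling down walks the causal graph breadth-first. This is an illustration, not the planned Vitrage template format.

```python
from collections import deque


def root_causes(active, causes):
    """Return active alarms not caused by any other active alarm.

    `causes` maps an alarm id to the ids of the alarms it causes
    (hypothetical RCA output).
    """
    caused = {
        child
        for parent in active
        for child in causes.get(parent, [])
        if child in active
    }
    return sorted(a for a in active if a not in caused)


def drill_down(root, active, causes):
    """Return all active alarms reachable from `root` in the RCA graph (breadth-first)."""
    seen, queue = set(), deque([root])
    while queue:
        for child in causes.get(queue.popleft(), []):
            if child in active and child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)
```

For example, if a host-down alarm causes two VM-down alarms, the user sees only the host alarm, and drilling down lists the VM alarms.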
[1] https://blueprints.launchpad.net/horizon/+spec/ceilometer-alarm-management-page
Thanks,
Ifat.