[openstack-dev] [ceilometer][aodh][vitrage] Raising custom alarms in AODH
AFEK, Ifat (Ifat)
ifat.afek at alcatel-lucent.com
Thu Dec 3 14:07:17 UTC 2015
> From: Julien Danjou [mailto:julien at danjou.info]
> Sent: Thursday, December 03, 2015 10:53 AM
> On Thu, Dec 03 2015, AFEK, Ifat (Ifat) wrote:
> > Another question is our need to get alarms from other sources, like
> > Nagios, Zabbix, Ganglia, etc. We thought that Vitrage would query
> > these alarms from each source directly, and then create alarms in
> > the same way as our deduced alarms: for example, create a
> > nagios_ovs_vswitchd alarm if the Nagios check_ovs_vswitchd test failed.
> > An alternative could be to integrate Nagios directly with AODH.
> > What do you think?
> I think I'd like to be able to answer this question, but I kind of lack
> the bigger picture of what you need these alarms for, and what you
> would like to do with them.
> I think we don't have everything right now in Ceilometer/Gnocchi/Aodh
> to replace something like Nagios _but_ we have a base framework that
> should be more powerful and way more scalable. That could be leveraged
> to build something better than Nagios, while staying compatible.
> What Nagios does is polling, storing state, and taking action based on
> that state. Which is more or less what Ceilometer does (polling),
> Gnocchi does (storing things) and Aodh does (triggering action based on
> the state). Obviously there's more to that (e.g. dependencies) that are
> not handled currently, and that could be added later – maybe in some
> parts of the current telemetry projects, or maybe in Vitrage.
> So how to fit such tools (Nagios, Zabbix, whatever) into those projects
> is an interesting problem. But I'm not clear on the first steps and
> how/why you want to leverage alarms first. :)
One of Vitrage's goals is to gather information from different layers -
physical, virtual and applicative - create a topology tree with the
relations between the different entities in all layers, and perform
alarm analysis based on this topology.
Currently, we can get alarms on the virtual layer from Ceilometer, and
alarms on the physical layer from Nagios for example. We can then try
to correlate all these alarms, compute RCA, and optionally trigger other
alarms, for example that an application might be running in a suboptimal
state due to a CPU threshold alarm on the instance.
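
To make the correlation idea above more concrete, here is a minimal
sketch in Python. It is only an illustration, not Vitrage code: the
topology dictionary, the entity names, the deduce_alarms function and
the "suboptimal_state" alarm name are all hypothetical. It assumes each
entity maps to the entities running on top of it, and walks downward
from every alarmed entity to deduce alarms on its dependents.

```python
from collections import deque

# Hypothetical topology: each entity maps to the entities running on top of it.
TOPOLOGY = {
    "host-1": ["instance-1", "instance-2"],
    "instance-1": ["app-1"],
    "instance-2": [],
    "app-1": [],
}


def deduce_alarms(topology, active_alarms):
    """Deduce alarms for every entity downstream of an alarmed entity.

    active_alarms maps an entity name to the name of the alarm raised on it.
    Returns a dict mapping each affected entity to a list of deduced alarms.
    """
    deduced = {}
    for source, alarm in active_alarms.items():
        queue = deque(topology.get(source, []))
        seen = set()
        while queue:
            entity = queue.popleft()
            if entity in seen:
                continue
            seen.add(entity)
            # "suboptimal_state" is an illustrative alarm name only.
            deduced.setdefault(entity, []).append(
                {"name": "suboptimal_state",
                 "caused_by": source,
                 "root_alarm": alarm}
            )
            queue.extend(topology.get(entity, []))
    return deduced


result = deduce_alarms(TOPOLOGY, {"instance-1": "cpu_threshold"})
```

With a CPU threshold alarm on instance-1, only app-1 gets a deduced
alarm, and the alarm records instance-1 as its cause. A real
implementation would also need to handle alarm clearing and competing
root causes, which this sketch ignores.
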
We didn't suggest that Ceilometer would replace Nagios, but rather that
Ceilometer might get Nagios test results as input/events, and trigger
corresponding alarms. Since right now Nagios and Ceilometer are not
connected, we thought that at the first stage we will query alarms
separately from Ceilometer and from Nagios.
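
As a rough illustration of turning a Nagios test result into an alarm,
here is a small Python sketch. The function names and the alarm
dictionary layout are assumptions made for this example; it does not
call any Nagios or Aodh API. The only facts it relies on are the
standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN) and the nagios_ovs_vswitchd naming used earlier in this
thread.

```python
# Standard Nagios plugin return codes.
NAGIOS_STATES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}


def alarm_name_for_check(check_name):
    """Map a check such as check_ovs_vswitchd to nagios_ovs_vswitchd."""
    prefix = "check_"
    if check_name.startswith(prefix):
        check_name = check_name[len(prefix):]
    return "nagios_" + check_name


def alarm_from_check_result(host, check_name, return_code):
    """Build a (hypothetical) alarm dict, or None if the check passed."""
    state = NAGIOS_STATES.get(return_code, "UNKNOWN")
    if state == "OK":
        return None
    return {
        "name": alarm_name_for_check(check_name),
        "resource": host,
        "severity": state,
        "source": "nagios",
    }


alarm = alarm_from_check_result("compute-1", "check_ovs_vswitchd", 2)
```

Whether such a dictionary would then be pushed to Aodh or kept inside
Vitrage is exactly the open question in this thread, so the sketch
stops at building the alarm.
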
Is it clearer now?