[openstack-dev] [ceilometer][aodh][vitrage] Raising custom alarms in AODH

Ryota Mibu r-mibu at cq.jp.nec.com
Tue Dec 8 09:16:55 UTC 2015


Hi Ifat,



> > Can we clarify the use case again in terms of service role definition?
> 
> Our use cases focus on giving value to the cloud admin, who will be able to:
> 
> - view the topology of his environment, the relations between the physical, virtual and applicative layers and the
> statuses of all resources
> - view the alarms history
> - view alarms about problems that Vitrage deduced could happen, even if no other OpenStack component reported these
> problems (yet)
> - view RCA information about the alarms

OK, thanks.

> > Aodh provides an alarming mechanism to *notify* users of events and
> > situations calculated from various data sources. But the
> > original/master information about a resource, including its latest
> > state, is owned by other services such as nova.
> >
> > So, a user who wants to know the current resource state to find dead
> > resources (instances) can simply query instances via the nova API. And
> > a user who wants to know when/what failure occurred can query events
> > via the ceilometer API. Aodh has alarm state and history, though.
> 
> I'm not sure I fully understand the difference between Aodh events and alarms. If the user wants to know what failure
> occurred, is it part of Aodh events, alarms, or both?

In short, an 'event' is generated within OpenStack, whereas an 'alarm' is defined by a user. An 'event' is a container of data passed from other OpenStack services through the OpenStack notification bus. Events and the data they contain are stored in the ceilometer DB and exposed via the event API [1]. An 'alarm' is a pre-configured alerting rule defined by a user via the alarm API [2]. An alarm also has a state, such as 'ok' or 'alarm', and a history.

[1] http://docs.openstack.org/developer/ceilometer/webapi/v2.html#events-and-traits
[2] http://docs.openstack.org/developer/aodh/webapi/v2.html#alarms
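
To make the distinction concrete, here is a rough Python sketch of the two concepts. It is only an illustration, not the ceilometer or Aodh data model: the class names, the `required_traits` field, the `evaluate` method and the sample event type and trait values are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Event:
    """Data emitted by an OpenStack service over the notification bus."""
    event_type: str
    traits: Dict[str, Any]


@dataclass
class Alarm:
    """A user-defined alerting rule that keeps its own state and history."""
    name: str
    event_type: str                   # event type this rule watches (hypothetical field)
    required_traits: Dict[str, Any]   # trait values that must match (hypothetical field)
    state: str = "ok"
    history: List[Tuple[str, str]] = field(default_factory=list)

    def matches(self, event: Event) -> bool:
        """Return True if the event has the watched type and trait values."""
        if event.event_type != self.event_type:
            return False
        return all(event.traits.get(key) == value
                   for key, value in self.required_traits.items())

    def evaluate(self, event: Event) -> None:
        """Switch to 'alarm' on a matching event and record the transition."""
        if self.matches(event) and self.state != "alarm":
            self.history.append((self.state, "alarm"))
            self.state = "alarm"


# The user defines the alarm up front; events arrive later from services.
alarm = Alarm("instance-error", "compute.instance.update", {"state": "error"})
alarm.evaluate(Event("compute.instance.update", {"state": "active"}))  # no match
alarm.evaluate(Event("compute.instance.update", {"state": "error"}))   # match
print(alarm.state, alarm.history)
```

In this sketch the event is immutable data that only describes what happened, while the alarm is the object that carries state and a transition history.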


The point is whether we should use 'event' or 'alarm' to represent all failures. Maybe we can use 'event' for all raw error/fault notifications, and use 'alarm' for exposing deduced/wrapped failures. This is just my view, so I might be wrong.


Best regards,
Ryota


