[openstack-dev] [ceilometer][aodh][vitrage] Raising custom alarms in AODH

AFEK, Ifat (Ifat) ifat.afek at alcatel-lucent.com
Sun Dec 20 18:53:39 UTC 2015


> -----Original Message-----
> From: Ryota Mibu [mailto:r-mibu at cq.jp.nec.com]
> Sent: Tuesday, December 08, 2015 11:17 AM
>
> Hi Ifat,
> 
> In short, 'event' is generated in OpenStack, 'alarm' is defined by a
> user. 'event' is a container of data passed from other OpenStack
> services through OpenStack notification bus. 'event' and contained data
> will be stored in ceilometer DB and exposed via event api [1]. 'alarm'
> is pre-configured alerting rule defined by a user via alarm API [2].
> 'Alarm' also has state like 'ok' and 'alarm', and history as well.
> 
> [1]
> http://docs.openstack.org/developer/ceilometer/webapi/v2.html#events-and-traits
> [2] http://docs.openstack.org/developer/aodh/webapi/v2.html#alarms
> 
> 
> The point is whether we should use 'event' or 'alarm' for all failure
> representation. Maybe we can use 'event' for all raw error/fault
> notification, and use 'alarm' for exposing deduced/wrapped failure.
> This is my view, so might be wrong.
> 

Hi,

Let me summarize the issue. 

What we need in Vitrage is:

- custom alarms, where we can set metadata like: {"resource_type":"switch", "resource_name":"switch-2"} or {"resource_type":"nova.instance", "resource_id":<uuid>} or {"nagios_test_name":"check_ovs_vswitchd", "nagios_test_status":"warning"}

- the ability to define an alarm once, and instantiate it multiple times for every instance

- the ability to define an alarm on-the-fly (since we can't predict all alarm types)

- an option to trigger the alarm from Vitrage
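
To illustrate the "define once, instantiate many times" requirement, here is a minimal Python sketch. It is purely hypothetical: AlarmTemplate, its fields, and the returned dictionary layout are invented for illustration and do not correspond to any existing Aodh or Vitrage API.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class AlarmTemplate:
    """Hypothetical alarm template: defined once, instantiated per resource."""

    name_pattern: str  # placeholders use $key and are filled from the metadata
    severity: str = "moderate"  # assumed field, for illustration only

    def instantiate(self, metadata: dict) -> dict:
        # Fill the name from the metadata so every instance gets a unique name.
        name = Template(self.name_pattern).substitute(metadata)
        # Copy the metadata so later changes by the caller do not leak in.
        return {"name": name, "severity": self.severity, "metadata": dict(metadata)}


template = AlarmTemplate(
    "Instance $resource_id is at risk due to public switch problem"
)

# One definition, two concrete alarms carrying their own metadata.
alarms = [
    template.instantiate({"resource_type": "nova.instance", "resource_id": rid})
    for rid in ("uuid-1", "uuid-2")
]
```

The point of the sketch is only that the alarm name and metadata are derived from a single definition, instead of being written by hand for every instance.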


The optimal solution for us would be to have alarm templates and alarm metadata. Failing that, we can use a workaround. The current workarounds that I see are:

1. Create an event-alarm on the fly for every alarm on every instance and set its state immediately using the Aodh API. The alarm will be stored in the database, but setting the state this way will not trigger a notification or a call to alarm-actions. The alarm name will have to include the resource name/id, like "Instance <uuid> is at risk due to public switch problem", to make it unique. This might work for the Vitrage Horizon use cases in Mitaka, but not for future use cases that will require alarm-actions.
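
A rough Python sketch of the two requests involved in workaround 1, built as plain data without sending anything. The paths follow my reading of the Aodh v2 API (POST /v2/alarms, PUT /v2/alarms/<alarm_id>/state); the event type "vitrage.alarm", the trait name "instance_id", and both function names are assumptions made for illustration only.

```python
import json


def build_event_alarm(instance_id: str, problem: str) -> dict:
    """Body for creating one per-instance event alarm (POST /v2/alarms)."""
    return {
        # The name has to be unique, so it embeds the instance id.
        "name": f"Instance {instance_id} is at risk due to {problem}",
        "type": "event",
        "event_rule": {
            # Hypothetical event type; Vitrage does not define it today.
            "event_type": "vitrage.alarm",
            # Assumed trait name; restricts the alarm to this one instance.
            "query": [
                {
                    "field": "traits.instance_id",
                    "op": "eq",
                    "type": "string",
                    "value": instance_id,
                }
            ],
        },
    }


def build_state_change(alarm_id: str, state: str = "alarm") -> tuple:
    """Method, path and JSON body for setting the alarm state directly."""
    return ("PUT", f"/v2/alarms/{alarm_id}/state", json.dumps(state))
```

For example, build_event_alarm("<uuid>", "public switch problem") yields the alarm name quoted above, and build_state_change(alarm_id) describes the follow-up call that flips the state.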

2. Send notifications in order to trigger event alarms "by the book". A Vitrage notification "Alarm: Instance is at risk due to public switch problem" with metadata {"switch_name":"switch-2", "instance_id":<uuid>} will be converted to a corresponding event, then to an alarm. We will still need to create a different alarm for every instance, and we will have to wait until the alarm cache is refreshed before a newly created alarm can be triggered.
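
For workaround 2, the notification could look roughly like the Python sketch below. The event type string, the "message" key, and the function name are hypothetical; only the metadata keys come from the example above. Actually emitting it would go through the notification bus (for instance with an oslo.messaging Notifier), which is left out here.

```python
def build_vitrage_notification(switch_name: str, instance_id: str) -> dict:
    """Build the event type and payload of a Vitrage alarm notification."""
    return {
        # Hypothetical event type that an event alarm rule would match on.
        "event_type": "vitrage.alarm",
        "payload": {
            # Human-readable alarm text (assumed key name).
            "message": "Alarm: Instance is at risk due to public switch problem",
            # Metadata that would be converted into traits of the event.
            "switch_name": switch_name,
            "instance_id": instance_id,
        },
    }


notification = build_vitrage_notification("switch-2", "uuid-1")
# With oslo.messaging this would be passed on roughly as:
#   notifier.info(context, notification["event_type"], notification["payload"])
```

An event alarm that queries on the instance_id trait would then fire only for the matching instance, which is why one alarm per instance is still needed.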


I will be happy to hear your thoughts about it.

Thanks,
Ifat.