[openstack-dev] [ceilometer][aodh][vitrage] Raising custom alarms in AODH

AFEK, Ifat (Ifat) ifat.afek at alcatel-lucent.com
Mon Dec 7 14:55:17 UTC 2015

> -----Original Message-----
> From: Julien Danjou [mailto:julien at danjou.info]
> Sent: Monday, December 07, 2015 12:00 PM
> I find it odd to have UI use cases first, as they're terribly large for a
> MVP. Unless Vitrage already exists and you have all the code figured
> out. :)

We have most of it figured out.
We have an RCA engine written in Java as proprietary CloudBand code,
with a UI for showing the topology and RCA, and it is already working
in production environments.
We have decided to write a similar project in Python as an OpenStack
project. Obviously, building it within OpenStack brings up new
challenges, which we are now trying to solve.

> > In case you haven't seen it yet, our high level architecture is on
> > Vitrage main page[2], and in the coming days we plan to also document
> > the lower level design.
> I just looked at it, and it's very interesting. All the high level
> functionalities make sense and provide value. But if you try to solve
> them all 5 at once, I'm afraid you're going to either build a monster
> (with a lot of overlap with other projects, hard to maintain, etc) or
> just crash because you'll be blocked by all other OpenStack projects.
> That's the big issue when starting to build a project on top of other
> OpenStack bricks.
> Overall I'm just saying that because it's still not clear to me which
> part you're trying to solve in this thread and how we can help you.
> What can we provide in our projects, that you miss, that could help
> you, concretely? What feature do we need to work on next?
> It would be awesome to have _one_ use-case described end-to-end that
> you would like to solve with Vitrage, leveraging various OpenStack
> projects, that you cannot solve right now because of missing pieces.
> Then we could start identifying these missing pieces and implement/fix
> them. :-)

We are not going to implement 5 use cases at once :-)
We will start with the physical-to-virtual mapping + a UI for visualizing
this topology. This is the foundation for our next use cases.
Next, we will move to the RCA and the deduced alarms use cases. Alarm
aggregation probably won't be implemented for Mitaka.

Let me describe the deduced alarms use case in detail.

1. Vitrage gets an alarm from Nagios about a public switch failure

2. Vitrage evaluator decides (based on its templates) that an "Instance is 
at risk due to public switch problem" alarm should be triggered for every 
instance on every host attached to this public switch

3. Vitrage notifier creates corresponding alarm definitions in Aodh 

4. Aodh stores these alarms in its database 

5. Vitrage triggers the alarms (sets their states)

6. Aodh updates the alarm states and sends notifications about the change

7. Horizon user queries Aodh for a list of all alarms. Aodh returns a list 
that includes the alarms that were triggered by Vitrage.
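
To make the flow above more concrete, here is a rough Python sketch of
steps 1-7. It is only an illustration under assumptions: FakeAodh is an
in-memory stand-in and not the real Aodh API, and the topology, the
function names and the alarm naming scheme are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical topology: public switch -> hosts -> instances.
TOPOLOGY = {
    "switch-1": {"host-a": ["vm-1", "vm-2"], "host-b": ["vm-3"]},
}


@dataclass
class FakeAodh:
    """In-memory stand-in for Aodh (assumption, not the real Aodh API)."""
    alarms: dict = field(default_factory=dict)
    notifications: list = field(default_factory=list)

    def create_alarm(self, name, resource_id):
        # Steps 3-4: store an alarm definition that is not based on a metric.
        self.alarms[name] = {"resource_id": resource_id, "state": "ok"}

    def set_state(self, name, state):
        # Steps 5-6: the state is set externally; update it and notify.
        self.alarms[name]["state"] = state
        self.notifications.append((name, state))

    def list_alarms(self):
        # Step 7: return all alarms, including the ones Vitrage triggered.
        return dict(self.alarms)


def deduce_alarms(failed_switch, topology):
    """Step 2: one deduced alarm per instance on every attached host."""
    hosts = topology.get(failed_switch, {})
    return [
        (f"instance-at-risk-{vm}", vm)
        for host in sorted(hosts)
        for vm in hosts[host]
    ]


def handle_switch_failure(failed_switch, topology, aodh):
    """Step 1 entry point: called when the switch failure alarm arrives."""
    for name, vm in deduce_alarms(failed_switch, topology):
        aodh.create_alarm(name, resource_id=vm)
        aodh.set_state(name, "alarm")


aodh = FakeAodh()
handle_switch_failure("switch-1", TOPOLOGY, aodh)
for name, alarm in aodh.list_alarms().items():
    print(name, alarm["resource_id"], alarm["state"])
```

The point of the sketch is that the alarm state is pushed from outside
(set_state) rather than computed by evaluating a metric.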

The added value of this use case is that the Cloud Admin can see that
some instances are at risk, even though their Nova statuses are ok.

For the integration with Aodh, we need the ability to create alarm
definitions that are not based on metrics, and to trigger them ourselves.

What do you think?

Thanks for your feedback, it is very helpful! 

Ifat and Alexey.
