[openstack-dev] [aodh][vitrage] Aodh generic alarms

Julien Danjou julien at danjou.info
Thu Jan 26 16:41:02 UTC 2017


On Thu, Jan 26 2017, Afek, Ifat (Nokia - IL) wrote:

> I’ll try to answer your question from a user perspective. 

Thanks for your explanation, it helped me a lot to understand how you
view things. :)

> Suppose a bridge has a bond of two physical ports, and Zabbix detects a signal
> loss in one of them. This failure has no immediate effect on the host,
> instances or applications, and will not be reflected anywhere in OpenStack.
>
> Vitrage will receive an alarm from Zabbix, identify the instances that will be
> affected if the entire bond fails, and create deduced alarms that they are at
> risk (if the other port fails they will become unreachable). Similarly, it will
> create alarms on the relevant applications.

So when you say "create deduced alarms"… What does it mean? I understand
the deduction, but I am not sure what it "creates" – 'cause then you
say:

> A user that checks Aodh will see that all alarms are in ‘ok’ state, which might
> be misleading.

Which alarms? Could you be more precise? Where do these alarms come from?
Are they created by the users or by Vitrage automatically?
If it's a CPU usage alarm on the user's instance, there's no reason for
it to become red.

If I recall correctly what you explained to me a while back, there are
alarms created by Vitrage based on some rules, so I imagine these are
the ones you talk about?

> The user might determine that everything is ok with the instances that
> Aodh is monitoring. If the user then checks Vitrage, he will see the
> deduced alarms and understand that the instances and the applications
> are at risk.

From what I understood the user can't really check Vitrage (IIRC it does
not really have a full API for users yet), right?

> Does it make sense that the user will check Aodh *and* Vitrage? A standard user
> would like to see all of the alarms in one place, no matter which monitor was
> responsible for triggering them.

Yes: it does make sense for the user to check both because of the way
Aodh+Vitrage are architected right now. Does it make sense in terms of
user experience? I think we both agree that no, it does not. Having a
central place for alerting would be awesome.

But does it make sense to force-feed Vitrage alarms and data model into
Aodh? I am not sure right now. If I circle back again to UX, when a user
queries Aodh, he only sees the alarms he created and manages. With
generic alarms, the way it's pushed right now, there's going to be a
bunch of generic things the user has barely any clue about, that can do
things he has no idea about – because he can't really do anything on
Vitrage.

And even if Vitrage had an API to manipulate the rules and all (I can
easily imagine it's on the roadmap), that means the user would manipulate
deduction rules on the Vitrage API and then see things magically happen
in his Aodh account. I find that… weird. It sounds quite prone to
failure and to Aodh and Vitrage getting out of sync.

Let's imagine another scenario/solution (which I am *not* advocating,
it's just a thought exercise): Vitrage would store its alarms
(defined and created based on its rules) in a database. It would then
offer Aodh access to it (e.g. via an HTTP API). Then Aodh could
query it.
For example, when a user asks Aodh to list the alarms, Aodh would
return the alarms that are stored in its own database (created by the
user) and would also query Vitrage to return the list of alarms created
by Vitrage rules (and their deduced state).
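
To make that thought exercise concrete, here is a minimal sketch of the
merged listing. Everything in it is hypothetical: the Alarm fields, the
list_alarms function and the fake_vitrage stand-in are assumptions made
for illustration, not existing Aodh or Vitrage interfaces.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Alarm:
    alarm_id: str
    name: str
    state: str    # e.g. "ok" or "alarm"
    source: str   # "aodh" for user-created, "vitrage" for deduced


def list_alarms(
    project_id: str,
    local_alarms: Iterable[Alarm],
    fetch_vitrage_alarms: Callable[[str], Iterable[Alarm]],
) -> List[Alarm]:
    """Return the user's own alarms followed by the Vitrage-deduced ones."""
    merged = list(local_alarms)
    # Vitrage is queried at list time, so no copy of its state is kept
    # locally and there is nothing that can drift out of sync.
    merged.extend(fetch_vitrage_alarms(project_id))
    return merged


def fake_vitrage(project_id: str) -> List[Alarm]:
    # Stand-in for an HTTP call to a hypothetical Vitrage alarm endpoint.
    return [Alarm("v-1", "instance at risk: bond degraded", "alarm", "vitrage")]


own = [Alarm("a-1", "cpu_high", "ok", "aodh")]
for alarm in list_alarms("demo-project", own, fake_vitrage):
    print(alarm.source, alarm.name, alarm.state)
```

The fetcher is passed in as a callable only to keep the sketch runnable
without a network; in the scenario above it would be the HTTP query to
Vitrage.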

What's the point of such a design? Well it's less prone to
out-of-sync-ness and does not force any data model in Aodh that it has
no use for. It also solves the problem of "having a central listing of
alarms" for the user – the user does not have to be aware of Vitrage. Is
it a good technical design? Probably not. It seems weird to make Aodh a
bridge to Vitrage.

And I think that's the whole thing I don't like about both the current
proposal and the one I just invented. The way Aodh and Vitrage are
bridged, the way Vitrage is built on top and outside of Aodh right now
feels wobbly to me.

So here's another question then: why wouldn't there be a "zabbix" alarm
type in Aodh that could be created by a user (or another program) and
that would be triggered by Aodh when Zabbix does something?
That is really close to the event alarm mechanism which
already exists. Maybe all that's missing is a
Zabbix-to-OpenStack-notification converter to have that feature?
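
A rough sketch of what such a converter could do, under heavy
assumptions: the input keys (host, trigger_name, severity, status), the
publisher name and the "zabbix.trigger.*" event type scheme are all made
up for illustration, and the output envelope is only loosely modeled on
the usual OpenStack notification fields. It does not publish anything.

```python
import uuid
from datetime import datetime, timezone


def zabbix_to_notification(zabbix_event: dict) -> dict:
    """Map a (hypothetical) Zabbix trigger payload to a notification-shaped dict."""
    status = zabbix_event["status"].lower()  # assumed values: "PROBLEM" / "OK"
    return {
        "message_id": str(uuid.uuid4()),
        "publisher_id": "zabbix-converter",        # hypothetical name
        "event_type": f"zabbix.trigger.{status}",  # hypothetical naming scheme
        "priority": "ERROR" if status == "problem" else "INFO",
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "payload": {
            "host": zabbix_event["host"],
            "trigger": zabbix_event["trigger_name"],
            "severity": zabbix_event["severity"],
        },
    }


note = zabbix_to_notification(
    {
        "host": "compute-1",
        "trigger_name": "Signal loss on eth1",
        "severity": "warning",
        "status": "PROBLEM",
    }
)
print(note["event_type"])
```

An event alarm could then match on the event type and on fields of the
payload, which is the part that already exists.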

I'll stop here for now to let you reply, or my mail is going to be way
too long, lol.

> And a side note – you said that Aodh and Zabbix are exactly the same. I agree.
> You can implement in Aodh everything that is implemented in Zabbix. But why do
> that instead of just using the alarms that are already created by another
> monitor?

Oh, no reason; I was just making the point to be sure we were on the
same page in terms of understanding, and it seems we are. :)

> Well… is this awesome enough? ;-)

Yes, thanks. I think this is a good example that will help us think, in
terms of UX, about what we want to build and how we want to build it.

-- 
Julien Danjou
# Free Software hacker
# https://julien.danjou.info