[openstack-dev] [vitrage] Vitrage alarm processing behavior

Afek, Ifat (Nokia - IL/Kfar Sava) ifat.afek at nokia.com
Wed Feb 7 17:15:39 UTC 2018


Hi Paul,

I’m glad that my fix helped.

Regarding the Doctor datasource: the purpose of this datasource was to be used by the Doctor test scripts. Do you intend to modify it, or to create a new similar datasource that also supports polling? Modifying the existing datasource could be problematic, since we need to make sure the existing functionality and tests stay the same.

In general, most of our datasources support both polling and notifications. A simple example is the Cinder datasource [1]. For an example of an alarm datasource, look at the Zabbix datasource [2]. You can also go over the documentation on how to add a new datasource [3].

As for your question: yes, it is the responsibility of the datasource to clear the alarms that it created. For the Doctor datasource, you can send an event with "status": "up" in the details, and the datasource will clear the alarm.
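As a rough illustration of the raise/clear pair, here is a minimal Python sketch that only builds the two event bodies. The "status" key inside the details and the compute.host.down event type come from this thread; the "time" and "hostname" fields, the helper name build_doctor_event, and the assumption that the clearing event reuses the same type are illustrative guesses, so check them against the Doctor datasource in your tree before relying on them.

```python
import json
from datetime import datetime, timezone


def build_doctor_event(hostname, status):
    """Build a Doctor-style event body (hypothetical helper).

    status is "down" to raise the alarm and "up" to clear it.
    """
    if status not in ("down", "up"):
        raise ValueError("status must be 'down' or 'up'")
    return {
        # Assumed field: timestamp of the event in ISO 8601 format.
        "time": datetime.now(timezone.utc).isoformat(),
        # Event type mentioned in this thread.
        "type": "compute.host.down",
        "details": {
            # Assumed field name for the affected host.
            "hostname": hostname,
            # "up" asks the datasource to clear the alarm it created.
            "status": status,
        },
    }


raise_event = build_doctor_event("compute-1", "down")
clear_event = build_doctor_event("compute-1", "up")
print(json.dumps(clear_event, indent=2))
```

The sketch deliberately stops short of sending anything; how the event reaches Vitrage depends on your deployment.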

[1] https://github.com/openstack/vitrage/tree/master/vitrage/datasources/cinder/volume
[2] https://github.com/openstack/vitrage/tree/master/vitrage/datasources/zabbix
[3] https://docs.openstack.org/vitrage/latest/contributor/add-new-datasource.html


Best Regards,
Ifat.


From: Paul Vaduva <Paul.Vaduva at enea.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, 7 February 2018 at 15:50
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Cc: Ciprian Barbu <Ciprian.Barbu at enea.com>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Ifat,

Yes, I’ve checked: 1.3.1 refers to the version of a deb package (python-vitrage) that we built ourselves, and the git tag used to build that deb is 1.3.0.
I also backported the Doctor datasource from the Vitrage git master branch.

I also noticed that when I configure snapshots_interval=10, I get this exception in
/var/log/vitrage/graph.log around the time the alarms disappear:
https://hastebin.com/ukisajojef.sql

I've cherry-picked the change you mentioned, and the alarm that came from the event is now persistent and the exception is gone.
So it was a bug.
I understand that for the Doctor datasource I need events both for raising the alarm and for clearing it. Is that correct?


Best Regards,
Paul

From: Afek, Ifat (Nokia - IL/Kfar Sava) [mailto:ifat.afek at nokia.com]
Sent: Wednesday, February 7, 2018 1:24 PM
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Paul,

It sounds like a bug. Alarms created by a datasource are not supposed to be deleted later on. It might be a bug that was fixed in Queens [1].

I’m not sure which Vitrage version you are actually using. I failed to find a vitrage version 1.3.1. Could it be that you are referring to a version of python-vitrageclient or vitrage-dashboard?

In any case, if you are using an older version, I suggest that you try to use the fix that I mentioned [1] and see if it helps.


[1] https://review.openstack.org/#/c/524228


Best Regards,
Ifat.


From: Paul Vaduva <Paul.Vaduva at enea.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, 7 February 2018 at 11:58
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Vitrage developers,

I have a question about Vitrage's inner workings. I ported the Doctor datasource from the master branch to an earlier version of Vitrage (1.3.1).
I noticed some behavior, and I am wondering whether it is expected or a bug of some sort.
Here it is:
1. I send an event to Vitrage's Doctor datasource to raise an alarm.
2. The event is received, and the alarm is displayed on the Vitrage dashboard, attached to the affected resource (as expected).
3. If I have configured snapshots_interval=10 in /etc/vitrage/vitrage.conf, the alarm disappears after a while.
Fragment from /etc/vitrage/vitrage.conf:
***************
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
snapshots_interval=10
***************
On the other hand, if I comment it out, the alarm persists:
**************
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
#snapshots_interval=10
**************
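To make the difference between the two fragments concrete, here is a small Python sketch that parses both with the standard-library configparser and reports whether snapshots_interval is set explicitly. This is only an illustration: Vitrage itself reads its configuration through its own loader, which may apply a default when the option is commented out, and the helper name snapshots_interval_from is hypothetical.

```python
import configparser

# The two fragments from this message, reproduced as strings.
WITH_INTERVAL = """
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
snapshots_interval=10
"""

WITHOUT_INTERVAL = """
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
#snapshots_interval=10
"""


def snapshots_interval_from(conf_text):
    """Return the explicitly configured interval, or None if it is not set."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    # Lines starting with '#' are treated as comments, so the second
    # fragment has no snapshots_interval option and the fallback is used.
    return parser.getint("datasources", "snapshots_interval", fallback=None)


interval_set = snapshots_interval_from(WITH_INTERVAL)
interval_unset = snapshots_interval_from(WITHOUT_INTERVAL)
print(interval_set, interval_unset)
```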

I would like to know whether this behavior is correct or a bug.
My intention is to create some sort of hybrid datasource, starting from the Doctor one, that receives events for raising alarms such as compute.host.down
but uses polling to clear them.
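The hybrid idea can be sketched in plain Python as follows. This is a toy model, not Vitrage code: the class HybridAlarmTracker, the is_host_up callback, and the "hostname" field in the event details are all hypothetical names chosen for illustration. Events raise alarms, and a periodic poll clears the alarms whose host has recovered.

```python
class HybridAlarmTracker:
    """Toy model: events raise alarms, periodic polls clear them."""

    def __init__(self, is_host_up):
        # is_host_up is a hypothetical health check: callable(hostname) -> bool.
        self._is_host_up = is_host_up
        # Active alarms, keyed by hostname.
        self.active = {}

    def on_event(self, event):
        """Raise an alarm when a compute.host.down event arrives."""
        if event["type"] == "compute.host.down":
            hostname = event["details"]["hostname"]  # assumed field name
            self.active[hostname] = event

    def poll(self):
        """Clear alarms for hosts that are up again; return the cleared hosts."""
        cleared = [host for host in self.active if self._is_host_up(host)]
        for host in cleared:
            del self.active[host]
        return cleared


# Usage: the host is down at first, then recovers before the second poll.
health = {"compute-1": False}
tracker = HybridAlarmTracker(lambda host: health.get(host, False))
tracker.on_event({"type": "compute.host.down",
                  "details": {"hostname": "compute-1"}})
first_poll = tracker.poll()    # host still down, nothing is cleared
health["compute-1"] = True
second_poll = tracker.poll()   # host recovered, alarm is cleared
print(first_poll, second_poll)
```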

Best Regards,
Paul Vaduva