[openstack-dev] [vitrage] Vitrage alarm processing behavior

Paul Vaduva Paul.Vaduva at enea.com
Wed Feb 21 14:30:50 UTC 2018


I have also attached the driver.py that I am using.

From: Paul Vaduva [mailto:Paul.Vaduva at enea.com]
Sent: Wednesday, February 21, 2018 3:22 PM
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Cc: Ciprian Barbu <Ciprian.Barbu at enea.com>
Subject: [Attachment removed] Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Ifat,

Sorry for the late reply.
To answer your questions:
I started from the Doctor datasource as an example (or rather a port of it to version 1.3.0 of Vitrage), but I will give it a different name, so there is no need to worry about conflicts with the existing Doctor datasource.
I added alarm polling to it, but I have a more specific use case:
* I get a compute host down alarm from an event
* I cannot get a host up event, or it would be an intricate solution to implement

I tried to see whether I could make the following scenario work.
Let's call it Scenario I:
* Get a compute host down event (raising an alarm)
* Periodically poll for the status of the compute host in the method "def _get_alarms(self):" of the Driver object
Both types of interaction seem to work (polling and event-based).
However, now comes the tricky part. I need the alarms returned by the method "def _get_alarms(self):" of this Driver object (with status up, i.e. compute host up) to cancel/clear the compute host down alarms raised by events. Unfortunately, this does not happen.
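A minimal sketch of the Scenario I bookkeeping, assuming the polling side only needs to emit a clearing ("up") alarm for hosts that currently have an event-raised "down" alarm. The class `HybridHostAlarmTracker`, its methods, and the alarm dict layout (`type`, `time`, `details.hostname`, `details.status`) are hypothetical and not part of Vitrage's driver API; in a real driver this logic would sit behind `_get_alarms()`. The point it illustrates is that the clearing alarm carries the same type and hostname as the alarm raised by the event, so the two can be matched.

```python
from datetime import datetime, timezone

# Hypothetical sketch: none of these names come from Vitrage itself.
ALARM_TYPE = "compute.host.down"  # alarm type used in this thread


class HybridHostAlarmTracker:
    """Tracks host-down alarms raised by events and clears them via polling."""

    def __init__(self):
        # hostname -> the last "down" alarm dict recorded for that host
        self._active = {}

    def on_event(self, hostname):
        """Record a host-down alarm raised by an incoming event."""
        alarm = self._make_alarm(hostname, "down")
        self._active[hostname] = alarm
        return alarm

    def poll(self, host_states):
        """Return clearing ("up") alarms for hosts that are now reported up.

        host_states maps hostname -> "up" or "down", as obtained by
        whatever polling mechanism the driver uses (assumed here).
        """
        cleared = []
        for hostname, state in host_states.items():
            if state == "up" and hostname in self._active:
                del self._active[hostname]
                cleared.append(self._make_alarm(hostname, "up"))
        return cleared

    def active_hosts(self):
        """Hostnames that still have an uncleared "down" alarm."""
        return sorted(self._active)

    @staticmethod
    def _make_alarm(hostname, status):
        # Same type and hostname for raise and clear, only status differs.
        return {
            "type": ALARM_TYPE,
            "time": datetime.now(timezone.utc).isoformat(),
            "details": {"hostname": hostname, "status": status},
        }
```

For example, after `on_event("compute-1")`, calling `poll({"compute-1": "up", "compute-2": "up"})` returns a single "up" alarm for compute-1 and nothing for compute-2, since compute-2 never had an active alarm.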

Oddly enough, there is a variant of this scenario that works, but it is not robust enough for our needs.
Let's call it Scenario II:
* Getting an event with compute host down (when one of our compute hosts actually goes down)
* A polled alarm (also compute host down) is raised and somehow overwrites the event-based one (I can see the updated time).
* After a while the compute host reboots, and polling returns an alarm with status up, which in this case clears the previous alarm (which I assume is now of the polling type).

Now I can't understand why this second scenario works and the first one does not.
It seems that an alarm of the same type and status (compute host down, status down) obtained by polling can overwrite an identical alarm raised by an event, but an alarm with an updated status (i.e. up) obtained by polling cannot overwrite/clear an alarm with status down obtained from an event.
I am wondering whether there is a reason for this behavior, whether there is a way to modify it, or whether it is a bug.

To generate the events I use a modified version of the zabbix_vitrage.py script, which publishes to the RabbitMQ
vitrage_notifications.info queue. I have attached this Python script.
The code is still experimental, but I wanted to know whether it is logically possible to create the scenario we need, Scenario I.

Best Regards
Paul

From: Afek, Ifat (Nokia - IL/Kfar Sava) [mailto:ifat.afek at nokia.com]
Sent: Wednesday, February 7, 2018 7:16 PM
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Cc: Ciprian Barbu <Ciprian.Barbu at enea.com>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Paul,

I’m glad that my fix helped.

Regarding the Doctor datasource: the purpose of this datasource was to be used by the Doctor test scripts. Do you intend to modify it, or to create a new similar datasource that also supports polling? Modifying the existing datasource could be problematic, since we need to make sure the existing functionality and tests stay the same.

In general, most of our datasources support both polling and notifications. A simple example is the Cinder datasource [1]. For an example of an alarm datasource, you can look at the Zabbix datasource [2]. You can also go over the documentation on how to add a new datasource [3].

As for your question, it is the responsibility of the datasource to clear the alarms that it created. For the Doctor datasource, you can send an event with "status": "up" in the details, and the datasource will clear the alarm.
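For illustration, a hedged sketch of building the raise and clear events for the Doctor datasource. Only the "status" key inside the details ("down" to raise, "up" to clear) comes from this thread; the remaining layout (`time`, `type`, `details.hostname`, `details.source`) is an assumption about the Doctor event format, and `make_doctor_event` and its default `source` value are hypothetical. Check the Doctor datasource documentation for the exact fields your Vitrage version expects.

```python
import json
from datetime import datetime, timezone


def make_doctor_event(hostname, status, source="sample_monitor"):
    """Build a compute.host.down event for the Doctor datasource.

    Assumption: the field layout ("time", "type", "details") mirrors the
    Doctor event format; only "status" inside "details" is confirmed here.
    """
    if status not in ("up", "down"):
        raise ValueError("status must be 'up' or 'down'")
    return {
        "time": datetime.now(timezone.utc).isoformat(),
        # The same event type is used for raising and for clearing.
        "type": "compute.host.down",
        "details": {
            "hostname": hostname,
            "source": source,  # hypothetical monitor name
            "status": status,  # "down" raises the alarm, "up" clears it
        },
    }


if __name__ == "__main__":
    # Print the clearing event for a host that has come back up.
    print(json.dumps(make_doctor_event("compute-1", "up"), indent=2))
```

The raise and clear events differ only in `details.status`, which is what lets the datasource match the clearing event to the alarm it created earlier.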

[1] https://github.com/openstack/vitrage/tree/master/vitrage/datasources/cinder/volume
[2] https://github.com/openstack/vitrage/tree/master/vitrage/datasources/zabbix
[3] https://docs.openstack.org/vitrage/latest/contributor/add-new-datasource.html


Best Regards,
Ifat.


From: Paul Vaduva <Paul.Vaduva at enea.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, 7 February 2018 at 15:50
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Cc: Ciprian Barbu <Ciprian.Barbu at enea.com>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Ifat,

Yes, I've checked: 1.3.1 refers to the version of a deb package (python-vitrage) built by us; the git tag used to build that deb is 1.3.0.
I also backported the Doctor datasource from the Vitrage git master branch.

I also noticed that when I configure snapshots_interval=10, I get this exception in
/var/log/vitrage/graph.log around the time the alarms disappear:
https://hastebin.com/ukisajojef.sql

I cherry-picked the change you mentioned; the alarm that came from the event is now persistent, and the exception is gone.
So it was a bug.
I understand that for the Doctor datasource I need events both for raising the alarm and for clearing it; is that correct?


Best Regards,
Paul

From: Afek, Ifat (Nokia - IL/Kfar Sava) [mailto:ifat.afek at nokia.com]
Sent: Wednesday, February 7, 2018 1:24 PM
To: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Paul,

It sounds like a bug. Alarms created by a datasource are not supposed to be deleted later on. It might be a bug that was fixed in Queens [1].

I’m not sure which Vitrage version you are actually using. I failed to find a vitrage version 1.3.1. Could it be that you are referring to a version of python-vitrageclient or vitrage-dashboard?

In any case, if you are using an older version, I suggest that you try to use the fix that I mentioned [1] and see if it helps.


[1] https://review.openstack.org/#/c/524228


Best Regards,
Ifat.


From: Paul Vaduva <Paul.Vaduva at enea.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, 7 February 2018 at 11:58
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [vitrage] Vitrage alarm processing behavior

Hi Vitrage developers,

I have a question about Vitrage's inner workings. I ported the Doctor datasource from the master branch to an earlier version of Vitrage (1.3.1).
I noticed some behavior, and I am wondering whether it is correct or a bug of some sort.
Here it is:
1. I am sending an event for raising an alarm to the Doctor datasource of Vitrage.
2. The event is received, so the alarm is displayed on the Vitrage dashboard, attached to the affected resource (as expected).
3. If I configure snapshots_interval=10 in /etc/vitrage/vitrage.conf, the alarm disappears after a while.
Fragment from /etc/vitrage/vitrage.conf:
***************
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
snapshots_interval=10
***************
On the other hand, if I comment it out, the alarm persists:
**************
[datasources]
types = nova.host,nova.instance,nova.zone,cinder.volume,neutron.network,neutron.port,doctor
#snapshots_interval=10
**************

I would like to know whether this behavior is correct or a bug.
My intention is to create a sort of hybrid datasource, starting from the Doctor one, that receives events for raising alarms such as compute.host.down
but uses polling to clear them.

Best Regards,
Paul Vaduva
-------------- next part --------------
A non-text attachment was scrubbed...
Name: driver.py
Type: application/octet-stream
Size: 5846 bytes
Desc: driver.py
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20180221/822190d8/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: doctor_vitrage.py
Type: application/octet-stream
Size: 3613 bytes
Desc: doctor_vitrage.py
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20180221/822190d8/attachment-0001.obj>