[openstack-dev] [vitrage] Feedback on ability to 'suppress' alarms by type and/or resource in Vitrage
Afek, Ifat (Nokia - IL/Kfar Sava)
ifat.afek at nokia.com
Mon Dec 4 16:26:09 UTC 2017
Ok, makes sense.
Regarding the vitrage_alarm_type: it will be usable only if there are a few alarm types that are used by most datasources. Otherwise it might be just as verbose as the name, IMO.
Anyway, you are welcome to propose a blueprint so we can discuss all details there.
Best Regards,
Ifat.
From: "Waines, Greg" <Greg.Waines at windriver.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Monday, 4 December 2017 at 17:23
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Feedback on ability to 'suppress' alarms by type and/or resource in Vitrage
I am thinking that alarm suppression would be per-tenant.
Yeah, I like the second suggestion as well, i.e. specifying the suppressed alarms based on { vitrage_type & ‘regexp’ }.
The only other reason for introducing a vitrage_alarm_type property is perhaps a ‘usability’ one,
i.e. it would provide an easier / quicker indication of the general type of alarm (e.g. port-failure, host-failure, ntp-server-down, remote-log-server-down, ...) when scanning an alarm list, rather than having to read the sometimes verbose ‘name’ (description) field of the alarm.
But I realize it would be difficult to enforce / manage the usage of this field.
This is a lower priority item for me.
Greg.
From: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Monday, December 4, 2017 at 9:32 AM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Feedback on ability to 'suppress' alarms by type and/or resource in Vitrage
Hi Greg,
First, I think that supporting alarm suppression in Vitrage is a very good idea.
One question that I have is: I understand that you plan to support it both in the UI and in the CLI. Do you want the suppression to be per-user? per-tenant? global?
Regarding adding vitrage_alarm_type, my main concern is how the different datasources will fill this information. A monitor like Zabbix can have a lot of different alarms, and we will have to find a way to map them to the different alarm types. Aodh could also have its own alarm types, etc. I believe that some monitors will not use this property at all, which will cause:
· No way to suppress some of the alarms by vitrage_alarm_type
· Empty column in Vitrage alarms list
I think that your second suggestion, of the vitrage_type and regex, could work better. Is there any other reason to add the vitrage_alarm_type property, other than for suppression purposes?
Best Regards,
Ifat.
From: "Waines, Greg" <Greg.Waines at windriver.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Monday, 4 December 2017 at 15:34
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Feedback on ability to 'suppress' alarms by type and/or resource in Vitrage
Thinking about this more ...
· Any thoughts on adding a ‘vitrage_alarm_type (enum or short string)’ as a mechanism to identify the general type of problem or alarm being reported in order to address this ?
o could be an optional field
o but we’d display in the alarm list
o and we’d use it as the mechanism to suppress alarms by ‘type’
Other option:
· wrt specifying which alarms to suppress,
o could use combination of
§ the ‘vitrage_type (enum)’ field - e.g. collectd, nagios, zabbix, vitrage, ...
§ and a regexp on the ‘name (string)’ field
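A minimal sketch of this option, assuming a suppression rule is just a vitrage_type plus a regexp on the alarm name. The SuppressionRule class, the is_suppressed helper and the sample alarm names are hypothetical, made up for illustration, and are not part of Vitrage:

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SuppressionRule:
    """Hypothetical rule: a vitrage_type plus a regexp on the alarm 'name'."""
    vitrage_type: str
    name_pattern: str

    def matches(self, alarm: dict) -> bool:
        # Both conditions must hold: same reporting entity AND name matches.
        return (
            alarm.get("vitrage_type") == self.vitrage_type
            and re.search(self.name_pattern, alarm.get("name", "")) is not None
        )


def is_suppressed(alarm: dict, rules) -> bool:
    """An alarm is suppressed if any rule matches it."""
    return any(rule.matches(alarm) for rule in rules)


# Illustrative data only; the alarm names are invented.
rules = [SuppressionRule("zabbix", r"(?i)ntp server .* unreachable")]
alarms = [
    {"vitrage_type": "zabbix", "name": "NTP server 10.0.0.1 unreachable"},
    {"vitrage_type": "zabbix", "name": "CPU threshold exceeded on compute-0"},
    {"vitrage_type": "nagios", "name": "NTP server 10.0.0.1 unreachable"},
]
flags = [is_suppressed(a, rules) for a in alarms]
```

Here only the first alarm is suppressed: the second fails the regexp, and the third comes from a different vitrage_type.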
Thoughts ?
Greg.
From: Greg Waines <Greg.Waines at windriver.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Friday, December 1, 2017 at 8:45 AM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: [openstack-dev] [vitrage] Feedback on ability to 'suppress' alarms by type and/or resource in Vitrage
Hey,
I am interested in getting some feedback on a proposed blueprint for Vitrage.
BLUEPRINT:
TITLE: Add the ability to ‘suppress’ alarms by Alarm Type and/or Resource
When managing a cloud, there are situations where a particular alarm, or a set of alarms from a particular resource, occurs frequently but identifies issues that are not of concern, at least for the time being. For example, new hardware is in the process of being installed and is causing alarms, or remote servers (e.g. NTP servers) are unreliable and cause frequent connectivity alarms. In these situations, the irrelevant alarms clutter the alarm displays, and it would be helpful to be able to suppress them.
Suppressed alarms would not be shown in Active Alarm lists or Historical Alarm lists, and would not be included in alarm counts.
There would be a CLI Option / Horizon Button, to enable looking at Alarms that are currently suppressed.
( i.e. the idea would be that suppressed alarms would still be tracked, they just would not be displayed by default)
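A rough sketch of the intended behaviour, assuming suppression is tracked as a set of alarm ids. The list_alarms and alarm_count helpers and the show_suppressed flag are hypothetical names, not an existing Vitrage API, and the sample alarms are invented:

```python
def list_alarms(alarms, suppressed_ids, show_suppressed=False):
    """Return unsuppressed alarms by default, or only the suppressed ones on request."""
    if show_suppressed:
        return [a for a in alarms if a["vitrage_id"] in suppressed_ids]
    return [a for a in alarms if a["vitrage_id"] not in suppressed_ids]


def alarm_count(alarms, suppressed_ids):
    """Suppressed alarms are still tracked, but not included in the count."""
    return len(list_alarms(alarms, suppressed_ids))


# Illustrative data only.
alarms = [
    {"vitrage_id": "a1", "name": "example alarm 1"},
    {"vitrage_id": "a2", "name": "example alarm 2"},
    {"vitrage_id": "a3", "name": "example alarm 3"},
]
suppressed_ids = {"a2"}

visible = [a["vitrage_id"] for a in list_alarms(alarms, suppressed_ids)]
hidden = [a["vitrage_id"] for a in list_alarms(alarms, suppressed_ids, show_suppressed=True)]
count = alarm_count(alarms, suppressed_ids)
```

The full alarm list is never modified; suppression only changes what the default view and the counts report.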
Thoughts on usefulness ?
Questions on how to implement this in Vitrage
· from an end user’s point of view, alarms have the following fields
o vitrage_id (uuid) - unique identifier of an instance of an alarm
o vitrage_type (enum) - e.g. collectd, nagios, zabbix, vitrage, ...
- really an identifier of the general entity reporting the alarm
o name (string) - the alarm description
o vitrage_resource_type (enum) - e.g. nova.instance, nova.host, port, ...
o vitrage_resource_id (uuid) - resource instance
o vitrage_aggregated_severity
o vitrage_operational_severity
o update_timestamp
· there definitely is a specific resource identifier in order to be able to suppress alarms from a particular resource
· BUT there doesn’t seem to be a general alarm type field
i.e. that would classify the type of problem that’s occurring
e.g.
o communication failure with compute host
o loss-of-signal on port of compute host
o loss of connectivity with NTP Server
o CPU Threshold exceeded on compute host
o Memory Threshold exceeded on compute host
o File System Threshold exceeded on compute host
o etc.
· ... which is the type/granularity of ‘Alarm Type’ that I would think the user would want to suppress alarms on.
· i.e. it seems like the ‘name’ field is a combination of this general Alarm Type and details on the particular alarm.
· Any thoughts on adding a ‘vitrage_alarm_type (enum or short string)’ as a mechanism to identify the general type of problem or alarm being reported in order to address this ?
o could be an optional field
o but we’d display in the alarm list
o and we’d use it as the mechanism to suppress alarms by ‘type’
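To make the question concrete, here is a minimal sketch that assumes the alarm fields listed above plus the proposed optional vitrage_alarm_type. The Alarm class, the suppressed_by helper and the sample values are hypothetical and only illustrate suppression by type and/or resource:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alarm:
    vitrage_id: str
    vitrage_type: str            # e.g. collectd, nagios, zabbix, vitrage
    name: str                    # the alarm description
    vitrage_resource_type: str   # e.g. nova.instance, nova.host, port
    vitrage_resource_id: str
    vitrage_alarm_type: Optional[str] = None  # proposed field, optional


def suppressed_by(alarm, suppressed_types=frozenset(), suppressed_resources=frozenset()):
    """Suppress by alarm type and/or by resource id (both sets are assumptions)."""
    by_type = (
        alarm.vitrage_alarm_type is not None
        and alarm.vitrage_alarm_type in suppressed_types
    )
    by_resource = alarm.vitrage_resource_id in suppressed_resources
    return by_type or by_resource


# Illustrative data only.
typed = Alarm("a1", "zabbix", "NTP server unreachable", "nova.host", "host-1",
              vitrage_alarm_type="ntp-server-down")
untyped = Alarm("a2", "nagios", "NTP server unreachable", "nova.host", "host-2")

typed_hit = suppressed_by(typed, suppressed_types={"ntp-server-down"})
untyped_hit = suppressed_by(untyped, suppressed_types={"ntp-server-down"})
resource_hit = suppressed_by(untyped, suppressed_resources={"host-2"})
```

Because the field is optional, an alarm whose datasource does not fill it in cannot be suppressed by type, only by resource.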
Let me know what you think ?
Greg.