[openstack-dev] [vitrage] I have some problems with Prometheus alarms in vitrage.

Won wjstk16 at gmail.com
Wed Oct 10 08:58:03 UTC 2018


Hi. I'm sorry for the late reply.

My Prometheus version is 2.3.2 and my Alertmanager version is 0.15.2, and I
have attached the files (vitrage-collector and vitrage-graph logs, the Apache
log, prometheus.yml, alertmanager.yml, the alarm rule file, etc.).
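
For reference, the webhook receiver in my alertmanager.yml points Alertmanager
at Vitrage roughly like this (a trimmed sketch; the URL is only a placeholder,
the real value is in the attached alertmanager.yml, and send_resolved is
enabled as mentioned below):

    route:
      receiver: 'vitrage'
    receivers:
      - name: 'vitrage'
        webhook_configs:
          - url: '<vitrage event endpoint>'  # placeholder for my real endpoint
            send_resolved: true
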
I think the reason a resolved alarm does not disappear is a problem with the
alarm's timestamp.
[image: alarm list.JPG]
[image: vitrage-entity_graph.jpg]
- Gray alarm info
severity: PAGE
vitrage ids: c6a94386-3879-499e-9da0-2a5b9d3294b8,
e2c5eae9-dba9-4f64-960b-b964f1c01dfe, 3d3c903e-fe09-4a6f-941f-1a2adb09feca,
8c6e7906-9e66-404f-967f-40037a6afc83, e291662b-115d-42b5-8863-da8243dd06b4,
8abd2a2f-c830-453c-a9d0-55db2bf72d46
----------

The alarms marked with the blue circle are already resolved. However, they do
not disappear from the entity graph or the alarm list.
In the top screenshot there were seven more gray alarms in the active alarm
list, matching the entity graph. They only disappeared after I deleted the
gray alarms from the vitrage-alarms table in the DB or changed their end
timestamp to a time earlier than the current time.
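
For the record, the manual workaround looked roughly like this (a sketch; the
table and column names are what I recall from my Vitrage database and may not
match other setups, and the timestamp is just an example):

    UPDATE alarms
    SET end_timestamp = '2018-10-09 00:00:00'
    WHERE vitrage_id = 'c6a94386-3879-499e-9da0-2a5b9d3294b8';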

In the logs, the first problem seems to be that the timestamp value in Vitrage
comes out as 2001-01-01, even though the start time in the Prometheus alarm
information is correct.
When the alarm is resolved, the end timestamp value is not updated, so the
alarm does not disappear from the alarm list.
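
For reference, a resolved notification from Alertmanager carries the start and
end times roughly like this (trimmed, with a made-up alert name and times;
note that while an alert is still firing, Alertmanager reports endsAt as the
zero value 0001-01-01T00:00:00Z, which might be related to the wrong timestamp
I am seeing):

    {
      "status": "resolved",
      "alerts": [
        {
          "status": "resolved",
          "labels": { "alertname": "HighCpuLoad", "severity": "PAGE" },
          "startsAt": "2018-10-10T08:10:00Z",
          "endsAt": "2018-10-10T08:40:00Z"
        }
      ]
    }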

The second problem is that even if the timestamp problem is fixed, the entity
graph problem is not solved. The gray alarm information does not appear in the
vitrage-collector log, but it does appear in the vitrage-graph and Apache logs.
I want to know how to forcefully delete an entity from the Vitrage graph.


Regarding the multi-node question: I mean one control node (pc1) and one
compute node (pc2), so a single OpenStack deployment.
[image: image.png]
The test VM in the picture is an instance on the compute node that has already
been deleted. I waited for hours and checked nova.conf (the relevant setting
is shown below), but the VM was not removed from the graph.
This did not occur in the Queens version; in the Rocky version, in a
multi-node environment, there seems to be a bug with VMs created on the
compute node. The same situation occurred in multi-node environments
configured with different PCs.
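
To be specific about what I checked, nova.conf contains the notification
setting from your previous mail (I have it under [DEFAULT]; whether Rocky
expects it elsewhere, e.g. under [oslo_messaging_notifications], is something
I am not sure about):

    [DEFAULT]
    notification_topics = notifications,vitrage_notifications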

thanks,
Won

On Thu, Oct 4, 2018 at 10:46 PM, Ifat Afek <ifatafekn at gmail.com> wrote:

> Hi,
>
> Can you please give us some more details about your scenario with
> Prometheus? Please try and give as many details as possible, so we can try
> to reproduce the bug.
>
>
> What do you mean by “if the alarm is resolved, the alarm manager makes a
> silence, or removes the alarm rule from Prometheus”? These are different
> cases. Does none of them work in your environment?
>
> Which Prometheus and Alertmanager versions are you using?
>
>  Please try to change the Vitrage loglevel to DEBUG (set “debug = true”
> in /etc/vitrage/vitrage.conf) and send me the Vitrage collector, graph and
> api logs.
>
> Regarding the multi nodes, I'm not sure I understand your configuration.
> Do you mean there is more than one OpenStack and Nova? More than one host?
> more than one vm?
>
> Basically, vms are deleted from Vitrage in two cases:
> 1. After each periodic call to get_all of the nova.instance datasource. By
> default, this is done once every 10 minutes.
> 2. Immediately, if you have the following configuration in
> /etc/nova/nova.conf:
> notification_topics = notifications,vitrage_notifications
>
> So, please check your nova.conf and also whether the vms are deleted after
> 10 minutes.
>
> Thanks,
> Ifat
>
>
> On Thu, Oct 4, 2018 at 7:12 AM Won <wjstk16 at gmail.com> wrote:
>
>> Thank you for your reply Ifat.
>>
>> The alertmanager.yml file already contains 'send_resolved: true'.
>> However, the alarm does not disappear from the alarm list or the entity
>> graph even if the alarm is resolved, Alertmanager creates a silence, or the
>> alarm rule is removed from Prometheus.
>> The only way to remove alarms is to manually remove them from the db. Is
>> there any other way to remove the alarm?
>> Entities (VMs) running on multiple nodes show similar symptoms in the Rocky
>> version: entities created in the multi-node setup do not disappear from the
>> Entity Graph even after deletion.
>> Is this a bug in rocky version?
>>
>> Best Regards,
>> Won
>>
[Attachments scrubbed by the list archive: an HTML copy of this message,
alarm list.JPG, vitrage-entity_graph.jpg, image.png, environment.zip]

