[openstack-dev] [vitrage] I have some problems with Prometheus alarms in vitrage.

Ifat Afek ifatafekn at gmail.com
Wed Oct 10 14:52:08 UTC 2018


Hi Won,

On Wed, Oct 10, 2018 at 11:58 AM Won <wjstk16 at gmail.com> wrote:

>
> My Prometheus version is 2.3.2 and my Alertmanager version is 0.15.2. I
> attached files (the vitrage-collector and vitrage-graph logs, the Apache
> log, prometheus.yml, alertmanager.yml, the alarm rule file, etc.).
> I think the resolved alarms do not disappear because of a problem with the
> alarm timestamp.
>
> -gray alarm info
> severity:PAGE
> vitrage id: c6a94386-3879-499e-9da0-2a5b9d3294b8  ,
> e2c5eae9-dba9-4f64-960b-b964f1c01dfe , 3d3c903e-fe09-4a6f-941f-1a2adb09feca
> , 8c6e7906-9e66-404f-967f-40037a6afc83 ,
> e291662b-115d-42b5-8863-da8243dd06b4 , 8abd2a2f-c830-453c-a9d0-55db2bf72d46
> ----------
>
> The alarms marked with the blue circle are already resolved. However, they
> do not disappear from the entity graph or the alarm list.
> The top screenshot showed seven more gray alarms, both in the active alarms
> list and in the entity graph. They disappeared when I deleted the gray
> alarms from the vitrage-alarms table in the DB, or changed their end
> timestamp to a time earlier than the current time.
>

I checked the files that you sent, and it appears that the connection
between Prometheus and Vitrage works well. I see in vitrage-graph log that
Prometheus notified both on alert firing and on alert resolved statuses.
I still don't understand why the alarms were not removed from Vitrage,
though. Can you please send me the output of the 'vitrage topology show'
CLI command?
Also, did you happen to restart vitrage-graph or vitrage-collector during
your tests?
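
To make the firing/resolved distinction concrete, here is a minimal sketch of how the two statuses can be read from a notification in the Alertmanager webhook format. It is only an illustration and not Vitrage's actual datasource code. The alert names, label values, timestamps and the helper name split_by_status are made up for the example.

```python
import json

# Sample notification in the Alertmanager webhook format.
# All alert names, labels and timestamps are invented for illustration.
SAMPLE_PAYLOAD = json.dumps({
    "version": "4",
    "status": "firing",
    "alerts": [
        {"status": "resolved",
         "labels": {"alertname": "HighCpuOnVm", "severity": "page"},
         "startsAt": "2018-10-10T08:00:00Z",
         "endsAt": "2018-10-10T08:30:00Z"},
        {"status": "firing",
         "labels": {"alertname": "InstanceDown", "severity": "page"},
         "startsAt": "2018-10-10T08:10:00Z"},
    ],
})


def split_by_status(payload_text):
    """Group alert names by their per-alert status (hypothetical helper)."""
    payload = json.loads(payload_text)
    by_status = {"firing": [], "resolved": []}
    for alert in payload.get("alerts", []):
        status = alert.get("status")
        name = alert.get("labels", {}).get("alertname", "<unnamed>")
        # setdefault keeps unexpected status values instead of dropping them
        by_status.setdefault(status, []).append(name)
    return by_status


if __name__ == "__main__":
    print(split_by_status(SAMPLE_PAYLOAD))
```

An alarm that shows up under "resolved" in such a notification but is still listed by Vitrage is the case we are trying to track down here.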


> In the log, the first problem seems to be that the timestamp value in
> Vitrage comes out as 2001-01-01, even though the start time in the
> Prometheus alarm information has the correct value.
> When the alarm is resolved, the end timestamp is not updated, so the alarm
> does not disappear from the alarm list.
>

Can you please show me where you saw the 2001 timestamp? I didn't find it
in the log.


> The second problem is that even if the timestamp problem is solved, the
> entity graph problem will not be solved. The gray alarm information is not
> in the vitrage-collector log, but it is in the vitrage-graph and Apache logs.
> I want to know how to forcefully delete an entity from the Vitrage graph.
>

You shouldn't do it :-) there is no API for deleting entities, and messing
with the database may cause unexpected results.
The only thing that you can safely do is stop all Vitrage services,
execute the 'vitrage-purge-data' command, and start the services again.
This will cause the entity graph to be rebuilt.
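
The stop, purge, start sequence could be scripted roughly as below. This is a sketch under assumptions: the systemd unit names are hypothetical (they depend on how Vitrage was installed, and the list may not cover all Vitrage services on your machine), and only 'vitrage-purge-data' itself comes from the procedure described above. The script defaults to a dry run that just prints the commands.

```python
import subprocess

# ASSUMPTION: hypothetical unit names; replace them with the units that
# actually run the Vitrage services in your deployment.
VITRAGE_UNITS = [
    "devstack@vitrage-graph.service",
    "devstack@vitrage-collector.service",
]


def build_commands(units):
    """Return the ordered commands: stop all units, purge, start all units."""
    stop = [["systemctl", "stop", unit] for unit in units]
    purge = [["vitrage-purge-data"]]
    start = [["systemctl", "start", unit] for unit in units]
    return stop + purge + start


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True aborts the sequence if any step fails
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_commands(VITRAGE_UNITS), dry_run=True)
```

Review the printed commands first, and pass dry_run=False only once the unit list matches your environment.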


> Regarding the multi-node setup, I mean one control node (pc1) and one
> compute node (pc2), so a single OpenStack deployment.
>
> The test VM in the picture is an instance on the compute node that has
> already been deleted. I waited for hours and checked nova.conf, but it was
> not removed.
> This did not occur in the Queens version; in the Rocky version, in a
> multi-node environment, there seems to be a bug in VM creation on multiple
> nodes.
> The same situation occurred in other multi-node environments that were
> configured with different PCs.
>

Let me make sure I understand the problem.
When you create a new VM in Nova, does it immediately appear in the entity
graph?
When you delete a VM, does it remain in the graph? Does it remain in a
multi-node environment but get deleted in a single-node environment?

Br,
Ifat