Re: [openstack-dev] [vitrage] I have some problems with Prometheus alarms in vitrage.
Hi, I attached four log files. I collected the logs from about 17:14 to 17:42. I created an instance named 'deltesting3' at 17:17. Seven minutes later, at 17:24, 'deltesting3' appeared in the entity graph, and the corresponding vitrage-collector and vitrage-graph log lines showed up. When I create an instance on the ubuntu server, it appears immediately in the entity graph and the logs, but when I create an instance on compute1 (the second node in the multi-node setup), it appears only about 5~10 minutes later. I deleted the 'deltesting3' instance around 17:26.
After ~20 minutes, there was only Apigateway. Does that make sense? Did you delete the instances on ubuntu, in addition to deltesting?
I only deleted 'deltesting'. After that, only the logs from 'apigateway' and 'kube-master' were collected, but the other instances were still working. I don't know why only those two instances appear in the log. In the Nov 19 log, 'apigateway' and 'kube-master' were collected continuously at short intervals, while the other instances were collected only occasionally, at long intervals.
In any case, I would expect to see the instances deleted from the graph at this stage, since they were not returned by get_all. Can you please send me the log of vitrage-graph at the same time (Nov 15, 16:35-17:10)?
Information about 'deltesting3', which has already been deleted, continues to be collected in vitrage-graph.service.

Br,
Won

On Thu, Nov 15, 2018 at 10:13 PM, Ifat Afek <ifatafekn@gmail.com> wrote:
On Thu, Nov 15, 2018 at 10:28 AM Won <wjstk16@gmail.com> wrote:
Looking at the logs, I see two issues:
1. On the ubuntu server you get a notification about the VM deletion, while on compute1 you don't. Please make sure that Nova sends notifications to 'vitrage_notifications' - it should be configured in /etc/nova/nova.conf.

2. Once in 10 minutes (by default) the nova.instance datasource queries all instances. The deleted VM is supposed to be deleted in Vitrage at this stage, even if the notification was lost. Please check your collector log for a message of "novaclient.v2.client [-] RESP BODY" before and after the deletion, and send me its content.
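For reference, the periodic snapshot is essentially a "list all servers" call against the Nova API, and the RESP BODY log line is the raw JSON answer to that call. A rough stand-alone equivalent using python-novaclient (a sketch only; the credentials and URLs below are placeholders, not values from this deployment):

    from keystoneauth1 import loading, session
    from novaclient import client

    # Build a Keystone session (placeholder credentials).
    loader = loading.get_plugin_loader('password')
    auth = loader.load_from_options(
        auth_url='http://controller/identity/v3',   # placeholder
        username='admin', password='secret',        # placeholders
        project_name='admin',
        user_domain_name='Default',
        project_domain_name='Default')
    sess = session.Session(auth=auth)

    nova = client.Client('2', session=sess)
    # The JSON body of this response is what appears after
    # "RESP BODY" in the vitrage-collector log.
    for server in nova.servers.list(search_opts={'all_tenants': True}):
        print(server.id, server.name, server.status)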
I attached two log files. I created a VM on compute1, which is a compute node, and deleted it a few minutes later. The logs cover 30 minutes from the VM creation. The first is the vitrage-collector log, grepped for the instance name. The second is the 'novaclient.v2.client [-] RESP BODY' log. After I deleted the VM, no log line for the instance appeared in the collector log, no matter how long I waited.
I added the following to nova.conf on the compute1 node (attached file 'compute_node_local_conf.txt'):

notification_topics = notifications,vitrage_notifications
notification_driver = messagingv2
vif_plugging_timeout = 300
notify_on_state_change = vm_and_task_state
instance_usage_audit_period = hour
instance_usage_audit = True
Hi,
From the collector log RESP BODY messages I understand that in the beginning there were the following servers:

compute1: deltesting
ubuntu: Apigateway, KubeMaster and others
After ~20 minutes, there was only Apigateway. Does that make sense? Did you delete the instances on ubuntu, in addition to deltesting? In any case, I would expect to see the instances deleted from the graph at this stage, since they were not returned by get_all. Can you please send me the log of vitrage-graph at the same time (Nov 15, 16:35-17:10)?
There is still the question of why we don't see a notification from Nova, but let's try to solve the issues one by one.
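One way to narrow that question down is to listen on the 'vitrage_notifications' topic directly and print whatever arrives; if nothing shows up there when a VM is deleted on compute1, the notification is never emitted by that node. A minimal oslo.messaging sketch (the transport URL is a placeholder; the pool name just gives this listener its own queue so it does not steal messages from the real Vitrage consumer):

    from oslo_config import cfg
    import oslo_messaging as messaging

    class PrintEndpoint(object):
        # Called for notifications published at 'info' priority.
        def info(self, ctxt, publisher_id, event_type, payload, metadata):
            # A VM deletion should arrive as compute.instance.delete.end
            print(publisher_id, event_type)

    transport = messaging.get_notification_transport(
        cfg.CONF, url='rabbit://stackrabbit:secret@controller:5672/')  # placeholder
    targets = [messaging.Target(topic='vitrage_notifications')]
    listener = messaging.get_notification_listener(
        transport, targets, [PrintEndpoint()],
        executor='threading', pool='vitrage_debug_listener')
    listener.start()
    listener.wait()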
Thanks, Ifat
Hi,

A deleted instance should be removed from Vitrage in one of two ways:

1. By reacting to a notification from Nova
2. If no notification is received, then after a while the instance vertex in Vitrage is considered "outdated" and is deleted

Regarding #1, it is clear from your logs that you don't get notifications from Nova on the second compute. Do you have on one of your nodes, in addition to nova.conf, also a nova-cpu.conf? If so, please make the same change in that file:

notification_topics = notifications,vitrage_notifications
notification_driver = messagingv2

And please make sure to restart the nova compute service on that node.

Regarding #2, as a second-best solution, the instances should be deleted from the graph after not being updated for a while. I realized that we have a bug in this area and I will push a fix to gerrit later today. In the meantime, you can add the following function to the InstanceDriver class:

@staticmethod
def should_delete_outdated_entities():
    return True

Let me know if it solved your problem,
Ifat
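In context, the workaround above amounts to overriding a single hook in the Nova instance datasource driver. A minimal sketch of where it lands (the module path and base-class name are assumptions based on the Vitrage source layout, not something stated in this thread):

    # vitrage/datasources/nova/instance/driver.py -- path is an assumption
    from vitrage.datasources.nova.nova_driver_base import NovaDriverBase  # assumed base class

    class InstanceDriver(NovaDriverBase):

        @staticmethod
        def should_delete_outdated_entities():
            # Opt the nova.instance datasource in to the consistency check
            # that removes vertices which have not been updated for a while.
            return True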
Hi,

I checked that both of the methods you proposed work well. After I added the 'should_delete_outdated_entities' function to InstanceDriver, it took about 10 minutes to clear the old instance. And after I added the two lines you mentioned to nova-cpu.conf, the vitrage-collector receives the notifications correctly.

Thank you for your help.

Best regards,
Won