[openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Waines, Greg Greg.Waines at windriver.com
Wed Sep 6 11:16:07 UTC 2017


The log is attached. I took a quick look: it looks like the clear notification is received and an attempt is made to delete the entity, but the entity is not found ?
Greg.

console log:

stack@devstack-vitrage:~$
stack@devstack-vitrage:~$ date
Wed Sep  6 11:04:29 UTC 2017
stack@devstack-vitrage:~$ cd devstack/
stack@devstack-vitrage:~/devstack$ source openrc admin admin
WARNING: setting legacy OS_TENANT_NAME to support cli tools.
stack@devstack-vitrage:~/devstack$ vitrage alarm list

stack@devstack-vitrage:~/devstack$ sudo taskset 0x01 yes > /dev/null &
[1] 4100
stack@devstack-vitrage:~/devstack$ jobs
[1]+  Running                 sudo taskset 0x01 yes > /dev/null &
stack@devstack-vitrage:~/devstack$
stack@devstack-vitrage:~/devstack$ vitrage alarm list

stack@devstack-vitrage:~/devstack$ vitrage alarm list

stack@devstack-vitrage:~/devstack$ vitrage alarm list
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
| vitrage_id | vitrage_type | name     | vitrage_resource_type | vitrage_resource_id | vitrage_aggregated_severity | vitrage_operational_severity | update_timestamp |
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
| 8b0bb004-3 | collectd     | Host dev | nova.host             | a7bca8d3-c84f-46cd- | FAILURE                     | CRITICAL                     | 2017-09-06T11:05 |
| ae0-4ed8-a |              | stack-   |                       | a8db-cccab2851660   |                             |                              | :53Z             |
| 444-131baf |              | vitrage, |                       |                     |                             |                              |                  |
| d26031     |              | plugin   |                       |                     |                             |                              |                  |
|            |              | cpu (ins |                       |                     |                             |                              |                  |
|            |              | tance 0) |                       |                     |                             |                              |                  |
|            |              | type     |                       |                     |                             |                              |                  |
|            |              | percent  |                       |                     |                             |                              |                  |
|            |              | (instanc |                       |                     |                             |                              |                  |
|            |              | e idle): |                       |                     |                             |                              |                  |
|            |              | Data     |                       |                     |                             |                              |                  |
|            |              | source   |                       |                     |                             |                              |                  |
|            |              | "value"  |                       |                     |                             |                              |                  |
|            |              | is curre |                       |                     |                             |                              |                  |
|            |              | ntly 0.0 |                       |                     |                             |                              |                  |
|            |              | 00000.   |                       |                     |                             |                              |                  |
|            |              | That is  |                       |                     |                             |                              |                  |
|            |              | below    |                       |                     |                             |                              |                  |
|            |              | the      |                       |                     |                             |                              |                  |
|            |              | failure  |                       |                     |                             |                              |                  |
|            |              | threshol |                       |                     |                             |                              |                  |
|            |              | d of 20. |                       |                     |                             |                              |                  |
|            |              | 000000.  |                       |                     |                             |                              |                  |
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
stack@devstack-vitrage:~/devstack$ date
Wed Sep  6 11:06:30 UTC 2017
stack@devstack-vitrage:~/devstack$ jobs
[1]+  Running                 sudo taskset 0x01 yes > /dev/null &
stack@devstack-vitrage:~/devstack$ ps -ef | fgrep taskset
root      4100  3671  0 11:05 pts/0    00:00:00 sudo taskset 0x01 yes
stack     4605  3671  0 11:07 pts/0    00:00:00 grep -F --color=auto taskset
stack@devstack-vitrage:~/devstack$ sudo kill 4100
[1]+  Done                    sudo taskset 0x01 yes > /dev/null
stack@devstack-vitrage:~/devstack$
stack@devstack-vitrage:~/devstack$ date
Wed Sep  6 11:08:07 UTC 2017
stack@devstack-vitrage:~/devstack$ tail /tmp/python-notifications.dump

host: devstack-vitrage
plugin: cpu
plugin_instance: 0
type: percent
type_instance: idle
time: 1504696088.27
severity: 4
message: Host devstack-vitrage, plugin cpu (instance 0) type percent (instance idle): All data sources are within range again. Current value of "value" is 79.572764.

stack@devstack-vitrage:~/devstack$
stack@devstack-vitrage:~/devstack$ vitrage alarm list --max-width 80
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
| vitrage_id | vitrage_type | name     | vitrage_resource_type | vitrage_resource_id | vitrage_aggregated_severity | vitrage_operational_severity | update_timestamp |
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
| 8b0bb004-3 | collectd     | Host dev | nova.host             | a7bca8d3-c84f-46cd- | FAILURE                     | CRITICAL                     | 2017-09-06T11:05 |
| ae0-4ed8-a |              | stack-   |                       | a8db-cccab2851660   |                             |                              | :53Z             |
| 444-131baf |              | vitrage, |                       |                     |                             |                              |                  |
| d26031     |              | plugin   |                       |                     |                             |                              |                  |
|            |              | cpu (ins |                       |                     |                             |                              |                  |
|            |              | tance 0) |                       |                     |                             |                              |                  |
|            |              | type     |                       |                     |                             |                              |                  |
|            |              | percent  |                       |                     |                             |                              |                  |
|            |              | (instanc |                       |                     |                             |                              |                  |
|            |              | e idle): |                       |                     |                             |                              |                  |
|            |              | Data     |                       |                     |                             |                              |                  |
|            |              | source   |                       |                     |                             |                              |                  |
|            |              | "value"  |                       |                     |                             |                              |                  |
|            |              | is curre |                       |                     |                             |                              |                  |
|            |              | ntly 0.0 |                       |                     |                             |                              |                  |
|            |              | 00000.   |                       |                     |                             |                              |                  |
|            |              | That is  |                       |                     |                             |                              |                  |
|            |              | below    |                       |                     |                             |                              |                  |
|            |              | the      |                       |                     |                             |                              |                  |
|            |              | failure  |                       |                     |                             |                              |                  |
|            |              | threshol |                       |                     |                             |                              |                  |
|            |              | d of 20. |                       |                     |                             |                              |                  |
|            |              | 000000.  |                       |                     |                             |                              |                  |
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
stack@devstack-vitrage:~/devstack$
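
One way a "clear received, but entity not found" symptom can arise in general is when the lookup key for an alarm includes a field that differs between the raise and the clear. The sketch below is purely hypothetical: it is not taken from the Vitrage code and is not a diagnosis of this setup. The function names, the `store` dict and the abbreviated message strings are all assumptions made for illustration. It only shows that a key containing the free-text message cannot match, because the raise and clear messages differ, while a key built from the constant fields does.

```python
def key_with_message(n):
    """Hypothetical key that includes the free-text message."""
    return (n["host"], n["plugin"], n["plugin_instance"], n["message"])


def key_without_message(n):
    """Hypothetical key built only from fields that stay constant."""
    return (n["host"], n["plugin"], n["plugin_instance"],
            n["type"], n["type_instance"])


def apply(store, n, key_fn):
    """Store the alarm on a raise; delete it on a clear (severity 4).

    Returns True if the notification matched or created an entry.
    """
    key = key_fn(n)
    if n["severity"] == 4:
        return store.pop(key, None) is not None
    store[key] = n
    return True


base = {"host": "devstack-vitrage", "plugin": "cpu", "plugin_instance": "0",
        "type": "percent", "type_instance": "idle"}
# Messages are abbreviated placeholders, not the full collectd text.
raised = dict(base, severity=1, message="below the failure threshold")
cleared = dict(base, severity=4, message="within range again")

for key_fn in (key_with_message, key_without_message):
    store = {}
    apply(store, raised, key_fn)
    found = apply(store, cleared, key_fn)
    print(key_fn.__name__, "clear found alarm:", found,
          "remaining:", len(store))
```

With the message in the key the clear finds nothing and the alarm stays in the store; without it the alarm is removed. Whether anything like this applies here can only be confirmed from the attached vitrage-graph.log.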



Greg.



From: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>
Date: Wednesday, September 6, 2017 at 4:35 AM
To: Greg Waines <Greg.Waines at windriver.com>, "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Cc: "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Greg,

Can you please send me the vitrage-graph.log?

Thanks,
Ifat.

From: "Waines, Greg" <Greg.Waines at windriver.com>
Date: Wednesday, 6 September 2017 at 3:07
To: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>, "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Cc: "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

That worked :)
Thanks a lot Ifat.

I see the cpu alarm reported in vitrage now :).
( and in the /tmp/python-notifications.dump file that volodomyr helped me with )

The only remaining issue is that the vitrage alarm does not clear when I see the clear notification ( severity 4 ? ) in /tmp/python-notifications.dump.
I waited about 10 mins ... in case there was some hysteresis or something.

I am using a late version of ocata.

in /tmp/python-notifications.dump:
host: devstack-vitrage
plugin: cpu
plugin_instance: 0
type: percent
type_instance: idle
time: 1504655183.27
severity: 1
message: Host devstack-vitrage, plugin cpu (instance 0) type percent (instance idle): Data source "value" is currently 0.000000. That is below the failure threshold of 20.000000.



host: devstack-vitrage
plugin: cpu
plugin_instance: 0
type: percent
type_instance: idle
time: 1504655228.27
severity: 4
message: Host devstack-vitrage, plugin cpu (instance 0) type percent (instance idle): All data sources are within range again. Current value of "value" is 96.306246.
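
For reading these dump entries, here is a minimal Python sketch that parses one "key: value" block, maps the numeric severity to a name, and converts the epoch time to UTC. It assumes the usual collectd severity constants (1 = FAILURE, 2 = WARNING, 4 = OKAY), so "severity: 4" would indeed be the clear. The helper names are made up for this example and the message line is left out of the sample blocks for brevity.

```python
from datetime import datetime, timezone

# collectd notification severities (assumed from collectd's constants).
SEVERITY_NAMES = {1: "FAILURE", 2: "WARNING", 4: "OKAY"}


def parse_notification(block):
    """Parse one 'key: value' notification block from the dump file."""
    fields = {}
    for line in block.strip().splitlines():
        # Split on the first ': ' only; the message itself contains colons.
        key, _, value = line.partition(": ")
        fields[key.strip()] = value.strip()
    return fields


def summarize(fields):
    """Return (UTC timestamp, severity name, is_clear) for a parsed block."""
    severity = int(fields["severity"])
    when = datetime.fromtimestamp(float(fields["time"]), tz=timezone.utc)
    stamp = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return stamp, SEVERITY_NAMES.get(severity, "UNKNOWN"), severity == 4


raise_block = """
host: devstack-vitrage
plugin: cpu
plugin_instance: 0
type: percent
type_instance: idle
time: 1504655183.27
severity: 1
"""
clear_block = raise_block.replace("1504655183.27", "1504655228.27").replace(
    "severity: 1", "severity: 4")

print(summarize(parse_notification(raise_block)))
# ('2017-09-05T23:46:23Z', 'FAILURE', False)
print(summarize(parse_notification(clear_block)))
# ('2017-09-05T23:47:08Z', 'OKAY', True)
```

The first timestamp matches the update_timestamp of the alarm in the table (2017-09-05T23:46:23Z), so the alarm was last updated by the raise and not touched by the clear 45 seconds later.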

// 20 minutes later ... still see
stack@devstack-vitrage:~/devstack$ vitrage alarm list --max-width 80
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
| vitrage_id | vitrage_type | name     | vitrage_resource_type | vitrage_resource_id | vitrage_aggregated_severity | vitrage_operational_severity | update_timestamp |
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
| af0cd49f-8 | collectd     | Host dev | nova.host             | a7bca8d3-c84f-46cd- | FAILURE                     | CRITICAL                     | 2017-09-05T23:46 |
| d02-4173-8 |              | stack-   |                       | a8db-cccab2851660   |                             |                              | :23Z             |
| b56-37e19e |              | vitrage, |                       |                     |                             |                              |                  |
| cf7dfb     |              | plugin   |                       |                     |                             |                              |                  |
|            |              | cpu (ins |                       |                     |                             |                              |                  |
|            |              | tance 0) |                       |                     |                             |                              |                  |
|            |              | type     |                       |                     |                             |                              |                  |
|            |              | percent  |                       |                     |                             |                              |                  |
|            |              | (instanc |                       |                     |                             |                              |                  |
|            |              | e idle): |                       |                     |                             |                              |                  |
|            |              | Data     |                       |                     |                             |                              |                  |
|            |              | source   |                       |                     |                             |                              |                  |
|            |              | "value"  |                       |                     |                             |                              |                  |
|            |              | is curre |                       |                     |                             |                              |                  |
|            |              | ntly 0.0 |                       |                     |                             |                              |                  |
|            |              | 00000.   |                       |                     |                             |                              |                  |
|            |              | That is  |                       |                     |                             |                              |                  |
|            |              | below    |                       |                     |                             |                              |                  |
|            |              | the      |                       |                     |                             |                              |                  |
|            |              | failure  |                       |                     |                             |                              |                  |
|            |              | threshol |                       |                     |                             |                              |                  |
|            |              | d of 20. |                       |                     |                             |                              |                  |
|            |              | 000000.  |                       |                     |                             |                              |                  |
+------------+--------------+----------+-----------------------+---------------------+-----------------------------+------------------------------+------------------+
stack@devstack-vitrage:~/devstack$


Greg.



From: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>
Date: Tuesday, September 5, 2017 at 9:43 AM
To: Greg Waines <Greg.Waines at windriver.com>, "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Cc: "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Greg,

There is an exception in vitrage-graph, since the configuration for devstack-vitrage/cpu/0 was not found.

Please verify that your collectd_conf.yaml looks like this:

collectd:
- collectd_host: devstack-vitrage/cpu/0
  type: <openstack resource type, e.g. nova.host>
  name: <openstack resource name, e.g. host1>
- collectd_host: …

If this doesn’t help, please send me the file and I’ll have a look.

Best Regards,
Ifat.


From: "Waines, Greg" <Greg.Waines at windriver.com>
Date: Tuesday, 5 September 2017 at 15:38
To: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>, "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Cc: "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Ifat,

I was able to fix and verify that my collectd configuration is correct and working with the help of volodomyr ... i.e. I have a much simpler collectd.conf with a threshold set on cpu-0 idle percentage and a simple python plugin to dump notifications to a file.

I added in the vitrage collectd plugin to this simple setup ... but still don’t see vitrage alarms displayed on the vitrage dashboard ☹ .

I have attached the vitrage-graph.log
I have attached my now much simpler collectd.conf file
I have also attached the only templates I have defined under /etc/vitrage/templates  ... wondering if I need updated templates for working with collectd notifications ?

let me know if you have any ideas,
Greg.



From: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>
Date: Sunday, September 3, 2017 at 3:20 AM
To: Greg Waines <Greg.Waines at windriver.com>, "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Cc: "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Greg,

You should access the vitrage-graph.log using journalctl:
sudo journalctl --no-pager --unit devstack@vitrage-graph.service

Best Regards,
Ifat.


From: "Waines, Greg" <Greg.Waines at windriver.com>
Date: Thursday, 31 August 2017 at 20:10
To: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>, "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Cc: "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Ifat,
I actually have ‘debug = true’ in /etc/vitrage/vitrage.conf .
However, I don’t see vitrage-graph.log anywhere ?
Where is it supposed to be ?   in /var/log/ ?
Greg.


root@devstack-vitrage:/# more /etc/vitrage/vitrage.conf

[DEFAULT]
debug = True
transport_url = rabbit://stackrabbit:admin@10.10.10.13:5672/

[oslo_policy]
policy_file = /etc/vitrage/policy.json

[service_credentials]
auth_url = http://10.10.10.13/identity
region_name = RegionOne
project_name = admin
password = admin
project_domain_id = default
user_domain_id = default
username = vitrage
auth_type = password

[datasources]
types = nova.host,nova.instance,nova.zone,static,static_physical,aodh,cinder.volume,neutron.network,neutron.port,doctor,collectd

[keystone_authtoken]
memcached_servers = 10.10.10.13:11211
signing_dir = /var/cache/vitrage
cafile = /opt/stack/data/ca-bundle.pem
project_domain_name = Default
project_name = admin
user_domain_name = Default
password = admin
username = vitrage
auth_url = http://10.10.10.13/identity
auth_type = password

[api]
pecan_debug = False

[collectd]
config_file = /etc/vitrage/collectd_conf.yaml

root@devstack-vitrage:/#


From: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>
Date: Thursday, August 31, 2017 at 3:52 AM
To: Greg Waines <Greg.Waines at windriver.com>, "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Cc: "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Greg,

Vitrage listens to Collectd notifications, not statistics.
Can you please turn on the debug option in /etc/vitrage/vitrage.conf (set “debug = true”), and send me the vitrage-graph.log?

Thanks,
Ifat.

From: "Waines, Greg" <Greg.Waines at windriver.com>
Date: Wednesday, 30 August 2017 at 22:17
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Cc: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>, "TAHHAN, MARYAM" <maryam.tahhan at intel.com>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Ifat,
thanks for the reply ... just got around to trying your suggestions.

This definitely helps ... I no longer get any errors on re-starting collectd or vitrage-graph.
i.e. it appears to load the collectd and updated vitrage conf files correctly now.

However, I still don’t get any alarms in vitrage.
HOWEVER I suspect it may be my collectd setup now.
( WARNING I am NOT a collectd expert. ;) )

I suspect that the vitrage-collectd plugin only sends collectd NOTIFICATIONS or THRESHOLD Events to vitrage.
i.e. it likely does NOT send just statistic/status samples to vitrage.

I can see that collectd sampling is happening ... I have logfile and csv and rrd plugins running and samples are being captured in the specified directories / files.

I tried to set threshold for CPU based on an example I had found on web.
See attached collectd.conf file .

BUT really not sure if the threshold configuration in my collectd.conf is correct or working ... is there a way to confirm this ?   ( any collectd experts out there ? )
OR
Is there an example collectd.conf that has notifications or thresholds (whatever vitrage needs) setup for something basic like CPU ?

Greg.




From: "Afek, Ifat (Nokia - IL/Kfar Sava)" <ifat.afek at nokia.com>
Reply-To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Date: Monday, August 28, 2017 at 9:42 AM
To: "openstack-dev at lists.openstack.org" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [vitrage] Collectd - to - Vitrage setup issues

Hi Greg,

I’m less familiar with the collectd configuration and the events that it sends.

Regarding the collectd_conf.yaml, it is definitely missing. You should add a /etc/vitrage/collectd_conf.yaml file that looks like this:

collectd:
- collectd_host: <collectd resource name>
  type: <openstack resource type, e.g. nova.host>
  name: <openstack resource name, e.g. host1>
- collectd_host: …


This file maps a Collectd resource to the corresponding resource in OpenStack. Only resources that are listed in this file will have their alarms imported to Vitrage.
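
To illustrate the mapping idea, here is a minimal Python sketch (not the actual Vitrage driver code). It starts from the already-parsed form of such a file, written as a dict so no YAML library is needed. The entry values, including the OpenStack resource name "devstack-vitrage", and the function names build_mapping and resolve are assumptions for this example.

```python
# Parsed form of a collectd_conf.yaml. The single entry is an illustrative
# assumption; real code would obtain this structure by parsing the YAML file.
collectd_conf = {
    "collectd": [
        {"collectd_host": "devstack-vitrage/cpu/0",
         "type": "nova.host",
         "name": "devstack-vitrage"},
    ]
}


def build_mapping(conf):
    """Index the mapping entries by their collectd resource name."""
    return {entry["collectd_host"]: (entry["type"], entry["name"])
            for entry in conf["collectd"]}


def resolve(mapping, collectd_resource):
    """Return (openstack type, openstack name), or None when unmapped."""
    return mapping.get(collectd_resource)


mapping = build_mapping(collectd_conf)
print(resolve(mapping, "devstack-vitrage/cpu/0"))    # listed in the file
print(resolve(mapping, "devstack-vitrage/memory"))   # not listed -> None
```

A resource that is not listed resolves to None in this sketch, which corresponds to the rule above that only listed resources have their alarms imported.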

Next, you should add a reference to this file in /etc/vitrage/vitrage.conf:

[collectd]
config_file = /etc/vitrage/collectd_conf.yaml

Then you should restart vitrage-graph.

Let me know if it helped,
Ifat.


From: "Waines, Greg" <Greg.Waines at windriver.com>
Date: Wednesday, 23 August 2017 at 21:19

I am trying to get collectd to report some alarms to vitrage in a devstack setup.

I am using a devstack created on a late version of ocata.
And my devstack with vitrage appears to be working ok otherwise;
e.g. I can create VMs, and raise fake alarms using “vitrage event post --type=compute.host.down ...” or with “aodh alarm create ... resource_id=instance-uuid” ... and they get reported fine in vitrage.

UNFORTUNATELY not seeing anything in vitrage from collectd, and don’t believe I’m seeing anything even from collectd, for example from the syslog output plugin.

I’ve attached the following files:   ( not sure if these get distributed on mailing list )
- /etc/collectd/collectd.conf       <-- do these look ok ?
- /etc/vitrage/vitrage.conf         <-- do these look ok ?
- /var/log/syslog                   ... around the time when I updated collectd.conf and vitrage.conf and restarted collectd and vitrage-graph
  QUESTIONS
  * NOTE THE FOLLOWING ERRORS IN THE SYSLOG FILE ... where do I get the collectd_conf.yaml file from ?  Can’t see it in the devstack files for vitrage.
      Aug 23 14:09:31 localhost vitrage-graph[25962]: 2017-08-23 14:09:31.039 25962 ERROR vitrage.utils.file [-] File doesn't exist: /etc/vitrage/collectd_conf.yaml.
      Aug 23 14:09:31 localhost vitrage-graph[25962]: 2017-08-23 14:09:31.040 25962 ERROR vitrage.datasources.collectd.driver [-] failed in init 'NoneType' object has no attribute '__getitem__' : TypeError: 'NoneType' object has no attribute '__getitem__'
      Aug 23 14:09:31 localhost vitrage-graph[25962]: 2017-08-23 14:09:31.040 25962 ERROR vitrage.datasources.collectd.driver Traceback (most recent call last):
      Aug 23 14:09:31 localhost vitrage-graph[25962]: 2017-08-23 14:09:31.040 25962 ERROR vitrage.datasources.collectd.driver   File "/opt/stack/vitrage/vitrage/datasources/collectd/driver.py", line 65, in _configuration_mapping
      Aug 23 14:09:31 localhost vitrage-graph[25962]: 2017-08-23 14:09:31.040 25962 ERROR vitrage.datasources.collectd.driver     collectd_config_elements = collectd_config[COLLECTD_DATASOURCE]
      Aug 23 14:09:31 localhost vitrage-graph[25962]: 2017-08-23 14:09:31.040 25962 ERROR vitrage.datasources.collectd.driver TypeError: 'NoneType' object has no attribute '__getitem__'
      Aug 23 14:09:31 localhost vitrage-graph[25962]: 2017-08-23 14:09:31.040 25962 ERROR vitrage.datasources.collectd.driver
  * IT DOESN’T SEEM LIKE collectd is actually getting any events anyway ... shouldn’t I see some collectd events being reported in /var/log/syslog from some of the monitoring plugins that are loaded ?
      gregs-air:collectd-info gregwaines$ fgrep "localhost collectd" syslog
      Aug 23 13:56:07 localhost collectd[23267]: supervised by systemd, will signal readyness
      Aug 23 13:56:07 localhost collectd[23267]: Initialization complete, entering read-loop.
      Aug 23 13:56:07 localhost collectd[23267]: rrdtool plugin: Adjusting "RandomTimeout" to 0.000 seconds.
      Aug 23 14:09:05 localhost collectd[23267]: Exiting normally.
      Aug 23 14:09:05 localhost collectd[23267]: collectd: Stopping 5 read threads.
      Aug 23 14:09:05 localhost collectd[23267]: rrdtool plugin: Shutting down the queue thread.
      Aug 23 14:09:05 localhost collectd[23267]: collectd: Stopping 5 write threads.
      Aug 23 14:09:07 localhost collectd[25824]: supervised by systemd, will signal readyness
      Aug 23 14:09:07 localhost collectd[25824]: Initialization complete, entering read-loop.
      Aug 23 14:09:07 localhost collectd[25824]: rrdtool plugin: Adjusting "RandomTimeout" to 0.000 seconds.
- /etc/vitrage/templates/host_down_scenarios.yaml
- /etc/vitrage/templates/host_high_cpu_load_scenarios.yaml
  QUESTION: Am I supposed to have some templates that are specific to the collectd events/alarms that are being reported to vitrage ?
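
For reference, the TypeError in the syslog excerpt is what Python reports when code indexes into None, which is consistent with the config loader returning None for the missing file. Here is a minimal Python 3 sketch of that failure mode and of a guard against it. It is not the Vitrage code: load_yaml_file and configuration_mapping are hypothetical names, the path is a placeholder assumed not to exist, and Python 3 words the error differently from the Python 2 message in the log.

```python
import os


def load_yaml_file(path):
    """Hypothetical loader: returns None when the file is missing."""
    if not os.path.isfile(path):
        print("File doesn't exist: %s" % path)
        return None
    raise NotImplementedError("YAML parsing is omitted in this sketch")


def configuration_mapping(path):
    """Return the list under the 'collectd' key, or [] when unavailable."""
    config = load_yaml_file(path)
    if config is None:
        # Without this guard, config["collectd"] raises TypeError.
        return []
    return config["collectd"]


# What an unguarded lookup effectively does when the file is missing.
missing = None
try:
    missing["collectd"]
except TypeError as exc:
    print("unguarded lookup failed:", exc)

# Placeholder path, assumed not to exist on the machine running the sketch.
print("guarded:", configuration_mapping("/nonexistent/collectd_conf.yaml"))
```

The guarded version returns an empty mapping instead of failing during init, so the missing file shows up only as the "File doesn't exist" message.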

Any other suggestions on things to look at in order to understand what’s wrong ?

Greg.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: vitrage-graph.log
Type: application/octet-stream
Size: 34869 bytes
Desc: vitrage-graph.log
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20170906/a56b9b50/attachment.obj>

