[openstack-dev] [neutron][grafana][infra] how to read grafana

Matthew Treinish mtreinish at kortar.org
Mon Aug 8 14:24:29 UTC 2016


On Mon, Aug 08, 2016 at 02:40:31PM +0200, Ihar Hrachyshka wrote:
> Hi,
> 
> I was looking at grafana today, and spotted another weirdness.
> 
> See the periodic jobs dashboard:
> 
> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=4&fullscreen
> 
> Currently it shows for me 100% failure rate for py34/oslo-master job,
> starting from ~Aug 3. But when I go to openstack-health, I don’t see those
> runs at all:
> 
> http://status.openstack.org/openstack-health/#/job/periodic-neutron-py34-with-neutron-lib-master
> 
> (^ The last run is July 31.)
> 
> But then when I drill down into files, I can see more recent runs, like:
> 
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/?C=M;O=A
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/testr_results.html.gz
> 
> The last link points to a run from yesterday. And as you can see it is
> passing.

That run isn't actually from yesterday; it's from July 30th. The directory listing
shows a recent date, but the last-modified dates of the individual files are older:

http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/

The openstack-health data stops at the point where the job started failing. This
is likely because the failures occur early enough in the test run that no
subunit output is generated.

> 
> So, what’s wrong with the grafana dashboard? And why doesn’t
> openstack-health show the latest runs?
> 

On the openstack-health side, it looks like you're running into a limitation of
using subunit2sql as the primary data source. If you look at the console output
of a run that's missing from openstack-health, like:

http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/37cd5eb/console.html.gz

you'll see that the failure occurs before any subunit output is generated
(during the discovery phase of testr). If there is no subunit file in the log
output for a run, there is nothing to populate the subunit2sql DB with. The
grafana/graphite data doesn't share this limitation because it is populated
directly by zuul.
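To make the divergence concrete, here is a minimal sketch that models the two data paths. The `Run` class, the run names, and the dates are all hypothetical, chosen only to mirror the situation described: a zuul-style view counts every run, while a subunit2sql-style view can only see runs that left a subunit file behind. Nothing here uses the real zuul, graphite, or subunit2sql APIs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    """One periodic job run (a hypothetical model, for illustration only)."""
    name: str
    date: str          # ISO 8601 date, so string comparison orders correctly
    passed: bool
    has_subunit: bool  # False when the job dies before emitting subunit output


def zuul_view(runs: List[Run]) -> List[Run]:
    """Job-level results reported by zuul: every run is counted."""
    return list(runs)


def subunit2sql_view(runs: List[Run]) -> List[Run]:
    """Only runs that left a subunit file behind can be loaded into the DB."""
    return [r for r in runs if r.has_subunit]


def since(runs: List[Run], date: str) -> List[Run]:
    """Runs on or after the given ISO date."""
    return [r for r in runs if r.date >= date]


def failure_rate(runs: List[Run]) -> Optional[float]:
    """Percentage of failed runs, or None when there is no data to plot."""
    if not runs:
        return None
    failed = sum(1 for r in runs if not r.passed)
    return 100.0 * failed / len(runs)


# Hypothetical history: one passing run, then runs that fail during test
# discovery, i.e. before any subunit output exists.
runs = [
    Run("run-1", "2016-07-30", passed=True, has_subunit=True),
    Run("run-2", "2016-08-04", passed=False, has_subunit=False),
    Run("run-3", "2016-08-05", passed=False, has_subunit=False),
]

recent = since(runs, "2016-08-03")
print(failure_rate(zuul_view(recent)))         # 100.0
print(failure_rate(subunit2sql_view(recent)))  # None
```

The zuul-fed view reports a 100% failure rate for the recent window, while the subunit2sql-fed view has no recent rows at all, so its most recent data point stays at the last run that produced subunit output.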

This is a known limitation of openstack-health right now. The plan to solve it
is to add a zuul SQL data store that we can query, like subunit2sql, for
job-level information, and then use subunit2sql for more fine-grained details.
That work currently depends on https://review.openstack.org/#/c/223333/, which
adds the data store to zuul. Once that lands, we can work on the openstack-health
side to consume that data in conjunction with subunit2sql.
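The combination described here amounts to a left join: job-level rows are the primary source, and test-level details are attached only where they exist. The sketch below is a minimal illustration under that assumption. The row layout, the `build` key, and the `merge` helper are hypothetical and do not reflect the actual zuul SQL reporter schema or the subunit2sql API.

```python
from typing import Dict, List

# Hypothetical job-level rows, roughly what a zuul SQL data store could hold.
job_runs: List[Dict[str, str]] = [
    {"build": "build-1", "date": "2016-07-30", "result": "SUCCESS"},
    {"build": "build-2", "date": "2016-08-04", "result": "FAILURE"},
]

# Hypothetical test-level summaries keyed by the same build id, standing in
# for data looked up in subunit2sql. build-2 is absent: it produced no subunit.
test_details: Dict[str, Dict[str, int]] = {
    "build-1": {"tests": 3, "failures": 0},
}


def merge(runs: List[Dict[str, str]],
          details: Dict[str, Dict[str, int]]) -> List[Dict[str, object]]:
    """Left join: keep every job-level run, attach test details when present."""
    merged: List[Dict[str, object]] = []
    for run in runs:
        row: Dict[str, object] = dict(run)
        row["details"] = details.get(run["build"])  # None if no subunit data
        merged.append(row)
    return merged


for row in merge(job_runs, test_details):
    print(row["build"], row["result"], row["details"])
```

With this shape, a run that failed during test discovery still shows up as a FAILURE at the job level; it simply has no test-level drill-down.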

-Matt Treinish