[openstack-dev] [neutron][grafana][infra] how to read grafana

Ihar Hrachyshka ihrachys at redhat.com
Mon Aug 8 14:45:24 UTC 2016


Matthew Treinish <mtreinish at kortar.org> wrote:

> On Mon, Aug 08, 2016 at 02:40:31PM +0200, Ihar Hrachyshka wrote:
>> Hi,
>>
>> I was looking at grafana today, and spotted another weirdness.
>>
>> See the periodic jobs dashboard:
>>
>> http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=4&fullscreen
>>
>> Currently it shows me a 100% failure rate for the py34/oslo-master job,
>> starting from ~Aug 3. But when I go to openstack-health, I don’t see those
>> runs at all:
>>
>> http://status.openstack.org/openstack-health/#/job/periodic-neutron-py34-with-neutron-lib-master
>>
>> (^ The last run is July 31.)
>>
>> But then when I drill down into files, I can see more recent runs, like:
>>
>> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/?C=M;O=A
>> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/testr_results.html.gz
>>
>> The last link points to a run from yesterday. And as you can see it is
>> passing.
>
> That run isn't actually from yesterday; it's from July 30th. The directory
> shows a recent date, but the last modified dates for the individual files
> are older:
>
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/faa24e0/
>
> The openstack-health data goes up until the job started failing. This is
> likely because the failures occur early enough in the test run that no
> subunit output is generated for the run.
>
>> So, what’s wrong with the grafana dashboard? And why doesn’t
>> openstack-health show the latest runs?
>
> On the openstack-health side it looks like you're running into an issue
> with using subunit2sql as the primary data source there. If you look at an
> example output from what's not in openstack-health, like:
>
> http://logs.openstack.org/periodic/periodic-neutron-py34-with-neutron-lib-master/37cd5eb/console.html.gz

Nice! I guess you just picked one of those that is not present on the Health
dashboard? Or did you do something more elaborate to come up with the link?

>
> You'll see that the failure is occurring before any subunit output is
> generated (during the discovery phase of testr). If there is no subunit
> file in the log output for the run, then there is nothing to populate the
> subunit2sql DB with.
> The grafana/graphite data doesn't share this limitation because it gets
> populated directly by zuul.
>
> This is a known limitation with openstack-health right now, and the plan to
> solve it is to add a zuul sql data store that we can query like subunit2sql
> for job level information, and then use subunit2sql for more fine-grained
> details. The work on that currently depends on:
> https://review.openstack.org/#/c/223333/ which adds the datastore to zuul.
> Once that lands we can work on the openstack-health side to consume that
> data in conjunction with subunit2sql.
>
> -Matt Treinish
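
To restate the data flow described above in code form, here is a minimal
sketch. It is only an illustration of the reasoning, not code from
openstack-health, subunit2sql, or zuul. The function names are hypothetical,
and the assumption that a subunit stream can be recognized by "subunit"
appearing in a log file name (for example testrepository.subunit.gz) is mine.

```python
def has_subunit_output(filenames):
    """Return True if any log file name looks like a subunit stream.

    Assumption: subunit files can be spotted by the substring "subunit"
    in their name. This is a heuristic for illustration only.
    """
    return any("subunit" in name.lower() for name in filenames)


def dashboards_with_data(filenames):
    """List which dashboards would have data for one job run.

    Per the explanation above: grafana/graphite is populated directly by
    zuul, so it sees every run; openstack-health depends on subunit2sql,
    which needs a subunit file from the run.
    """
    dashboards = ["grafana"]
    if has_subunit_output(filenames):
        dashboards.append("openstack-health")
    return dashboards


# A run that failed during test discovery, before any subunit was written
# (hypothetical file listing).
early_failure = ["console.html.gz"]
# A run that got far enough to produce subunit output
# (hypothetical file listing).
normal_run = ["console.html.gz", "testrepository.subunit.gz"]

print(dashboards_with_data(early_failure))
print(dashboards_with_data(normal_run))
```

With these hypothetical listings, the early failure is visible only in
grafana, which matches the mismatch between the two dashboards discussed in
this thread.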

Just want to say a huge thank you for the reply. It both pointed me to the
immediate problem to solve and gave me a wider perspective on the mechanics
that I should be aware of. It’s great to work in a community of individuals
who so often go the extra mile for their fellows.

Ihar