[OpenStack-Infra] [third-party] CI Monitoring Tool

Joshua Hesketh joshua.hesketh at RACKSPACE.COM
Thu Nov 13 13:17:59 UTC 2014


Hi,

Sorry for the slow reply, I'm currently on vacation.

I think we should include the infra mailing list on this discussion so I've cc'd them here. If it's off topic we can take this off list again, however I feel like we may be duplicating efforts at the moment.

Re people not using Zuul: the idea the infra team brainstormed during the summit was to have a generic REST endpoint that can take results (and then do stats, graphs, etc.). Zuul would post to this endpoint as a reporter, but nothing would stop others from implementing their own report posts.
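
As a rough illustration of the endpoint idea, a reporter could serialize each result as JSON and POST it. This is only a minimal sketch in Python: the URL, the payload field names, and the build_result_request helper are all assumptions, since no schema has been agreed in this thread.

```python
import json
import urllib.request


def build_result_request(endpoint_url, ci_name, change, patchset, job, result):
    """Build (but do not send) a POST request carrying one CI job result."""
    # Hypothetical payload; the thread does not define a schema.
    payload = {
        "ci_name": ci_name,
        "change": change,
        "patchset": patchset,
        "job": job,
        "result": result,
    }
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder URL and values, for illustration only.
request = build_result_request(
    "http://dashboard.example.org/results",
    "example-ci", 12345, 2, "example-job", "SUCCESS",
)
```

A reporter (Zuul's or anyone else's) would then send the request with urllib.request.urlopen(request); the dashboard side would store the payload and build its stats from it.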

Anyway, it looks like there is good discussion on the etherpad.

Cheers,
Josh



________________________________
From: Steve Weston [steve.weston at triniplex.com]
Sent: Sunday, November 09, 2014 7:00 AM
To: Duncan Thomas; Joshua Hesketh
Cc: lyz at princessleia.com; mriedem at linux.vnet.ibm.com; Kurt Taylor; Anita Kuno
Subject: Re: [third-party] CI Monitoring Tool

The etherpad has been created https://etherpad.openstack.org/p/Third-Party-CI-Dashboard-InitialPlanning

I have included my input on introducing a calibration service which the CI systems would use before running a patchset.  The idea is this:  each project would define one or more jobs which the CI system would run to make sure it is working correctly, and in synchronization with Jenkins, before reporting an errant result.

I believe that this would greatly improve the stability of CI and allow problems to be fixed before the CI system runs the patch.
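
One way to picture the calibration idea is as a gate in front of the real run. The sketch below is an assumption-laden illustration in Python, not an existing service: run_job, run_patch, and the job names are hypothetical, and the expected results stand in for whatever Jenkins is known to report for each calibration job.

```python
def is_calibrated(run_job, calibration_jobs):
    """Return True when every calibration job gives the expected result.

    run_job: callable taking a job name and returning a result string.
    calibration_jobs: mapping of job name -> result Jenkins is known to give.
    """
    return all(
        run_job(name) == expected
        for name, expected in calibration_jobs.items()
    )


def run_if_calibrated(run_job, calibration_jobs, run_patch):
    """Run the patchset only if the CI system passes calibration first."""
    if not is_calibrated(run_job, calibration_jobs):
        # The CI system disagrees with Jenkins on known jobs, so skip the
        # patch rather than report a possibly errant result.
        return None
    return run_patch()
```

With this shape, a CI system that fails calibration returns None and reports nothing, which gives its operator a chance to fix it before it votes on a patch.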

Thoughts, comments, and input are welcome!

Thanks,
Steve

On 11/7/14 7:58 PM, Steve Weston wrote:
I have already begun work on the code for this project, and yesterday I wrote a small bit of code that implements a REST API with the Django REST framework. Although my plan was to expose the data collected by the dashboard to other services, this code can also be modified to act as a sort of check-in service, as Josh describes below.

Tomorrow I will create an etherpad so that folks may start listing out their ideas for how this dashboard will work.  I will send out a link once I have it.

Thanks,
Steve

On 11/7/14 7:53 PM, Steve Weston wrote:
+ Anita

On 11/7/14 5:34 PM, Duncan Thomas wrote:

So it is worth noting that not every third-party CI is using Zuul. I think scraping Gerrit (even into a DB to run queries against) is a better way forward than adding something else to the CI requirements.
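
A minimal sketch of the scraping approach, in Python, might read Gerrit events as JSON lines and keep each CI account's Verified vote in SQLite. The table layout and function names here are hypothetical, and the event field names ("type", "author", "approvals", "change", "patchSet") are assumed to match what Gerrit's stream-events emits for comment-added events; they should be checked against the Gerrit version in use.

```python
import json
import sqlite3


def open_db(path=":memory:"):
    """Open (or create) a small vote database; the schema is hypothetical."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS votes "
        "(ci TEXT, change_number INTEGER, patchset INTEGER, verified INTEGER)"
    )
    return db


def record_event(db, line):
    """Store the Verified vote from one JSON event line, if it has one."""
    event = json.loads(line)
    if event.get("type") != "comment-added":
        return False
    for approval in event.get("approvals", []):
        if approval.get("type") == "Verified":
            db.execute(
                "INSERT INTO votes VALUES (?, ?, ?, ?)",
                (
                    event["author"]["username"],
                    int(event["change"]["number"]),
                    int(event["patchSet"]["number"]),
                    int(approval["value"]),
                ),
            )
            return True
    return False
```

Once votes are in a table, questions such as "which CI accounts voted -1 most often this week" become ordinary SQL queries, with no extra requirement placed on the CI systems themselves.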

Duncan Thomas

On Nov 7, 2014 4:41 PM, "Joshua Hesketh" <joshua.hesketh at rackspace.com> wrote:
Hi Kurt,

Thanks for kicking this conversation off. I wonder if the -infra list would be a good place to include more.

So I believe, although we're still brainstorming etc, the vague infra plan is to have a dashboard service with API endpoints that a zuul reporter can talk to. Then all 1st + 3rd parties would report to that and therefore have a dashboard populated and statistics generated etc.

So that's kind of the long term plan that will give us some more useful data we can dive into. However, for the moment I think having a simple gerrit-bot-status dashboard (as you have described) will at least help in terms of assessing the health of the systems.

I don't think anybody in particular is working on radar so we could probably consume that repository. We should get Michael Still's okay first though (since he's the original author).

Cheers,
Josh
________________________________________
From: Kurt Taylor [kurt.r.taylor at gmail.com]
Sent: Saturday, November 08, 2014 1:06 AM
To: lyz at princessleia.com; Joshua Hesketh; duncan.thomas at gmail.com; mriedem at linux.vnet.ibm.com; steve.weston at triniplex.com
Subject: [third-party] CI Monitoring Tool

In the third-party summit session, we discussed the need for CI
systems to have a status dashboard [1]. However, it seems that
multiple people are writing a CI monitoring tool, so let's level set:

- Josh has written a gerrit event gatherer [2]
- Duncan has too
- Steve has too (I have not yet talked to Steve)
- Radar has a command-line scraper; we can remove it and just use Radar
gauges with one of the API backends above, which is fairly simple [3]
- Nova also discussed CI monitoring and status reporting [4]. Matt
owns? a requirement for Nova to implement CI monitoring (I have not
yet talked to Matt)

[1] https://etherpad.openstack.org/p/kilo-third-party-items
[2] https://github.com/stackforge/turbo-hipster/blob/master/tools/zuul_enqueue.py
[3] https://github.com/rcbau/radar/blob/master/report.py
[4] https://etherpad.openstack.org/p/nova-ci-status-checkpoint-kilo

From conversations with Josh and Duncan, we believe that a good
initial plan is to diff a patch's result with what Jenkins reported; if
it failed and differs, collect 5? (or 3?) failures, then re-queue a last
known successful patch run. If that fails, the CI system is not working
properly. I believe that covers 95%, maybe higher, of scenarios.
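
The plan above can be sketched in a few lines of Python. This is only an illustration under assumptions: the result strings, the rerun_known_good callable, and the function names are hypothetical, and the threshold of 5 is one of the two values floated above rather than a settled number.

```python
def disagreeing_failures(results):
    """Count runs where the CI system failed but Jenkins reported otherwise.

    results: iterable of (ci_result, jenkins_result) pairs, one per patch.
    """
    return sum(
        1 for ci, jenkins in results
        if ci == "FAILURE" and ci != jenkins
    )


def ci_looks_broken(results, rerun_known_good, threshold=5):
    """Apply the heuristic: enough disagreements, then a known-good re-run.

    rerun_known_good: callable that re-queues a last known successful
    patch on the CI system and returns its result string.
    """
    if disagreeing_failures(results) < threshold:
        return False  # not enough evidence to suspect the CI system
    # If even a patch that used to pass now fails, blame the CI system.
    return rerun_known_good() != "SUCCESS"
```

Note that the re-run only happens once the threshold is reached, so a healthy CI system that fails a genuinely bad patch does not pay for an extra job.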

I like Josh's idea to just have a browser page refresh kick off a
sample collection and report via radar gauges. Start simple, then we
could ask infra to have cron fire off gathering once every 20 minutes
or so, then maybe push this data to a database, and so on.

So, the question is, do we create a new GitHub repo for a new tool,
or reuse the Radar repo? Let's get skeleton code somewhere (no preference)
and then we can get more involvement and figure out where this should
live.  We should create a spec in openstack-infra. If we agree, I'll
be happy to shepherd that.

Comments?

Kurt Taylor (krtaylor)


