[openstack-dev] [Metrics] Improving the data about contributor/affiliation/time

Jesus M. Gonzalez-Barahona jgb at bitergia.com
Fri Oct 18 16:13:04 UTC 2013


On Fri, 2013-10-18 at 08:33 -0400, Sean Dague wrote:
> On 10/17/2013 05:34 PM, Stefano Maffulli wrote:
> > [...]
> > Four sources of data for this reporting is bad and not sustainable.
> >
> > Since it seems commonly accepted that all developers need to be members
> > of the Foundation, and that Foundation members need to state their
> > affiliation when they join and keep such data current when it changes, I
> > think the Foundation is in a good place to provide the authoritative
> > data for all projects to use.
> 
> I'm not sure it is well understood that all members have to join the 
> foundation. We don't make that a requirement on someone slinging a 
> patch. It would be nice to know what percentage of ATCs actually are 
> foundation members at the moment (presumably that number is easy to 
> generate?)

My impression is that we need a data source that covers all contributors
as much as possible. As you said, even for developers it is not always
the case. If you are also interested in tracking bug reporters or
message posters, for example, that is even less the case. Linking
affiliation information to Foundation membership could be risky from
this point of view.

A different issue would be for the Foundation to maintain a system for
claiming or fixing affiliation information, so that all of us producing
metrics can use it. It could be based on the current datasets (the best
of them, or maybe a combination of some of them), and could provide some
interface for proposing changes easily and publicly. It should also
provide some interface so that any metrics-collecting system can use it.

To be useful, it should also include data for identifying developers
(usually, the email addresses they use in the different OpenStack
repositories), since developers not only change organization, they also
tend to change the identities they use from time to time.

> The thing is, the Foundation data currently seems to be the least 
> accurate of all the data sets. Also, the lack of affiliation over time 
> is really a problem for this project, especially if one of the driving 
> factors for so much interest in statistics comes from organizations 
> wanting to ensure contributions by their employees get counted. A 
> significant percentage of top contributors to OpenStack have not 
> remained at a single employer over their duration to contributing to 
> OpenStack, and I expect that to be the norm as the project ages.
> 
> Also, both gitdm and stackalytics have active open developer communities 
> (and they are open source all the way down, don't need non open 
> components to run), so again, I'm not sure why defaulting to the least 
> open platform makes any sense.

Just for the record, the MetricsGrimoire / vizGrimoire stack that is
producing the dashboards at http://activity.openstack.org/dash/ is also
completely open source, with an open developer community, see
http://metricsgrimoire.github.io and http://vizgrimoire.github.io

All the data is also available, in the form of JSON files and SQL
databases, see
http://activity.openstack.org/dash/newbrowser/browser/data/db/
(which includes affiliation data)

That said, I'm not claiming that our affiliation datasets are the best
ones. We'd be more than happy to collaborate with the rest to produce a
common dataset, or to switch to another one if it proves better
maintained. In fact, we have already incorporated affiliation data from
gitdm and (partially) from stackalytics.

> Member affiliation in the Foundation database can also only be fixed by 
> the individual. In the other tools people in the know can fix it. It 
> means we get a wikipedia effect in getting the data more accurate, as 
> you can fix any issue you see, not just your own.

This is something very important, from my point of view. The ability to
change any data you find inaccurate, along with a review system to
ensure that we don't accept malicious change requests, would be
desirable features for any system we use.

> If the foundation member database was its own thing, had a REST API to 
> bulk fetch, and supported temporal associations, and let others propose 
> updates to people's affiliation, then it would be an option. But right 
> now it seems very far from being useful, and is probably the least, not 
> most, accurate version of the world.
[...]

From my point of view, having a REST API would be helpful, but not a
must. Our usual way of incorporating bulk data is to retrieve the
external bulk data, compare it with the data we currently have, and
decide (in part by hand) on the differences one by one, trying to keep
the most reliable option. If the external data were always more
reliable, it would be a matter of just comparing and using the external
data whenever a match is found, which could be done automatically. And
no REST API is really needed for this.
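
The comparison procedure described above could be sketched roughly as
follows. This is only an illustration, not the actual MetricsGrimoire
code: the function names, the plain email-to-organization dictionaries,
and the choice to accept previously unknown identities automatically
are all assumptions made for the example.

```python
def diff_affiliations(current, external):
    """Classify external records against the data we already hold.

    Both arguments are hypothetical {email: organization} mappings.
    Returns (new, conflicts): identities we did not know about, and
    identities where the two sources disagree.
    """
    new, conflicts = {}, {}
    for email, org in external.items():
        if email not in current:
            new[email] = org
        elif current[email] != org:
            conflicts[email] = (current[email], org)
    return new, conflicts


def merge(current, external, trust_external=False):
    """Merge external data into a copy of the current data.

    With trust_external=True every conflict is resolved in favour of
    the external source (the fully automatic case). Otherwise conflicts
    are left untouched and returned for a person to decide one by one.
    """
    new, conflicts = diff_affiliations(current, external)
    merged = dict(current)
    merged.update(new)  # assumption: unknown identities are accepted as-is
    pending = {}
    for email, (ours, theirs) in conflicts.items():
        if trust_external:
            merged[email] = theirs
        else:
            pending[email] = (ours, theirs)
    return merged, pending
```

Calling merge(current, external) returns the merged mapping plus the
conflicts still waiting for a manual decision; passing
trust_external=True corresponds to the case where the external data is
always considered more reliable.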

Support for temporal associations, proposal of updates by anyone, a
review system, and support for multiple identities would be very
convenient.
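
To make two of these features concrete (temporal associations and
multiple identities), a record for one contributor might look roughly
like the sketch below. The class and field names are hypothetical and
do not correspond to any existing schema in the Foundation database,
gitdm, stackalytics, or MetricsGrimoire.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Affiliation:
    organization: str
    start: date
    end: Optional[date] = None  # None means the affiliation is still current

    def covers(self, day: date) -> bool:
        # Half-open interval: start is included, end is excluded.
        return self.start <= day and (self.end is None or day < self.end)


@dataclass
class Contributor:
    name: str
    emails: List[str] = field(default_factory=list)  # all known identities
    affiliations: List[Affiliation] = field(default_factory=list)

    def organization_on(self, day: date) -> Optional[str]:
        """Return the organization the contributor belonged to on a given day."""
        for affiliation in self.affiliations:
            if affiliation.covers(day):
                return affiliation.organization
        return None


def find_by_email(contributors, email):
    """Look a contributor up by any of their addresses, ignoring case."""
    wanted = email.lower()
    for person in contributors:
        if wanted in (address.lower() for address in person.emails):
            return person
    return None
```

With a structure like this, a commit can be attributed by looking up the
author's address with find_by_email() and then asking
organization_on() for the commit date, so that a change of employer or
of email address does not misattribute older contributions.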

Saludos,

	Jesus.

-- 
Bitergia: http://bitergia.com http://blog.bitergia.com



