[openstack-dev] [Metrics] Improving the data about contributor/affiliation/time

Stefano Maffulli stefano at openstack.org
Fri Oct 18 17:13:36 UTC 2013


On 10/18/2013 05:33 AM, Sean Dague wrote:
> I'm not sure it is well understood that all members have to join the
> foundation. We don't make that a requirement on someone slinging a
> patch. 

I believe we do make it a requirement: you can't sign the CLA if you are
not also a member of the Foundation, and you can't submit a patch for
review if you haven't signed the CLA. All of this is enforced by Gerrit.

> The thing is, the Foundation data currently seems to be the least
> accurate of all the data sets. 

As of today it's the most complete one though, and given the assumption
that 1 ATC == 1 Member of the Foundation it's also the easiest one to
fix, compared to others.

> Also, both gitdm and stackalytics have active open developer communities
> (and they are open source all the way down, don't need non open
> components to run), so again, I'm not sure why defaulting to the least
> open platform makes any sense.

I'm not talking about the visualization here. Let's focus only on the
source of data for person/affiliation/time.

Thierry: "affiliation" in the Members db is indeed meant in the
sense mandated by the bylaws.

As Jesus was saying, we also want to track activity beyond the git
repos and Launchpad. I would like to have visibility over things done on
Ask OpenStack, translations, the upcoming groups.openstack.org and other
things we'll have in the future. That's why we're developing our own
OpenID provider.

> If the foundation member database was its own thing, had a REST API to
> bulk fetch, and supported temporal associations, and let others propose
> updates to people's affiliation, then it would be an option.

It seems we're on the same page, and so is Jesus. Here are my
thoughts at the moment:

  - the OpenID provider the Foundation is building will provide the
bulk of the basic data through some interface (REST or whatever,
including a regular CSV dump): username, all known email addresses,
current affiliation, past affiliations
  - we build a system to sanitize the bulk dump, doing things like
cleaning up company names, and provide ways to enrich the data for
others
  - the result of this process will be used by all the reporting systems
we have, from Activity Board to gitdm to stackalytics.
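
To make the sanitizing step a bit more concrete, here is a rough sketch
in Python of what I have in mind. None of this exists yet: the CSV
columns (username, emails, company, start, end), the alias table and
the function names are all made up for illustration, and the real dump
may well look different. It only shows the idea: clean company names,
merge the rows per person, and answer "who was this person affiliated
with on a given day".

```python
import csv
import io
import re
from datetime import date

# Hypothetical alias table: lower-cased spellings -> one canonical name.
COMPANY_ALIASES = {
    "red hat": "Red Hat",
    "redhat": "Red Hat",
    "hp": "HP",
    "hewlett packard": "HP",
}

# Trailing legal suffixes such as ", Inc." or " Ltd" are dropped.
_SUFFIX = re.compile(r"[,.]?\s+(inc|ltd|llc|corp|gmbh)\.?$", re.IGNORECASE)


def clean_company(raw):
    """Collapse whitespace, drop legal suffixes, apply known aliases."""
    name = " ".join(raw.split())
    name = _SUFFIX.sub("", name)
    key = name.lower().replace("-", " ")
    return COMPANY_ALIASES.get(key, name)


def parse_dump(text):
    """Assumed format: one row per affiliation period, emails separated
    by ';', ISO dates, empty 'end' meaning the affiliation is current."""
    people = {}
    for row in csv.DictReader(io.StringIO(text)):
        person = people.setdefault(
            row["username"], {"emails": set(), "affiliations": []}
        )
        person["emails"].update(
            e.strip().lower() for e in row["emails"].split(";") if e.strip()
        )
        start = date.fromisoformat(row["start"])
        end = date.fromisoformat(row["end"]) if row["end"] else None
        person["affiliations"].append(
            (start, end, clean_company(row["company"]))
        )
    return people


def affiliation_on(person, day):
    """Return the company whose period [start, end) covers 'day'."""
    for start, end, company in person["affiliations"]:
        if start <= day and (end is None or day < end):
            return company
    return None


# Made-up sample data, only to exercise the functions above.
SAMPLE = """\
username,emails,company,start,end
alice,alice@example.com;alice@old.example.org,"Red Hat, Inc.",2011-01-01,2013-01-01
alice,alice@example.com,Hewlett-Packard,2013-01-01,
"""

people = parse_dump(SAMPLE)
print(affiliation_on(people["alice"], date(2012, 6, 1)))
print(affiliation_on(people["alice"], date(2013, 10, 18)))
```

Keeping start/end dates per affiliation is what lets every reporting
tool attribute a commit, translation or answer to the employer the
person had at that time, instead of only the current one.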

How does that sound?

/stef
-- 
Ask and answer questions on https://ask.openstack.org


