[stackalytics] Reported numbers seem inaccurate
Hi,
I've heard a couple of people say they don't feel the numbers reported by Stackalytics accurately reflect reality. I've been trying to gather a few stats for my Kolla update session in Denver, and am finding the same. I'll try to give some concrete examples.
Take the reviews for all Kolla deliverables in Stein [1]. Here the company stats don't reflect the individual stats. Also, the total number of reviews in the 'Kolla Official' module does not equal the sum of the reviews of its submodules (kolla, kolla-ansible, kolla-cli).
If I look at the contribution summary for Kolla Official in the last 90 days [2], the numbers are actually greater than those for the last 180 days [3]!
There are also similar issues with commit metrics, and none seem to match what I see in git.
Thanks, Mark
[1] https://www.stackalytics.com/?metric=marks&module=kolla-group
[2] https://www.stackalytics.com/report/contribution/kolla-group/90
[3] https://www.stackalytics.com/report/contribution/kolla-group/180
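
The two consistency checks in the message above (a group total should equal the sum of its submodules, and a longer reporting window should never show less than a shorter one) could be sketched in Python as follows. This is only a minimal sketch: the function names are hypothetical, and any numbers passed in would have to be copied by hand from the reports, since nothing here talks to Stackalytics.

```python
def group_total_gap(group_total, submodule_totals):
    """Return the reported group total minus the sum of its submodules.

    A result of 0 means the two views agree. submodule_totals maps a
    submodule name (for example "kolla") to its reported count.
    """
    return group_total - sum(submodule_totals.values())


def shrinking_windows(totals_by_days):
    """Return (shorter, longer) window pairs where the shorter one reports more.

    totals_by_days maps a window length in days to the count reported for
    "the last N days". The windows are nested, so the count should never
    decrease as the window grows; every pair returned is an inconsistency.
    """
    windows = sorted(totals_by_days)
    return [
        (shorter, longer)
        for shorter, longer in zip(windows, windows[1:])
        if totals_by_days[shorter] > totals_by_days[longer]
    ]
```

A non-zero result from `group_total_gap`, or a non-empty list from `shrinking_windows`, would correspond to the kind of mismatch described above.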
I have also seen issues with the new Stackalytics data. At times it showed that I had contributed fewer lines for Stein than were in a single patchset.
Michael
On Thu, Apr 11, 2019 at 11:21 AM Mark Goddard <mark@stackhpc.com> wrote:
[...]
In working on the project update for Ironic, I too have noticed wild inaccuracies.
For one contributor, one screen reported a single commit, but when looking at the data differently that same user showed four different commits.
Looking at the commit count reporting for ironic-group, it also only shows about 200 commits for stein, but when I manually dump the data out from the `git log` output I count in excess of 600 commits.
It also reported about 63 contributors for stein, but I get 123 based on just the commit log author/co-authored-by data. I totally get that there are lots of edge cases to cover in any UI to show insight, but the data just does not seem to be reported accurately.
Is it time for us, as a community, to whip up something that helps provide basic insight?
-Julia
On Thu, Apr 11, 2019 at 11:29 AM Mark Goddard <mark@stackhpc.com> wrote:
[...]
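
A manual count of commits and contributors from `git log`, using author and co-authored-by data as Julia describes, might look roughly like the sketch below. It is not the script she used. The choices here are assumptions: merge commits are skipped, people are identified by lower-cased email address, and the revision range passed to `git_log` is whatever the caller considers "the cycle".

```python
import re
import subprocess

# Record separator (0x1e) before each commit, unit separator (0x1f) between
# the hash, the author email, and the raw commit message.
LOG_FORMAT = "%x1e%H%x1f%ae%x1f%B"

# Matches trailers such as "Co-Authored-By: Name <name@example.com>".
CO_AUTHOR_RE = re.compile(
    r"^co-authored-by:.*<([^>]+)>\s*$", re.IGNORECASE | re.MULTILINE
)


def git_log(repo_path, rev_range):
    """Return raw `git log` output for rev_range in the repository at repo_path."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "--no-merges",
         f"--format={LOG_FORMAT}", rev_range],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def summarize(log_text):
    """Return (commit_count, contributor_count) for output produced by git_log."""
    commits = 0
    contributors = set()
    for record in log_text.split("\x1e"):
        if not record.strip():
            continue  # text before the first record separator is empty
        _sha, author_email, message = record.split("\x1f", 2)
        commits += 1
        contributors.add(author_email.strip().lower())
        contributors.update(
            email.strip().lower() for email in CO_AUTHOR_RE.findall(message)
        )
    return commits, len(contributors)
```

Running `summarize(git_log(path, rev_range))` on each repository in a project group and combining the results would give numbers to compare against the dashboard. Counting by email will overcount people who commit under several addresses.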
On Mon, Apr 15, 2019, at 17:20, Julia Kreger wrote:
[...]
Is it time for us, as a community, to whip up something that helps provide basic insight?
I too found that stackalytics was wildly unhelpful for preparing data for project updates. I started using reviewstats [1] (after learning about it from another thread), which provided a much better starting point. So I don't think we need to whip up something new; we should make that thing better.
[1] https://opendev.org/openstack-infra/reviewstats
Colleen
On Mon, Apr 15, 2019, at 5:20 PM, Julia Kreger wrote:
[...]
Looking at the commit count reporting for ironic-group, it also only shows about 200 commits for stein, but when I manually dump the data out from the `git log` output I count in excess of 600 commits.
To add to the weirdness: if you start at the default stackalytics page, then select "Ironic Official" as the module, then switch from reviews to commits it seems to reload for me and show 964 commits instead of ~200. Perhaps there are multiple backends and one or more of them have old/inaccurate data?
[...]
On Mon, 2019-04-15 at 23:09 -0400, Clark Boylan wrote:
[...]
To add to the weirdness: if you start at the default stackalytics page, then select "Ironic Official" as the module, then switch from reviews to commits it seems to reload for me and show 964 commits instead of ~200. Perhaps there are multiple backends and one or more of them have old/inaccurate data?
Yeah, I have noticed the same thing since the new UI was introduced. Sometimes if you refresh or start again from the main page you will get different data.
Julia Kreger wrote:
[...] Is it time for us, as a community, to whip up something that helps provide basic insight?
I think we should first agree on what we actually need.
A first level would be to extract the data about changes and reviews from Gerrit into a query-able system so that we don't have everyone hammering Gerrit with individual stats queries. Then people can share their "insights scripts" and run them on the same official data.
A second level would be to come up with some common definition of "basic insight" and produce that data for all teams. Personally I think the first level would already give us a lot more confidence and consistency in the numbers we produce.
As an aside, the OSF has been driving a proof-of-concept experiment to use Bitergia tooling (now ELK-based) for Kata Containers and StarlingX, which we could extend to OpenStack and all other OSF projects if successful.
Historically we dropped the old Bitergia tooling because it was falling short with OpenStack complexity (groups of repositories per project team) and release-timeframe data, and its visualization capabilities were limited. But the new version is much better-looking and flexible, so it might be a solution in the long term.
-- Thierry Carrez (ttx)
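
To make the "first level" idea concrete, here is one way a shared, query-able store could look, sketched with Python's built-in sqlite3 module. Everything about it is an assumption for illustration: the table and column names are hypothetical, the step that extracts data from Gerrit is left out entirely, and a real deployment might well use a different database.

```python
import sqlite3

# Hypothetical schema: one row per change and one row per review vote.
SCHEMA = """
CREATE TABLE changes (
    change_id TEXT PRIMARY KEY,
    project   TEXT NOT NULL,
    owner     TEXT NOT NULL
);
CREATE TABLE reviews (
    change_id TEXT NOT NULL REFERENCES changes(change_id),
    reviewer  TEXT NOT NULL,
    vote      INTEGER NOT NULL,
    voted_on  TEXT NOT NULL  -- ISO 8601 date, so string comparison sorts by time
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def reviews_per_reviewer(conn, projects, since, until):
    """Count review votes per reviewer across projects in [since, until)."""
    placeholders = ",".join("?" for _ in projects)
    sql = f"""
        SELECT r.reviewer, COUNT(*) AS votes
        FROM reviews AS r
        JOIN changes AS c ON c.change_id = r.change_id
        WHERE c.project IN ({placeholders})
          AND r.voted_on >= ? AND r.voted_on < ?
        GROUP BY r.reviewer
        ORDER BY votes DESC, r.reviewer
    """
    return conn.execute(sql, [*projects, since, until]).fetchall()
```

With the raw rows in one place, each team's "insights script" reduces to a query like `reviews_per_reviewer`, and a group total is by construction the sum over its repositories.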
On Tue, Apr 16, 2019 at 1:33 AM Thierry Carrez <thierry@openstack.org> wrote:
Julia Kreger wrote:
[...] Is it time for us, as a community, to whip up something that helps provide basic insight?
I think we should first agree on what we actually need.
I think this is vital. Things like reviewstats seem to be useful for review activity, but I think there is also an aspect of activity that comes between releases or between stable branches for project teams. Statements like "We had x contributors during the y release contribute code to project z" and "We observed an x percent change in activity over the past cycle; the cycle before, it was z percent" help us determine where we are presently so we can chart our future course. Lots of graphs are pretty, but I think we can all turn numbers-based reporting into pretty aggregate graphs if something is collecting the dimensions of raw data needed to count lines or to add numbers in columns.
A first level would be to extract the data about changes and reviews from Gerrit into a query-able system so that we don't have everyone hammering Gerrit with individual stats queries. Then people can share their "insights scripts" and run them on the same official data.
In my mind, the extracted data could just be data in text files that could be used with some simple scripting to create useful reporting. The moderate pain point is collecting all of that data. Where things start breaking is with repositories that are part of a project but are treated as release-as-needed utilities: they are not branched, so branch points can't be used to compare velocity.
A second level would be to come up with some common definition of "basic insight" and produce that data for all teams. Personally I think the first level would already give us a lot more confidence and consistency in the numbers we produce.
++
[...]
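
The cycle-over-cycle statements mentioned in this reply ("x percent change in activity over the past cycle") come down to simple arithmetic once per-cycle counts exist. A minimal Python sketch, assuming the counts have already been collected by some other means; the function names are hypothetical:

```python
def percent_change(previous, current):
    """Return the percentage change from previous to current.

    Returns None when previous is zero, because the change is undefined.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0


def cycle_changes(counts):
    """Return (cycle, percent_change) pairs for each cycle after the first.

    counts is a list of (cycle_name, count) tuples in release order.
    """
    return [
        (name, percent_change(prev_count, count))
        for (_, prev_count), (name, count) in zip(counts, counts[1:])
    ]
```

For repositories that are not branched per release, the per-cycle counts would have to be cut by date rather than by branch point before they could be fed into `cycle_changes`.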
We are using it to see company-wide contributions per release or over a specific time period. I usually look at PRs, reviews, and time spent, for both individuals and companies.
-----Original Message-----
From: Julia Kreger <juliaashleykreger@gmail.com>
Sent: Tuesday, April 16, 2019 9:13 AM
To: Thierry Carrez
Cc: openstack-discuss
Subject: Re: [stackalytics] Reported numbers seem inaccurate
[...]
On 4/11/2019 1:21 PM, Mark Goddard wrote:
[...]
Seems things are busted again, or no one is doing any nova reviews:
https://www.stackalytics.com/report/contribution/nova/30
--
Thanks,
Matt
Hi Matt,
Thank you for the message. I checked the Stackalytics logs. There was a critical database issue several hours ago.
When I opened your link with nova reviews I saw review stats, so Stackalytics is collecting data at the moment.
Collecting all the stats is a long operation, so the data may be incomplete (you can see the banner 'The data is being loaded now and is not complete' at the top of stackalytics.com).
It took 30-40 hours to collect all the data into an empty database, so the only thing we can do is wait.
Sergey
On Wed, Jun 26, 2019 at 6:01 PM Matt Riedemann <mriedemos@gmail.com> wrote:
[...]
Seems things are busted again, or no one is doing any nova reviews:
https://www.stackalytics.com/report/contribution/nova/30
--
Thanks,
Matt
-- Best Regards, Sergey Nikitin
On 6/26/2019 9:41 AM, Sergey Nikitin wrote:
Hi Matt, Thank you for the message. I checked the Stackalytics logs. There was a critical database issue several hours ago.
When I opened your link with nova reviews I saw review stats, so Stackalytics is collecting data at the moment.
Collecting all the stats is a long operation, so the data may be incomplete (you can see the banner 'The data is being loaded now and is not complete' at the top of stackalytics.com).
It took 30-40 hours to collect all the data into an empty database, so the only thing we can do is wait.
Sergey
Ack, thanks for the quick reply Sergey. -- Thanks, Matt
participants (10)
- Arkady.Kanevsky@dell.com
- Clark Boylan
- Colleen Murphy
- Julia Kreger
- Mark Goddard
- Matt Riedemann
- Michael Johnson
- Sean Mooney
- Sergey Nikitin
- Thierry Carrez