Would it be possible to extend the script in the future to also report the number of changes pushed/merged, or a ratio of rechecks per merged patch? I think that would be an interesting stat to see alongside the number of rechecks, to understand how stable the CI is.
On Thu, 30 Jun 2022 at 11:12, Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi,
During the last PTG and afterwards, the TC has been discussing CI resource usage and "rechecks" of CI jobs (I know, it's the same topic again).
One of the things we would like to limit, or even avoid, is "no reason rechecks": posting a quick "recheck" comment without checking what actually went wrong in the previous run.
We know that a hard rule under which only "recheck" comments that give a reason trigger a new CI run would not work well, as people might simply start writing random things there. But we want to encourage all teams to at least investigate failures and to give an explanation with as many rechecks as possible.
For now I have prepared a simple script [1] which counts how many of all rechecks are "bare rechecks". It can report per project (like openstack/neutron) or give a summary for all projects or teams (like Quality Assurance, for example). I prepared some stats for all teams listed in https://opendev.org/openstack/governance/src/branch/master/reference/project... covering the last 30 days:
+-------------------+---------------+--------------+-------------------+
| Team | Bare rechecks | All Rechecks | Bare rechecks [%] |
+-------------------+---------------+--------------+-------------------+
| skyline | 20 | 20 | 100.0 |
| magnum | 2 | 2 | 100.0 |
| zun | 1 | 1 | 100.0 |
| mistral | 9 | 9 | 100.0 |
| ec2-api | 1 | 1 | 100.0 |
| barbican | 15 | 15 | 100.0 |
| venus | 2 | 2 | 100.0 |
| solum | 1 | 1 | 100.0 |
| tacker | 30 | 30 | 100.0 |
| trove | 4 | 4 | 100.0 |
| rally | 2 | 2 | 100.0 |
| storlets | 5 | 5 | 100.0 |
| winstackers | 3 | 3 | 100.0 |
| OpenStack Charms | 32 | 33 | 96.97 |
| sahara | 27 | 28 | 96.43 |
| keystone | 24 | 25 | 96.0 |
| kuryr | 120 | 126 | 95.24 |
| kolla | 134 | 142 | 94.37 |
| Puppet OpenStack | 94 | 103 | 91.26 |
| cloudkitty | 10 | 11 | 90.91 |
| OpenStack-Helm | 29 | 32 | 90.62 |
| blazar | 8 | 9 | 88.89 |
| tripleo | 563 | 646 | 87.15 |
| requirements | 20 | 23 | 86.96 |
| Telemetry | 30 | 35 | 85.71 |
| horizon | 55 | 67 | 82.09 |
| ironic | 131 | 164 | 79.88 |
| oslo | 11 | 14 | 78.57 |
| heat | 25 | 33 | 75.76 |
| cinder | 221 | 294 | 75.17 |
| cyborg | 6 | 8 | 75.0 |
| murano | 3 | 4 | 75.0 |
| glance | 20 | 27 | 74.07 |
| OpenStackSDK | 47 | 64 | 73.44 |
| manila | 108 | 160 | 67.5 |
| neutron | 149 | 221 | 67.42 |
| senlin | 2 | 3 | 66.67 |
| swift | 16 | 25 | 64.0 |
| Quality Assurance | 106 | 167 | 63.47 |
| nova | 41 | 71 | 57.75 |
| octavia | 32 | 60 | 53.33 |
| designate | 19 | 39 | 48.72 |
| OpenStackAnsible | 41 | 226 | 18.14 |
+-------------------+---------------+--------------+-------------------+
As you can see from the list above, there is much to improve.
I hope that if teams look more closely at the reasons for CI failures and report the bugs they find, we can make our CI more stable and, as a result, need fewer rechecks, which will save our infra resources :)
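To make the "Bare rechecks [%]" column concrete, here is a minimal sketch of how such a number could be computed from a list of review comments. It is not the code from [1]. The function names, the "Patch Set N:" header handling, and the rule that a bare recheck is a comment containing only the word "recheck" are all assumptions made for illustration.

```python
import re

# Assumption: review comments may start with a "Patch Set N:" header line.
PATCH_SET_HEADER = re.compile(r"\APatch Set \d+:[^\n]*\n*")
# Assumption: any comment whose body starts with the word "recheck" counts as a recheck.
RECHECK_PREFIX = re.compile(r"recheck\b", re.IGNORECASE)


def classify_comment(text):
    """Return 'bare', 'reasoned', or None if the comment is not a recheck."""
    body = PATCH_SET_HEADER.sub("", text).strip()
    if body.lower() == "recheck":
        return "bare"      # nothing but the word "recheck"
    if RECHECK_PREFIX.match(body):
        return "reasoned"  # "recheck" followed by some explanation
    return None


def bare_recheck_stats(comments):
    """Return (bare, total, percent_bare) for an iterable of comment strings."""
    bare = total = 0
    for text in comments:
        kind = classify_comment(text)
        if kind is None:
            continue
        total += 1
        if kind == "bare":
            bare += 1
    percent = round(100.0 * bare / total, 2) if total else 0.0
    return bare, total, percent


if __name__ == "__main__":
    sample = [
        "recheck",
        "Patch Set 2:\n\nrecheck",
        "recheck bug 12345 in a functional test",  # hypothetical bug number
        "Looks good to me",
    ]
    print(bare_recheck_stats(sample))
```

With counts like those in the keystone row (24 bare out of 25 rechecks), the same function gives 96.0.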
[1] https://github.com/slawqo/rechecks-stats/blob/main/rechecks_stats/bare_reche...
--
Slawek Kaplonski
Principal Software Engineer
Red Hat