On Thu, Jun 30, 2022, at 10:34 AM, Dmitriy Rabotyagov wrote:
I think I need to rephrase myself a bit.
For example, if you have 100 patches merged and 2 rechecks, then even if both rechecks are bare, that doesn't mean the developers don't care about resources. It's more that they are so sure of their tests' stability that they are absolutely sure it's an infra failure.
I'm not sure I understand why certain infra failures don't deserve a note recording why the recheck was necessary if other failures do.
Or vice versa, if there are 20 rechecks for 2 patches, then even if none of them are bare, it's still weird and something worth reconsidering from the project's perspective.
I think the idea is to create a culture of debugging and record keeping. Yes, I would expect after a few rechecks that maybe the root causes would be addressed in this case, but the first step in doing that is identifying the problem and making note of it.
I hope I explained better now what I meant.
Thu, Jun 30, 2022, 16:56 Slawek Kaplonski <skaplons@redhat.com>:
Hi,
On Thursday, 30 June 2022 15:37:47 CEST Sean Mooney wrote:
On Thu, 2022-06-30 at 13:06 +0000, Jeremy Stanley wrote:
On 2022-06-30 14:57:44 +0200 (+0200), Dmitriy Rabotyagov wrote:
Is it possible to adjust the script a bit in the future to add the
amount of changes pushed/merged or some ratio of the amount of
rechecks per merged patch? I think it would also be an interesting
stat to see in addition to the amount of rechecks to understand how
stable CI is.
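As a rough illustration of the ratio being asked for here (a sketch only, not part of the existing script; the function name is made up for this example), applied to the two scenarios mentioned in this thread:

```python
def rechecks_per_merged_patch(rechecks: int, merged: int) -> float:
    """Average number of recheck comments per merged patch.

    Hypothetical helper: both counts are assumed to have been collected
    elsewhere for the same project and time window.
    """
    if merged <= 0:
        raise ValueError("merged must be a positive number of patches")
    return rechecks / merged


# 100 merged patches with 2 rechecks in total
print(rechecks_per_merged_patch(2, 100))  # 0.02
# 2 merged patches with 20 rechecks in total
print(rechecks_per_merged_patch(20, 2))   # 10.0
```

The ratio alone says nothing about why each recheck was requested, so it is probably best read alongside the bare-recheck numbers rather than instead of them.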
[...]
Recheck comment volume doesn't really provide an accurate measure of
CI stability, all it tells you is how often people requested
rerunning tests. Their reasons for doing it can be myriad, from not
believing actual failures their changes are causing, to repeatedly
rechecking successful results in hopes of reproducing some rare
failure condition.
Yep, we also recheck successful results if we think we have fixed an intermittent
CI failure that we could not reproduce reliably but created a patch for based on code inspection.
In such a case we usually recheck 3 times, looking for at least 3 consecutive check +1s before we +2w.
Rarely I also recheck if a patch is old and the logs have rotated when I'm reviewing others' work,
but generally I just click the rebase button in that case. For example, I will tend to do +2 recheck
if there are already cherry-picks of the patch, to avoid those having to be updated. But as I said, this is
rare, as we don't often have bugfixes that sit around for 3+ months that still actually apply without a merge conflict,
but it does happen.
So recheck is not a great proxy for CI stability without knowing the reason, which is why not doing bare rechecks is important.
That's true. The reason I wrote the script to check "bare" rechecks is to see how often people just do "recheck" without even checking the reason for the failures.
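To make the distinction concrete, here is a minimal sketch of how a "bare" recheck could be told apart from one that gives a reason. This is not the actual script; it assumes the comment text has already been extracted from Gerrit (without any "Patch Set N:" prefix), and the function names are invented for the example:

```python
import re

# Assumption: any comment whose first word is "recheck" requests a re-run.
RECHECK = re.compile(r"\s*recheck\b", re.IGNORECASE)
# Assumption: a recheck is "bare" when nothing but whitespace or trailing
# punctuation follows the word "recheck", i.e. no reason is given.
BARE_RECHECK = re.compile(r"\s*recheck[\s.!]*$", re.IGNORECASE)


def is_recheck(comment: str) -> bool:
    """True if the comment starts with the word "recheck"."""
    return RECHECK.match(comment) is not None


def is_bare_recheck(comment: str) -> bool:
    """True if the comment is only "recheck", with no explanation."""
    return BARE_RECHECK.match(comment) is not None


def bare_recheck_share(comments: list[str]) -> float:
    """Fraction of recheck comments that are bare (0.0 if there are none)."""
    rechecks = [c for c in comments if is_recheck(c)]
    if not rechecks:
        return 0.0
    return sum(is_bare_recheck(c) for c in rechecks) / len(rechecks)


comments = ["recheck", "recheck mirror timeout in devstack", "LGTM"]
print(bare_recheck_share(comments))  # 0.5
```

A check like this only sees whether some text follows "recheck"; it cannot tell whether the stated reason is accurate.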
For CI stability, some time ago I wrote another script https://github.com/slawqo/rechecks-stats/blob/main/rechecks_stats/rechecks.p... which checks only merged patches and counts the number of "Failed build" comments from Zuul on the last, merged patch set. That is certainly not a perfect metric either, but IMO it can give a better view of CI stability, as it will not count rechecks of passing CI runs made to look for intermittent failures, nor failures caused by the patch itself.
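Roughly, the counting idea looks like the sketch below. It is not the linked script: the dictionary layout ("status", "last_patch_set", "comments" and the keys inside each comment), the "Zuul" author name, and the "Build failed" marker text are all assumptions made for this illustration and would need adjusting to the real Gerrit data.

```python
def failed_builds_on_last_patch_set(change, ci_author="Zuul", marker="Build failed"):
    """Count CI failure comments left on the final patch set of a change.

    `change` is a hypothetical dict; its field names are assumptions.
    """
    last = change["last_patch_set"]
    return sum(
        1
        for c in change["comments"]
        if c["author"] == ci_author
        and c["patch_set"] == last
        and marker in c["message"]
    )


def average_failures_per_merged_change(changes, **kwargs):
    """Average failure count over merged changes only (0.0 if none merged)."""
    merged = [c for c in changes if c["status"] == "MERGED"]
    if not merged:
        return 0.0
    total = sum(failed_builds_on_last_patch_set(c, **kwargs) for c in merged)
    return total / len(merged)


# Made-up sample data, only to show the shape of the input.
changes = [
    {"status": "MERGED", "last_patch_set": 2, "comments": [
        {"author": "Zuul", "patch_set": 1, "message": "Build failed."},
        {"author": "Zuul", "patch_set": 2, "message": "Build failed."},
        {"author": "reviewer", "patch_set": 2, "message": "recheck"},
        {"author": "Zuul", "patch_set": 2, "message": "Build succeeded."},
    ]},
    {"status": "ABANDONED", "last_patch_set": 1, "comments": [
        {"author": "Zuul", "patch_set": 1, "message": "Build failed."},
    ]},
]
print(average_failures_per_merged_change(changes))  # 1.0
```

Failures on earlier patch sets are ignored on purpose, since those are more likely to come from the patch itself than from unstable CI.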
--
Slawek Kaplonski
Principal Software Engineer
Red Hat