On Fri, 22 Mar 2024 at 17:57, Jeremy Stanley <fungi@yuggoth.org> wrote:
> On 2024-03-22 15:04:47 +0000 (+0000), Erno Kuvaja wrote:
> [...]
> > How about we start recording somewhere, even a wiki page (oh wait, I
> > think the wiki is dead, well somewhere anyway), a living
> > document, about the bugs that got fixed due to non-mute recheck
> > messages. I don't care if you write a one-liner or a descriptive
> > paragraph, but at minimum a date, a review link and some kind of
> > explanation of how the recheck message got you to fix the bug,
> > and obviously the message itself. Then those who like to keep
> > the reminder in the team meetings have something to celebrate:
> > "Look, this actually works!" But if the page is still empty in a
> > month's or two months' time, I guess we can all agree to focus our
> > efforts on something more constructive and stop pestering people
> > about it. Otherwise we might as well create a gerrit hook that
> > recognizes the empty "recheck" and asks <insert your favourite LLM
> > here> to supplement it with a likely reason and get on with it.
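
(Tongue in cheek, of course, but to make that hook idea concrete, here is a rough sketch of what it could look like. It assumes Gerrit's comment-added hook interface, which passes the comment text via --comment, and a purely hypothetical ask_llm() helper that is left unimplemented.)

#!/usr/bin/env python3
# Rough sketch of a comment-added hook that spots a bare "recheck".
# Assumes Gerrit's standard hook arguments; anything unrecognised is ignored.
import argparse
import re
import sys


def ask_llm(change_url):
    """Hypothetical helper: ask your favourite LLM for a likely failure reason."""
    raise NotImplementedError("wire up whatever model you like here")


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--change", default="")
    parser.add_argument("--change-url", default="")
    parser.add_argument("--comment", default="")
    # Gerrit passes more flags than we care about; ignore the rest.
    args, _ignored = parser.parse_known_args()

    # Strip a possible "Patch Set N:" header and see if only "recheck" remains.
    text = re.sub(r"^Patch Set \d+:\s*", "", args.comment).strip()
    if text.lower() == "recheck":
        print("Bare recheck on %s; this is where ask_llm() would supplement "
              "it with a likely reason." % args.change_url)
        # ask_llm(args.change_url)  # deliberately not wired up in this sketch
    sys.exit(0)


if __name__ == "__main__":
    main()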

> I think this is missing the point. What we *want* is for people to
> look at the failure details and try to understand *why* a job
> failed. If they do that, they're likely to find that they made an
> actual mistake in their change (it happens! shocking, I know, but
> not every change is perfect as pushed). It's also possible they'll
> find, when looking, that the cause of the failure is something they
> know how to fix, and they'll push up a patch for that. If nothing
> else, they might actually tell someone about the problem they ran
> into, and that person may know how to address it. In the process,
> they'll also gain an increased familiarity with how changes are
> being tested which may assist them with reaching a better outcome in
> the future.

I really might be missing the point, as this is what we have tried for 10 years now and it has not worked that well so far. But reading Slawek's and Dan's responses above, they are saying that the goal is exactly not this. That was the goal of demanding a bug number at some point in the past, and as already explained, that didn't work out so well either.

> If we tell people, "don't recheck unless you know your change didn't
> cause the problem and you can't figure out what did," then a lot of
> them will respond with "oh I looked" even though they clearly
> didn't. If we then say, "okay so tell me what the error was," they're
> more compelled to at least actually take a look and not make
> assumptions based on no evidence whatsoever. So basically, by asking
> them to do something else, we're skipping the first thing we actually
> want them to do (but which they usually won't do anyway), and that
> something else has an increased chance of getting them to do the
> thing we want. Essentially, this is taking a pedagogical approach to
> the underlying problem.

And this is why I proposed the living doc. The people who do think it makes a difference already do this, and the rest might as well just use that LLM hook as long as they are not convinced that what they write after the recheck means anything substantial. I'm not saying the goal here isn't honourable, just that this is probably the 5th or so trip around the loop of this discussion over the past ten years, and so far it has not produced the result we'd like to see, so perhaps we could try something different?

Unfortunately, that "and you can't figure out what did" is a pretty low bar. Sorry, I see myself still doing this as well: "recheck" #3 might be bare because originally job x failed on, say, a timeout; on the first recheck it passed but job y failed on not cleaning up a neutron network properly; and on the second recheck both of the previous ones passed but job z timed out this time. I've been watching that same pattern for the past 2 weeks (not literally these past two weeks, but as a general observation) across multiple patches and I still have no idea what causes those failures. Honestly I thought I had used that "unrelated failure" more recently than what the statistics say in total :P
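
(For anyone curious about their own numbers, something along these lines should give a rough count of bare versus explained rechecks on changes you own. It is only a sketch, assuming Gerrit's REST API on review.opendev.org, its ")]}'" response prefix, and the MESSAGES and DETAILED_ACCOUNTS query options; the account name is a placeholder you'd fill in yourself.)

#!/usr/bin/env python3
# Rough sketch: count bare vs. explained "recheck" comments you left on
# changes you own, via the Gerrit REST API.
import json
import re
import urllib.request

GERRIT = "https://review.opendev.org"
ACCOUNT = "your-gerrit-username"  # placeholder


def gerrit_get(path):
    with urllib.request.urlopen(GERRIT + path) as resp:
        raw = resp.read().decode("utf-8")
    # Gerrit prepends ")]}'" to its JSON responses; strip it before parsing.
    return json.loads(raw.lstrip(")]}'\n"))


changes = gerrit_get(
    "/changes/?q=owner:%s&o=MESSAGES&o=DETAILED_ACCOUNTS&n=100" % ACCOUNT)
bare = explained = 0
for change in changes:
    for msg in change.get("messages", []):
        if msg.get("author", {}).get("username") != ACCOUNT:
            continue
        text = msg.get("message", "")
        if "recheck" not in text.lower():
            continue
        # Drop the "Patch Set N:" header and judge what is left.
        body = re.sub(r"^Patch Set \d+:\s*", "", text).strip()
        if body.lower() == "recheck":
            bare += 1
        else:
            explained += 1
print("bare rechecks: %d, rechecks with some explanation: %d" % (bare, explained))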

> Where I do agree with you is that a good outcome is one in which
> whatever behavior we attempt to incentivize leads to more bugs being
> fixed, tests becoming more reliable, and overall code quality
> improving. What your solution misses is that most of the bugs which
> are likely to get fixed because of this approach will ideally be
> fixed by the person who otherwise would have left a recheck comment,
> and so won't result in any recheck comment at all (and if all goes
> well, will lead to fewer recheck comments across all our changes).

Perhaps a shout-out in Nova's weekly meeting about fixing some gate bug would be enough for some to take the extra time to figure out a common failure on job X. ;)
Maybe making a community goal out of it could do the trick?
> --
> Jeremy Stanley