On Fri, 22 Mar 2024 at 17:57, Jeremy Stanley <fungi@yuggoth.org> wrote:
On 2024-03-22 15:04:47 +0000 (+0000), Erno Kuvaja wrote: [...]
How about we start a record somewhere, even a wiki page (oh wait, I think the wiki is dead, well somewhere anyways), a living document, about the bugs that got fixed due to non-mute recheck messages. I don't care if you write a one-liner or a descriptive paragraph, but at a minimum include the date, the review link, some kind of explanation of how the recheck message got you to fix this bug, and obviously the message itself. Then those who like to keep the reminder in the team meetings have something to celebrate: "Look, this actually works!" But if the page is still empty in a month's or two months' time, I guess we can all agree to focus our efforts on something more constructive and stop pestering people about it. Otherwise we might as well create a gerrit hook that recognizes the empty "recheck" and asks <insert your favourite LLM here> to supplement it with a likely reason and get on with it.
I think this is missing the point. What we *want* is for people to look at the failure details and try to understand *why* a job failed. If they do that, they're likely to find that they made an actual mistake in their change (it happens! shocking, I know, but not every change is perfect as pushed). It's also possible they'll find, when looking, that the cause of the failure is something they know how to fix, and they'll push up a patch for that. If nothing else, they might actually tell someone about the problem they ran into, and that person may know how to address it. In the process, they'll also gain an increased familiarity with how changes are being tested which may assist them with reaching a better outcome in the future.
I might really be missing the point, as this is what we have tried for 10 years now and it has not worked that well so far. BUT reading Slawek's and Dan's responses above, they are saying that this is exactly not the goal. It was, though, the goal of demanding a bug number at some point in the past, and as already explained, that didn't work out so well either.
If we tell people, "don't recheck unless you know your change didn't cause the problem and you can't figure out what did," then a lot of them will respond with "oh I looked" even though they clearly didn't. If we then say, "okay so tell me what the error was" they're more compelled to at least actually take a look and not make assumptions based on no evidence whatsoever. So basically we're skipping the first thing that we actually want them to do but they usually won't by asking them to do something else which has an increased chance of getting them to do the thing we want. Essentially, this is taking a pedagogical approach to the underlying problem.
And this is why I proposed the living doc. The people who think it makes a difference already do this, and the rest might as well just use that LLM hook as long as they are not convinced that what they write after the recheck has any substantial meaning to anything. I'm not saying that the goal here isn't honourable, just that this is probably the 5th or so time around the loop of this discussion over the past ten years and so far it has not produced the result we'd like to see, so perhaps we could try something different? Unfortunately that "and you can't figure out what did" is a pretty low bar. Sorry, I see myself still doing this as well: "recheck" #3 might be just bare when originally job x failed with, say, a timeout; on the first recheck it passed but y failed on something not cleaning up a neutron network properly; and on the second recheck both of the previous ones passed but job z timed out this time. I've been watching that same pattern for the past 2 weeks (not literally these past two weeks, but as a general observation) across multiple patches and I still have no idea what causes those failures. Honestly I thought I had used that "unrelated failure" more recently than what the statistics say in total :P
Where I do agree with you is that a good outcome is one in which whatever behavior we attempt to incentivize leads to more bugs being fixed, tests becoming more reliable, and overall code quality improving. What your solution misses is that most of the bugs which are likely to get fixed because of this approach will ideally be fixed by the person who otherwise would have left a recheck comment, and so won't result in any recheck comment at all (and if all goes well, will lead to fewer recheck comments across all our changes).
Perhaps a shout-out in Nova's weekly meeting about fixing some gate bug would be enough for some to take the extra time to figure out a common failure on job X. ;) Maybe writing a community goal out of it could do the trick?
-- Jeremy Stanley