On Fri, 22 Mar 2024 at 14:37, Sylvain Bauza <sbauza@redhat.com> wrote:


On Fri, 22 Mar 2024 at 14:28, <smooney@redhat.com> wrote:
On Fri, 2024-03-22 at 13:18 +0100, Sławek Kapłoński wrote:
> Hi,
>
> On Thursday, 21 March 2024 20:19:18 CET smooney@redhat.com wrote:
> >
> > I'm not aware of all the things on the TC's plate right now, but this feels to me
> > like something we should not be spending time on.
> > The contributors that do most of the work day to day already know this policy.
> > For new contributors, we try to tell them about it when we see bare rechecks,
> > but this always feels like we are preaching to the choir.
> >
> > I think it would be better to just stop tracking this,
> > and I don't think enforcing this in code is a good thing either.
> > People that don't care will just work around it, so unless we are going
> > to soft ban an account for a few days or something like that, I don't think it will
> > have much impact.
> >
> > I don't think we need to take the ban hammer out to people that do bare rechecks, but
> > after a few years of advertising this policy, this now feels more like spam to me
> > than actually something that will have a positive effect.
>
> I just want to explain one thing here. It was never my intention to ban anyone or to enforce anything. My email is
> only to ask people to maybe try to improve those recheck comments a bit. But it's totally fine if people in some
> project will not do that at all. They still can do bare rechecks if they want to.

Well, just looping back to Dan's reply to my previous email.

It seems like the intent this time is to understand why rechecks are being done,
not to stop people rechecking.

The irritation I was expressing was because the previous time we asked people
not to recheck without a reason, it was not implemented as asking people to try
and see if they can fix the underlying problem.


Repeating
  * STOP DOING BLIND RECHECKS aka. 'recheck'
    https://docs.openstack.org/project-team-guide/testing.html#how-to-handle-test-failures

in the gate status section of our meeting every week, and not actually trying to understand the reasons
people were giving for a recheck, was unhelpful and frankly quite annoying.
For reasons, for the last 18 months my upstream time has been severely reduced and has almost exclusively
been code review and gate fixes. My perception of the previous incarnation of this effort in 2022 was that the focus
was not on actually fixing the problems but on tracking the rechecks and reducing them to save CI resources, rather
than improving CI stability.


It sounds like the intent of the new effort is to actually fix the issues by understanding
why people are rechecking, and discouraging blind rechecks is just a side effect to ensure we
have good data to take action on.
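
As a rough illustration of what "good data" could look like, here is a minimal Python sketch that separates bare rechecks from rechecks with a reason and tallies the reasons. The single-line comment format, the regular expression, and the names parse_recheck and summarize are assumptions made for this example; this is not Zuul's actual trigger pattern or any existing OpenStack tooling.

```python
import re
from collections import Counter

# Assumption: a recheck comment is a single line starting with "recheck",
# optionally followed by a free-text reason. This is NOT Zuul's real trigger.
RECHECK_RE = re.compile(r"^\s*recheck\b[ \t]*(?P<reason>.*)$", re.IGNORECASE)


def parse_recheck(comment):
    """Return None if not a recheck, "" for a bare recheck, else the reason."""
    match = RECHECK_RE.match(comment)
    if match is None:
        return None
    return match.group("reason").strip()


def summarize(comments):
    """Count bare rechecks and tally the (lower-cased) reasons given."""
    bare = 0
    reasons = Counter()
    for comment in comments:
        reason = parse_recheck(comment)
        if reason is None:
            continue  # not a recheck comment at all
        if reason == "":
            bare += 1
        else:
            reasons[reason.lower()] += 1
    return bare, reasons
```

Calling summarize() on a list of review comments returns the number of bare rechecks plus a Counter of reasons, so reasons.most_common() would surface the failures people hit most often. It says nothing about whether a reason is meaningful.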

One of the things that was identified in the list was kernel panics in the guest.
One way to reduce that: if you have a project on this list
https://codesearch.opendev.org/?q=cirros-0.5.2&i=nope&literal=nope&files=&excludeFiles=&repos=
please move to cirros 6 or at least 0.5.3.

cirros 0.5.2 has a known kernel bug that causes random guest panics, which is fixed in both 0.5.3 and the 6.x series.
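
For anyone who prefers to check a local checkout rather than codesearch, a minimal Python sketch of the same search follows. The function name find_old_cirros and the choice to skip the .git directory are assumptions for this example; it only does a plain substring match on "cirros-0.5.2" and makes no attempt to rewrite anything.

```python
import os

# The image version string being searched for, as in the codesearch query.
OLD_IMAGE = "cirros-0.5.2"


def find_old_cirros(root):
    """Return (path, line number, line) for every reference under root."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: skip git metadata; only tracked content is interesting.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    for lineno, line in enumerate(handle, start=1):
                        if OLD_IMAGE in line:
                            hits.append((path, lineno, line.strip()))
            except OSError:
                continue  # unreadable file, ignore it
    return hits
```

Running find_old_cirros(".") from the top of a repository lists the places that would need updating to 0.5.3 or a 6.x image.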

I'm aware of the irony of a nova core saying that when nova has one reference to a 0.5.2 image, but that one is for arm
and does not (that we have seen) hit the issue that's in the x86 one. It's on my todo list to fix.

I'm all for using recheck reasons as a data source to try to identify and fix CI issues,

but when I saw the following in our team meeting this week it was very triggering, as we had previously agreed to stop doing that as a team:

 * please avoid bare rechecks   (bauzas, 16:12:14)
 * ACTION: bauzas to tell about the bare rechecks every week in our
    meeting  (bauzas, 16:15:26)

I'll chat to bauzas about how we can phrase this reminder to make it clear the focus is not on stopping blind rechecks;
it's about understanding why a recheck was required and fixing that issue, and providing a reason is the minimal first
step in that process.

I don't have the exact figures, but since everything is logged, I'm pretty sure we could eventually find out when and how. I do remember that before I started to provide this reminder, I explained the reasons behind it.
FWIW, the whole nova meeting is a collection of reminders (collecting items for the PTG, triaging bugs, incenting review priorities), so I thought that this other reminder wasn't controversial, and I never heard anyone complaining about it.
Then I stopped providing this reminder for the exact reason that it worked: our bare recheck numbers were dropping.

Now, the TC is pivoting a bit and asking the project leaders to ask the contributors to give better strings for their rechecks. I don't really see it as controversial either, and I'm open to discussing it in the right medium, which is the nova meeting.

-Sylvain

>
> >
> > Maybe I'm just being cynical, but if we make recheck hard people will just
> > work around it by making a trivial change to the patch or
> > hitting the rebase button in the UI instead and get the same effect...
> >
> > That's my perspective anyway, but I don't think this helps our community be more
> > welcoming or enjoyable to work with. CI is a shared resource we should not
> > squander, but this topic just drains my energy when I see it come up in team
> > meetings or on the mailing list.
> >
> > On Thu, 2024-03-21 at 20:39 +0300, Maksim Malchuk wrote:
> > > Sven, bare rechecks can't be disabled, because it's hard to check if a
> > > meaningful reason is provided.
> > > Enforcing that a reason be specified will lead to commands like "recheck
> > > failed" or "recheck lets check" etc.
> > >
> > >
> > > On Thu, Mar 21, 2024 at 7:10 PM Sławek Kapłoński <skaplons@redhat.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > On Thursday, 21 March 2024 16:23:21 CET Jeremy Stanley wrote:
> > > > > On 2024-03-21 15:56:42 +0100 (+0100), Sven Kieske wrote:
> > > > > [...]
> > > > > > There must be a specific reason why bare rechecks are allowed at all?
> > > > > > Why don't we simply enforce that there always must be a reason given?
> > > > > >
> > > > > > Of course we can't enforce a meaningful reason being stated, but this
> > > > > > is already the case now, so it would not get worse if we just disabled
> > > > > > the possibility for bare rechecks, no?
> > > > > [...]
> > > > >
> > > > > There was a time when we did exactly that, it lasted several years
> > > > > and the end result did not yield any measurable improvement in data
> > > > > quality. In fact, at one point we got restrictive enough to require
> > > > > bug numbers and the outcome was that people either made up
> > > > > nonexistent bug numbers or just put in any old bug they knew the
> > > > > number for regardless of whether it was related to the failure.
> > > > >
> > > > > Yes it's been a while so I can't say for certain that the results
> > > > > would be the same if we tried again, but I don't have a good reason
> > > > > to believe it would turn out any different. Also, bear in mind, the
> > > > > pipeline trigger patterns apply to the entire Zuul tenant used by
> > > > > the OpenStack project, which is currently shared by any other
> > > > > projects outside OpenStack's governance, so if this change were
> > > > > enforced (again) it would disrupt their contributors' workflows as
> > > > > well.
> > > > > --
> > > > > Jeremy Stanley
> > > > >
> > > >
> > > > I agree with Jeremy here. We know that enforcing doesn't really work well
> > > > and that's why we are trying to educate more :)
> > > >
> > > > --
> > > > Slawek Kaplonski
> > > > Principal Software Engineer
> > > > Red Hat
> > >
> > >
> > >
> >
> >
>
>


Great! Some opinionated discussion and passionate responses, good to see that again.

I have one suggestion which, I think, would provide a pretty good indication of whether this effort is a useful way to spend anyone's time.

How about we start recording somewhere, even a wiki page (oh wait, I think the wiki is dead, well somewhere anyways), a living document, the bugs that got fixed due to non-mute recheck messages. I don't care if you write a one-liner or a descriptive paragraph, but include a minimum of date, review link and some kind of explanation of how the recheck message got you to fix this bug, and obviously the message itself. Then those who like to keep the reminder in the team meetings have something to celebrate: "Look, this actually works!" But if the page is still empty in a month's or two months' time, I guess we can all agree to focus our efforts on something more constructive and stop pestering people about it. Otherwise we might as well create a gerrit hook that recognizes the empty "recheck" and asks <insert your favourite LLM here> to supplement it with a likely reason and get on with it.

- jokke