[openstack-dev] [nova] bug triage experimentation

Sylvain Bauza sbauza at redhat.com
Mon Jun 26 08:49:47 UTC 2017



On 23/06/2017 18:52, Sean Dague wrote:
> The Nova bug backlog is just over 800 open bugs, which while
> historically not terrible, remains too large to be collectively usable
> to figure out where things stand. We've had a few recent issues where we
> just happened to discover upgrade bugs filed 4 months ago that needed
> fixes and backports.
> 
> Historically we've tried to just solve the bug backlog with volunteers.
> We've had many a brave person dive in here, and burn out after 4 - 6
> months. And we're currently without a bug lead. Having done a big giant
> purge in the past
> (http://lists.openstack.org/pipermail/openstack-dev/2014-September/046517.html)
> I know how daunting this all can be.
> 
> I don't think that people can currently solve the bug triage problem at
> the current workload that it creates. We've got to reduce the smart
> human part of that workload.
> 

Thanks for sharing ideas, Sean.

> But, I think that we can also learn some lessons from what active github
> projects do.
> 
> #1 Bot away bad states
> 
> There are known bad states of bugs - In Progress with no open patch,
> Assigned but not In Progress. We can just bot these away with scripts.
> Even better would be to react immediately to bugs like those, which
> helps train folks to use our workflow. I've got some starter scripts
> for this up at - https://github.com/sdague/nova-bug-tools
> 

Sometimes, for reasons I never figured out, I noticed the Gerrit hook
not working (i.e. amending the Launchpad bug with the Gerrit URL), so
some of the bugs I was looking at were actually being actively worked on
(I had the same experience myself, even though my commit message was
correctly formatted, AFAIR).

Either way, what you propose sounds reasonable to me. If you care enough
about fixing a bug to make yourself the owner of that bug, that also
means you're committing to a resolution sooner rather than later (even
if I fail to apply that to myself...).
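As a rough sketch of what such a bot pass could look like (the dict
shapes here are just placeholders for Launchpad API objects, and the
function names are mine, not the actual nova-bug-tools code):

```python
# Hypothetical sketch of the "bot away bad states" pass.  Each bug is a
# plain dict standing in for a Launchpad bug task; the rules mirror the
# bad states described above.

def bad_state(bug):
    """Return a reason string if the bug is in a known bad state, else None."""
    status = bug.get("status")
    assignee = bug.get("assignee")
    has_open_review = bug.get("open_review", False)

    if status == "In Progress" and not has_open_review:
        return "In Progress with no open patch"
    if assignee and status not in ("In Progress", "Fix Committed",
                                   "Fix Released"):
        return "Assigned but not In Progress"
    return None


def triage(bugs):
    """Partition bugs into (ok, flagged) lists of (id, reason) pairs."""
    ok, flagged = [], []
    for bug in bugs:
        reason = bad_state(bug)
        (flagged if reason else ok).append((bug["id"], reason))
    return ok, flagged
```

A real bot would then reset the bad-state bugs and leave a comment
explaining the workflow, but the detection logic is the cheap part.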

> #2 Use tag based workflow
> 
> One lesson from github projects, is the github tracker has no workflow.
> Issues are opened or closed. Workflow has to be invented by every team
> based on a set of tags. Sometimes that's annoying, but often times it's
> super handy, because it allows the tracker to change workflows and not
> try to change the meaning of things like "Confirmed vs. Triaged" in your
> mind.
> 
> We can probably tag for information we know we need a lot more easily.
> I'm
> considering something like
> 
> * needs.system-version
> * needs.openstack-version
> * needs.logs
> * needs.subteam-feedback
> * has.system-version
> * has.openstack-version
> * has.reproduce
> 
> For some of these, a bot can process the bug text and tell if that
> info was provided, and comment on how to provide the missing info.
> Some of this would be human, but with official tags, it would probably
> help.
> 

The tags you propose seem to me to map onto the "Incomplete" vs.
"Confirmed" state of the bug.

If I'm not able to triage a bug because I'm missing information like the
release version or more logs, I mark the bug as Incomplete.
I could add those tags, but I don't see where a programmatic approach
could help us.

If I understand correctly, you're rather trying to identify what's
missing from the bug report to provide a clear path to resolution, so we
could mark the bug as Triaged, right? If so, I wouldn't propose those
tags for the reason I just gave, but rather other tags like (disclaimer,
I suck at naming things):

 - rootcause.found
 - needs.rootcause.analysis
 - is.regression
 - reproduced.locally
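For the bot-assisted part of the tag workflow, I picture something like
the following (the regexes and tag names are illustrative guesses, not
real nova-bug-tools rules):

```python
import re

# Hypothetical sketch of a tag-maintenance bot: scan the bug description
# for the info the needs.* / has.* tags track, and emit the tag set a bot
# would apply.  Release names and patterns are assumptions for illustration.

OPENSTACK_VERSION = re.compile(
    r"\b(newton|ocata|pike|queens)\b|\b\d+\.\d+\.\d+\b", re.I)
TRACEBACK = re.compile(r"Traceback \(most recent call last\)|ERROR nova")


def info_tags(description):
    """Return the set of needs./has. tags to apply to a bug description."""
    tags = set()
    if OPENSTACK_VERSION.search(description):
        tags.add("has.openstack-version")
    else:
        tags.add("needs.openstack-version")
    if TRACEBACK.search(description):
        tags.add("has.logs")
    else:
        tags.add("needs.logs")
    return tags
```

The bot could then comment with instructions whenever it applies a
needs.* tag, which is the "train folks" part of the proposal.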


> #3 machine assisted functional tagging
> 
> I'm playing around with some things that might be useful in mapping new
> bugs into existing functional buckets like: libvirt, volumes, etc. We'll
> see how useful it ends up being.
> 

Log parsing could certainly help. If someone provides a clear stacktrace
of the root exception, we can get the affected functional bucket for
free in 80% of cases.

I'm not a fan of identifying a domain by plain text recognition (the
fact that someone mentions libvirt doesn't mean this is a libvirt bug),
which is why I'd rather lean on log analysis, as I mentioned.
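To make that concrete, here is a minimal sketch of what I mean by
deriving the bucket from the stacktrace rather than from free text (the
prefix-to-bucket table is a made-up example, not an existing mapping):

```python
# Hypothetical sketch: infer the functional bucket from the module paths
# appearing in a traceback, keeping the deepest (last) matching frame.
# The prefixes and bucket names below are illustrative assumptions.

BUCKETS = {
    "nova/virt/libvirt": "libvirt",
    "nova/virt/vmwareapi": "vmware",
    "nova/volume": "volumes",
    "nova/network": "network",
}


def bucket_for_trace(trace):
    """Return the functional bucket for a traceback, or None if no match."""
    found = None
    for line in trace.splitlines():
        for prefix, bucket in BUCKETS.items():
            if prefix in line:
                found = bucket  # the deepest frame wins
    return found
```

Because it only looks at file paths in actual frames, a bug report that
merely talks about libvirt in prose would not be mis-bucketed.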


> #4 reporting on smaller slices
> 
> Build some tooling to report on the status and change over time of bugs
> under various tags. This will help visualize how we are doing
> (hopefully) and where the biggest piles of issues are.
> 
> The intent is the normal unit of interaction would be one of these
> smaller piles. Be they the 76 libvirt bugs, 61 volumes bugs, or 36
> vmware bugs. It would also highlight the rates of change in these piles,
> and what's getting attention and what is not.
> 

I wonder whether Markus already wrote such reporting tools. AFAIR, he
had a couple of very interesting reports (and he also worked hard on the
bug taxonomy), so we could potentially leverage those.
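The aggregation side of that reporting is simple enough to sketch (data
shapes and function names are mine, for illustration only):

```python
from collections import Counter

# Hypothetical sketch of per-tag slice reporting: count open bugs per tag
# in two snapshots and report the change, to show which piles are growing
# and which are getting attention.


def tag_counts(bugs):
    """Count open bugs per tag; a bug may sit in several piles."""
    counts = Counter()
    for bug in bugs:
        for tag in bug.get("tags", []):
            counts[tag] += 1
    return counts


def deltas(before, after):
    """Per-tag change between two snapshots; positive means growing."""
    tags = set(before) | set(after)
    return {t: after.get(t, 0) - before.get(t, 0) for t in tags}
```

Run against daily snapshots, the deltas would give exactly the
rate-of-change view described above.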

-Sylvain

> 
> This is going to be kind of an ongoing experiment, but as we currently
> have no one spearheading bug triage, it seemed like a good time to try
> this out.
> 
> Comments and other suggestions are welcomed. The tooling will have the
> nova flow in mind, but I'm trying to make it so it takes a project name
> as params on all the scripts, so anyone can use it. It's a little hack
> and slash right now to discover what the right patterns are.
> 
> 	-Sean
> 


