[Product] proposal for expediting bug fixes

Steve Gordon sgordon at redhat.com
Tue Sep 29 20:51:30 UTC 2015


----- Original Message -----
> From: "Rochelle Grober" <rochelle.grober at huawei.com>
> To: "Kyle Mestery" <mestery at mestery.com>
> 
> Again, this response is a to an earlier branch of the discussion, but I
> wanted to address some ongoing discussions both inline, and for the other
> branch of the thread, top posted.  I think this thread with its branches is
> a great example of how working as a sustaining engineer in the corporate
> world can be frustrating ;-)
> 
> On the stable branch bugfix velocity -- yes, by adding the resources under
> discussion, we will be able to get the backports of bugfixes done much more
> effectively than is currently the case.  That said, there has been
> discussion in the past of lack of support for the stable branches from the
> community, which resulted in the action to engage all project cores who were
> willing in the effort to review the backports for the stable branches.  So,
> there is definitely an awareness of stable branch maintenance.  Stable
> branche processes/velocity are *definitely* one of the areas that the
> Product WG should be trying to help improve.
> 
> Another discussion on the dev mailing list was around dropping the tagged
> releases all together in favor of continuous rolling releases.  The modified
> process would go a long way in speeding access to backported bug fixes to
> the sustaining engineering teams supporting OpenStack products.  We should
> all review that thread (I'll eventually find it and post it) and perhaps
> restart the discussion once Liberty is released.
> 
> So, more in line below.  And, this is a great discussion, I think.
> 
> --rocky
> 
> 
> From: Kyle Mestery [mailto:mestery at mestery.com]
> Sent: Monday, September 28, 2015 7:01 PM
> To: Rochelle Grober
> Cc: Arkady_Kanevsky at dell.com; product-wg at lists.openstack.org; Armando
> Migliaccio
> Subject: Re: [Product] proposal for expediting bug fixes
> 
> On Mon, Sep 28, 2015 at 8:35 PM, Rochelle Grober
> <rochelle.grober at huawei.com<mailto:rochelle.grober at huawei.com>> wrote:
> Hey, guys, I'm late to this train, but I hope you appreciate my cargo;-)
> 
> By all means. :)
> 
> Kyle Mestery wrote:
> > On Mon, Sep 28, 2015 at 5:03 PM,
> > <Arkady_Kanevsky at dell.com<mailto:Arkady_Kanevsky at dell.com>> wrote:
> >
> > > Problem: Currently bug fixes take a long time especially for previous
> > > releases.
> > >
> > >
> > This proposal has multiple problems which I'll address below. But the
> > biggest issue is it appears to try and "anoint" people into a certain
> > role.
> > This goes against Open Source philosophies, you have to show up and do
> > the
> > legwork to get that sort of recognition. You can't force this onto
> > project
> > teams. See below for specific issues with this proposal.
> 
> Kyle, we very much appreciate your comments and here, have a beer while I try
> to calm you down....
> 
> I'm only trying to prevent the train from leaving the tracks before it leaves
> the station.
> 
> First, to address Armando's comment (later in this thread :-/):
> 
> 0. The identified resources will actively participate in rapid triage of
> bugs, and when necessary for a bug identified as high priority (or unknown
> for lack of information), will attempt to collect the necessary information
> to reproduce the problem and/or solve it.
> 
> Yes, if we are not involved in the triage, nothing is going to speed up.  In
> fact, I would consider that a product or project engineer would want to be
> triaging bugs almost immediately upon system entry.  The product engineer
> should have a good enough understanding of the customers' and their needs,
> and the developers and their needs that the engineer can ensure both the
> priority and reproducibility (to a very large extent, but they're not
> perfect) of the filed bug.
> 
> 
> > > Proposal:
> > >
> > > 1.       For each project Product WG identify a couple of people ( at
> > > least 3 overall to handle vacations and a like) among its member
> > companies
> > > who will have responsibility to review submitted bug fixes (for all
> > > releases) within 1 day
> > >
> > >
> > Are you saying that for each code submission for a bug fix, you want
> > these
> > people to review the code within one day? What type of review? And if
> > it's
> > a simple +1 review, it won't be helpful. There is no timetable for how
> > long
> > a bug takes to land, the code review is only part of the process. The
> > submitter has to iterate, the gate has to digest the change, etc.
> >
> Where we are heading with this is:  The Product WG will identify corporate
> resources who will be tasked with reviewing bug fixes when they have been
> submitted and passed jenkins checks.  These resources will be familiar with
> the code base and the OpenStack community, its processes and their
> responsibilities.  The Product WG hopes that by decreasing the time before
> first review of bug fixes, and increasing the number of critical eyes on the
> bug fixes, it will help to speed the merge process.
> 
> It is expected that as these extra reviewers demonstrate their skills and
> capabilities, they will be accepted into the trust circles of the projects
> and their reviews will bear enough weight to enable quicker review cycles.
> 
> Right, this is how it works. But this is how you gain entry into the circle
> of trust, there's nothing special required for this role. And there is no
> known quantity in how long this takes. Further, this is now diverging from
> the original thread and appears to be now moving to a "how do I become a
> core reviewer" type of thread. There are plenty of writeups on how this is
> done already, please see here [1] for Neutron, here [2] for Nova, and I'm
> sure there are more examples.
> 
> [1] http://docs.openstack.org/developer/neutron/policies/core-reviewers.html
> [2] https://wiki.openstack.org/wiki/Nova/CoreTeam
> So, Kyle, The first pass was wish.  We, as a group, intend to work on this
> until it is in a state acceptable enough by the community that discussion
> can ensue on how to implement this.
> 
> That's my point, this is going to be different for each project, it's not a
> one size fits all. Trying to enforce this down each project will result in
> different reactions. Even further, the best way for this to happen is to
> train your developers and have them engage the community, work upstream,
> submit patches, review, etc. This really seems like you're trying to come up
> with a "how to be a core reviewer" training of sorts, which may be useful,
> but certainly isn't quite aligned with what Arkady originally sent out.
> 
> [Rockyg]
> We aren't trying to be "core reviewers" here.  In the corporate world, these
> engineers would be referred to as "Sustaining Engineers".  They address bugs
> found in the wild, reproduce them and feed back the solutions to Dev and
> Stable.  What our engineers have found in supporting our OpenStack products
> is that the bug fix and backport process is extremely long, especially in
> Nova, and that very often, bug fixes languish unreviewed by core in Master
> for quite a while, and often take even longer to get backported.  What we
> want is to provide our sustaining engineers an avenue in the OpenStack
> community to more effectively do their jobs on their OpenStack products,
> while increasing the velocity of bug fixes and backports in the OpenStack
> community.  If you want to view them as cores, view them as bug fix/stable
> cores.
> 
> 
> >
> > > 2.       These people will be added automatically as reviewers for
> > each
> > > bug patch submission.
> 
> These reviewers will add the project's bugs to their watched lists so as to
> know when a review has been submitted.
> 
> > >
> > > a.       Need some help with tool people to automatically add these
> > people
> > > as reviewers for newly filed bugs
> > >
> > >
> > Why do you want to automatically add people to these reviews? We have
> > gerrit dashboards which they can use to do this themselves.
> 
> Launchpad has the tools to be notified of bugs.  Gerrit has the tools to
> watch projects.  A script or two could likely be written to only show
> reviews associated with bugs.
> 
> 
> > > 3.       Once a patch has two +1 ones from these reviewers and no -1s
> > from
> > > anybody else, project PTL or project designated bug tsar will merge
> > it if
> > > it passes his/her review within 2 days
> > >
> > >
> > This ain't gonna happen. This is not how code is merged in OpenStack.
> > If
> > this is a stable backport, we have a stable review team which does this
> > already [1]. If it's for master, it's even more insane to think PTLs
> > will
> > merge code with +1s from some folks. The process for merging code is
> > documented here [2], I encourage you to read that and understand it in
> > detail.
> 
> Once a patch meets the standards of a collection (specified by number of devs
> doing priority reviews? size of project?  some way to say "well if xx people
> think it's ready....) of reviewers, it is highlighted in either the project
> weekly meeting or on IRC for core review attention.
> 
> I can't speak for others, but for Neutron, we already highlight bugs each
> week [3]. And we have an hour for the entire meeting, so we only highlight
> "High" or "Critical" bugs. It's a bit of gray area for what gets
> highlighted, hard and fast rules are not so easy to apply. Even further, as
> a former PTL I can attest to the fact that just by highlighting a bug
> doesn't mean it gets attention.
> 
> [3] https://wiki.openstack.org/wiki/Network/Meetings#Bugs
> 
> [Rockyg] Yes, you've done an amazing job getting Neutron organized to run
> smoothly, address issues in time sensitive and ongoing bases and rebuilding
> trust, along with growing the organization immensely at the same time.
> Product WG would like to help spread your good processes to other projects,
> and this is one of our efforts to make the projects aware of some of the
> best practices of effective projects.
> 
> 
> 
> >
> > > a.       Need to agree on the mechanism to let PTL or designate know
> > of
> > > patch readiness. Maybe one of the reviewers adds PTL or designate to
> > the
> > > patch when passing baton criteria is met. (Best to run by PTLs and TC
> > for
> > > recommendation).
> > >
> > >
> > It sounds to me like you're basically doing an end-run around the
> > current
> > stable team [1]. Can you explain why the current stable review process
> > isn't enough for your needs? Now, if I've mistaken this and you expect
> > this
> > behavior for master branch changes, then forgive me, but this again
> > goes
> > against the current workflow for merging patches each project has.
> >
> > [1] http://docs.openstack.org/project-team-guide/stable-branches.html
> > [2] http://docs.openstack.org/infra/manual/developers.html
> 
> So, yeah, we don't want to devise a new process, we just want to improve
> visibility to speed the existing, working but slow process.  But the key
> here is to raise the flag to let cores know that a patch is in pretty good
> shape.
> 
> I'm still confused: Is this for stable or master?
> 
> [Rockyg] Both.  As has been stated elsewhere in this thread, we need the bugs
> fixed in Master first, if they still exist there, before they can be
> backported to the stable releases.
> 
> 
> And, no Kyle, we don't want to usurp any developers' rights or duties or
> impose or demand our needs be met, but we want to document a way in which
> corporate managers can assign or direct resources specifically for bug
> fixing and bug fix reviews (both master and stable) so as to reduce backlog,
> improve velocity of merges, and address user issues.  But, we start with a
> description in a language we are familiar with (Arkady's proposal) and
> translate it into OpenStack language with the correct intent.
> 
> Does this help?
> 
> 
> You've basically saying you want to write a "How to be a core developer"
> training program, which is very different than what the original email
> indicates. Your training *could* lead to solving the issues you want, but it
> may not.
> [Rockyg] Yes and no.  We want to improve the velocity of getting bug fixes
> into community supported releases to reduce the need to maintain branches of
> OpenStack in support of our customers.  Bugfix specialists are already on
> the job in our companies.  Let's utilize them in a way they get to reduce
> their frustration by reducing the number of patches they need to host in
> their own CI systems.  These engineers are focused on bugfixes, not new
> development.  So they aren't interested in being core "developers" so much
> as core  bugfixers.

What Kyle is ultimately getting at is that the two are one and the same. Developers who review lots of code regardless of whether it's on master or stable/*, and regardless of whether it's bug fixes or features, will become trusted and ultimately make it into the core review teams and/or the stable-maint core teams for the projects they engage.

I guess a question I might ask is what is stopping them from doing this today? I think identifying specific, quantifiable, pain points in https://wiki.openstack.org/wiki/StableBranch and detailing how this team can help with each of them directly would be much more likely to succeed than the broad proposal in this thread put forward without any reference to existing processes.

As an aside for anyone who wants to dig into building out some actual data on how long it takes from submission to merge in master vs stable for the projects you should be able to get the info out of the notes ref: 

    https://gerrit-documentation.googlecode.com/svn/Documentation/2.5.2/refs-notes-review.html

E.g. (this is just a random example I pulled from git log):

commit 2e731ebc9dd4890b6bfa57ceaaa6dadba2e4b1c2
Author: Rajesh Tailor <rajesh.tailor at nttdata.com>
Date:   Thu Sep 10 23:15:46 2015 -0700

    Remove unnecessary 'context' param from quotas reserve method call
    
    'context' parameter was removed from quota-related method signatures in patch [1].
    In patch [2] use of 'context' parameter was removed from quota-related method calls.
    
    Still at some places 'context' parameter was passed to quota reserve method call.
    
    Removed use of 'context' parameter from those places and passed 'context' as kwargs
    while creating object of nova.objects.Quotas type.
    
    [1] https://review.openstack.org/#/c/164268/
    [2] https://review.openstack.org/#/c/160499/
    
    Backport notice:
    On restart of nova-compute if there is instance with vm_state as
    'DELETED' but that instance is not marked as deleted in database. In that case,
    while calling _complete_partial_deletion method from _init_instance method
    it raises " ValueError: Circular reference detected" error.
    
    Removing 'context' parameter from quotas.reserve method call in
    _complete_partial_deletion method addresses this issue as well.
    
    Change-Id: I20ea77bd20f093ce1be7169c0647cdc7ff62117c
    Closes-Bug: 1494653
    (cherry picked from commit 4031272fe93d1046cff4c0cc788c8e78db944419)

Notes (review):
    Verified+2: Jenkins
    Code-Review+2: garyk <gkotton at vmware.com>
    Code-Review+2: Claudiu Belu <cbelu at cloudbasesolutions.com>
    Workflow+1: Claudiu Belu <cbelu at cloudbasesolutions.com>
    Code-Review+1: IanSun <sun.jun at 99cloud.net>
    Submitted-by: Jenkins
    Submitted-at: Tue, 29 Sep 2015 16:54:34 +0000
    Reviewed-on: https://review.openstack.org/223968
    Project: openstack/nova
    Branch: refs/heads/stable/kilo

Thanks,

Steve



More information about the Product-wg mailing list