[openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility
philip.day at hp.com
Wed Jan 1 19:16:46 UTC 2014
Hi Sean, and Happy New Year :-)
> -----Original Message-----
> From: Sean Dague [mailto:sean at dague.net]
> Sent: 30 December 2013 22:05
> To: Day, Phil; OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] minimum review period for functional
> changes that break backwards compatibility
> On 12/29/2013 07:58 PM, Day, Phil wrote:
> > Hi Sean,
> > I'm not convinced the comparison to my clean shut down change is valid
> here. For sure that proved that beyond a certain point (in that case months)
> there is no additional value in extending the review period, and no amount
> of review will catch all problems, but that's not the same as saying that there
> is no value in a minimum period.
> The reason I brought up graceful shutdown was that I actually think that
> review went the opposite direction, and got worse over time.
> Borrowing from another part of Robert's thread -
> Minimum review time only works if all patches eventually see review, which
> they don't.
I agree that the controlled shutdown didn't improve over time, although I don’t think it got worse. It ironic in a way that the controlled shutdown issue is one which adversely affected the gate tests but wouldn’t I think of affected any real workloads (the problem is if you stop an instance immediately after its been created but before the GuestOS is running then it doesn’t see the shutdown signal and so waits for the full 2 minute period) whereas the ext4 change improves the gate tests but breaks some production systems.
However whilst I agree that too long is bad, I don’t think that's inconsistent with too short also being bad - it seems to me that there is probably some sweet spot between these two extremes that probably also depends on the type of change being proposed.
> > In this particular case if the review has been open for say three weeks then
> imo the issue would have been caught, as I spotted it as soon as I saw the
> merge. As it wasn't and urgent bug fix I don't see a major gain from not
> waiting even if there wasn't a problem.
> So... then why didn't you spot it on the submit? And would you have found
> the review on your own if it hadn't been for the merge commit email?
If I'd been working during the four days that the patch was open for review I'm confident that I would have spotted it - I make a point of looking out for any changes which might break our production system.
It wasn't the merge commit e-mail that made me notice it BTW, I made a point of looking at the recently merged changes in gerrit to see if there was anything significant that I'd missed.
But this thread wasn't created to get a review cadence that takes my holiday plans into account, it was more to open a discussion about are there different types of review strategies we should adopt to take into account the different types of changes that we now see. There was a time when Nova was a lot simpler, there were very few production deployments, there were a lot less active reviewers, and probably only the cores new all aspects of the system, and so two +2 and some +1 was enough to ensure a thorough review. I'd suggest that things are different now and it would be worthwhile to identify a few characteristics and see if there isn't scope for a few different types of merge criteria. For example (and I'd really like to hear other ideas and suggestions here) perhaps changes could be chacraterised / priorisied by one or more of the following:
- High priority bug fixes: obvious why these need to be reviewed and merged quickly - two +2s is sufficient
- Approved BPs targeted for a specific release: By being approved the design has had a level of scrutiny, and it's important to consumers of nova that the release roadmap is as reliable as possible, so these do need to get attention. However a lot of design detail often only emerges in the code, so these should be open to more scrutiny. Maybe there should be an additional soft limit of at least 4 or 5 +1's
- Normal bug fixes: Need to keep this ticking through, but should pause to make sure there is a reprentative range of reviews, for example a soft limit of at least 3 +1's
- Changes in default behaviour: Always likely to affect existing systems in some way. Maybe we should have an additional type of review vote that comes from people who are recognised as reperensting large production deployments ?
> It was urgent from a tripleo perspective, enough so that they were carrying
> an out of tree patch for it until it merged. Remember, we're trying to get
> tripleo, eventually, gating. And 45 mins deploy times was a big fix to move
> that ball forward. That's why I prioritized that.
> So while it was a low priority change for your environment, don't assume that
> it wasn't really needed as part of CI.
> > I'm all for continually improving the gate tests, but in this case they would
> need to be testing against a system that had been running before the
> change, to test specifically that new vms got the new fs, so there would have
> needed to be a matching test added to grenade as part of the same commit.
> > Not quite sure where the number of open changes comes in either, just
> > because there are a lot of proposed changes doesn't to me mean we need
> > to rush the non urgent ones, it mwans we maybe need better
> > prioritisation. There are plenty of long lived buf fixes siting with
> > many +1s
> Again, you keep assuming this was non urgent, and your assumptions that no
> one should have been looking at it were because of that. However this was
> effectively communicated by Robert a high priority issue on trippleo CD
> testing, enough so that they were going to run a nova fork until it merged, so
> it was treated as such.
> So please step back from the assumption that this was a non urgent change,
> because the tone in IRC was anything but. So it was treated as something
> that needed to move fast. The fact that it was a super small change, saw no
> negative reviews, no negative email comments, means I'm not sure we
> would have done any better. The point was, with no negative feedback it
> didn't really occur to anyone that it was a backwards compatibility problem. I
> was *actually* looking for someone to say something about why it would be
> a problem, and there was nothing. So I assumed we had exhausted the set
> of issues.
I agree and have already acknowledged that you did wait to see if there were any negative comments, but it does seem that you then made the assumption that the one person who responded with a+1 before you gave it a +2 meant that others were either not interested, or had looked and decided to neither +1 or -1 the change. I think (and hindsight is always wonderful) that waiting for feedback should mean waiting for a number of reviews, or as in the case with Neutron type changes, waiting for reviews from particular groups of reviewers.
> So you are really making 2 assumptions here that aren't valid:
> * it was known to be a backwards compatibility problem - because it wasn't,
> and no one provided feedback over the course of 4 days to indicate that it
> was (there were some alternative implementation ideas, but no one said
> "hold on, this could break X")
That's not really an assumption - I'm saying it would have been spotted had there been more reviews because I know would have spotted it, as I think would some of my collegues. We've been looking at a number of problems with backing files recently, including the security fix we posted a week or so ago, and have been working on how to configure some of our images to use ext4 - so it was an area that was fresh in our minds.
> * it wasn't urgent - and like I said the tone really was towards this being an
> important need for trippleo
I'm assuming (or maybe asserting would be a better term) that it wasn't urgent because:
- Nothing in the commit message suggests its urgent - it describes it more as a major performance enhancement
- Its not linked to a high priority bug or approved blueprint
- The capability to use ext4 already existed via configuration options. If this was needed for the TripelO gate tests (and again no mention of this in the commit itself) then why couldn’t the tests of been configured to use ext4 rather than changing the default ?
So no sorry - I'm not willing yet to back away from my assumption that this wasn't an urgently needed fix. especially because of the last point.
It may have been the tone in IRC, but should I really be expected to look there to understand the background to a submitted change ?
> Which is why I don't think saying 3 week minimum review time is helpful,
> because if any of the reviewers had thought it was backwards incompatible,
> they would have -1ed it. Humans only get so much of that right.
I think I agree that a time limit is the wrong way to go - as in this case the number of active reviewers is very variable throughout the year. Something which was more around number and type of reviewers (and maybe we need a distinction now that goes beyond just core and non-core) would probably be more workable.
> So we need to handle this defensively and beef up the automated systems.
> Does that require more people adding in defensive tests to those scenarios?
> definitely. The fact that right now basically Dean and I are the only ones that
> understand our upgrade testing (with a few others
> bootstrapping) is way too few people touching that problem.
> If seemless upgrade is the most important thing to people, we need more
> people in the trenches on the tooling that helps us verify that.
As I said before, I'm not arguing against improving the automated tests, but of course we also have to be cognisant that extending the time take to run tests is in itself a problem. And however much effort we can divert into adding tests it will take time before that gets us improved coverage of this kind of guest / upgrade issue.
In the meantime as you and Rob have pointed out the review process clearly isn’t working as well as we'd all like it to, with some changes getting no reviews and others some sitting for weeks with many +1's (for example here's a bug fix with seven +1's that has been stuck since 6th Dec https://review.openstack.org/#/c/56258/ ) - none of which seems to be really correlated with the relative priority of the proposed changes. So I'm suggest that that we look at what can be done to improve the review process now while the automated testing is improved.
More information about the OpenStack-dev