[openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

Day, Phil philip.day at hp.com
Mon Dec 30 19:17:58 UTC 2013


I wonder if at least part of the problem is that whilst we have prioritisation for bugs (via severity) and blueprints (via approval and target release), that doesn't obviously carry through into gerrit.  If it was easier to see which are the high- and low-priority changes it might be easier to decide which need attention and which can / should wait for more input?

At the moment it does feel that a change's chance of getting merged is somewhat random, and we must be able to do better than that.

Of course we'd still need to work out how to prioritise changes which have neither a bug nor a blueprint (or maybe this is part of the argument for not having such changes).

Phil


Sent from Mobile - spelling will be even worse than normal



-------- Original message --------
From: Robert Collins <robertc at robertcollins.net>
Date:
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility


On 29 December 2013 21:06, Day, Phil <philip.day at hp.com> wrote:

>> What is the minimum review period intended to accomplish? I mean:
>> everyone that reviewed this *knew* it changed a default, and that guest
>> OS's that did support ext3 but don't support ext4 would be broken.
>
> My point is that for some types of non-urgent change (i.e. those that change existing behaviour) there needs to be a longer period to make sure that more views and opinions can surface and be taken into account.  Maybe all the reviewers in this case did realise the full impact of this change, but that's still not the same thing as getting a wide range of input.  This is a change which has some significant impact, and there was no prior discussion as far as I know in the form of a BP or a thread on the mailing list.  There was also no real urgency in getting the change merged.

I disagree that 'longer period' implies 'more views and opinions'.
From the nova open reviews stats:
3rd quartile wait time: 17 days, 7 hours, 14 minutes

25% of *all* open nova reviews have had no review at all in 17 days.

3rd quartile wait time: 23 days, 10 hours, 44 minutes

25% of all open nova reviews have had no -1 or -2 in 23 days.

I'm not debating the merits of more views and opinions - Sean has
pointed out already that automation is better than us having to guess
at when things will or won't work. But even if you accept that more
views and opinions will help, there are over 100 reviews up with *no*
such opinions added already.

Let's say that something like the patch that triggered this went up for
review, and that we established a one-month minimum review period for
such patches. There's a 25% chance it would sit for three weeks with no
input at all. The *effective* extra time that a one-month minimum period
would buy it would then be about a week.
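
To put rough numbers on that (a back-of-the-envelope illustration only,
using the quartile figures quoted above rather than anything measured
per patch):

  # Illustrative sketch: how much extra waiting a one-month minimum would
  # actually enforce for the slowest quarter of reviews.
  minimum_period_days = 30      # hypothetical one-month minimum review period
  quiet_quartile_days = 23      # 25% of open reviews see no -1/-2 for 23 days

  # For that quarter of patches the mandated minimum only adds about a week
  # beyond the time they would have sat unreviewed anyway.
  effective_extra_days = max(0, minimum_period_days - quiet_quartile_days)
  print(effective_extra_days)   # -> 7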

Once the volume of reviews needed exceeds a single reviewer's capacity,
by definition some reviewers will not see some patches *at all*. At
that point it doesn't matter how long a patch waits, it will never hit
the front of the line for some reviewers unless we have super strict -
and careful - ordering on who reviews what. Which we don't have, and
can't get trivially. But even if we do:

-> time is not a good proxy for attention, care, detail or pretty much
any other metric when operating a scaled out human process.

What would make a good proxy metric for 'more views and opinions'? I
think asking for more cores to +2 such changes would do it. E.g. ask
for 4 +2's for backward incompatible changes unless they've gone
through a release cycle of being deprecated/warned.

>> Would  you like to have seen a different judgement call - e.g.
>> 'Because this is a backward breaking change, it has to go through one release
>> of deprecation warning, and *then* can be made' ?
>>
>
>  Yep, I think that would be appropriate in this case.  There is an impact beyond just the guest OS support that has occurred to me since, but I don't want to get this thread bogged down in this specific change, so I'll start a new thread for that.  My point is that where changes are proposed that affect the behaviour of the system, and especially where they are not urgent (i.e. not high-priority bug fixes), then we need to slow down the reviews and not assume that all possible views / impacts will surface in a few days.

Again, I really disagree with 'need to slow down'. We need to achieve
something *different*.

> As I said, there seems to me to be something wrong with the priority around changes when non-urgent behaviour changes go through in a few days but we have bug fixes sitting with many +1's for over a month.

>> Another possible reason is that we should have a strict no-exceptions-by-
>> default approach to backwards incompatible changes, even when there are
>> config settings to override them. Whatever the nub is - lets surface that and
>> target it.
>>
>
> Yep - I think we should have a very clear policy around how and when we make changes to default behaviour.   That's really the point I'm trying to surface.

Cool. So - let's focus on that, and get it added to the reviewer
checklist wiki page once there is consensus.
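
For what it's worth, here is a minimal sketch of what "deprecate for a
cycle first" could look like for a default such as nova's
default_ephemeral_format option (illustrative only - not the actual nova
code, and the helper name is made up):

  # Sketch of staging a default change through one release of deprecation.
  # Not the real nova code; pick_ephemeral_format() is a hypothetical helper.
  import logging

  from oslo.config import cfg

  LOG = logging.getLogger(__name__)
  CONF = cfg.CONF
  CONF.register_opts([
      cfg.StrOpt('default_ephemeral_format',
                 default=None,
                 help='Default filesystem used to format ephemeral disks.'),
  ])


  def pick_ephemeral_format():
      # Honour an explicit operator choice.
      if CONF.default_ephemeral_format:
          return CONF.default_ephemeral_format
      # Otherwise keep the old behaviour for one cycle and warn that the
      # default will become ext4 in the next release.
      LOG.warning('default_ephemeral_format is not set; using ext3 for now. '
                  'The default will change to ext4 in the next release.')
      return 'ext3'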

>> Basically, I'm not sure what problem you're trying to solve - lets tease that
>> out, and then talk about how to solve it. "Backwards incompatible change
>> landed" might be the problem - but since every reviewer knew it, having a
>> longer review period is clearly not connected to solving the problem :).
>>
>
> That assumes that a longer review period wouldn't have allowed more reviewers to provide input - and I'm arguing the opposite.  I also think that some clear guidelines might have led to the core reviewers holding this up for longer.  As I said in my original post, the intent to get more input was clear in the reviews, but the period wasn't IMO long enough to make sure all the folks who may have something to contribute could.  I'd rather see some established guidelines than have to be constantly on the lookout for changes every day or so and hoping to catch them in time.

I think it's pretty clear that a long review period *might* have
gathered more input but has no *guarantee* of gathering more input.
And if this is a thing we want to get right, 'might' has no place in
the discussion :)

>>
>> > The specific change that got me thinking about this is
>> > https://review.openstack.org/#/c/63209/ which changes the default fs type
>> > from ext3 to ext4.  I agree with the comments in the commit message that
>> > ext4 is a much better filesystem, and it probably does make sense to
>> > move to that as the new default at some point, however there are some
>> > old OS's that may still be in use that don't support ext4.  By making
>> > this change to the
>>
>> Per above, these seem to be solely RHEL3 and RHEL4.
>
> And SLES.
>
> It also causes inconsistent behaviour in the system, as any existing "default" backing files will have ext3 in them, so a VM will now get either ext3 or ext4 depending on whether the host they get created on already has a backing file of the required size or not.
>
> I don't think the existing design ever considered the default FS changing - maybe we shouldn't have files that include "default" as the file system type if it can change over time, and the name should always reflect the FS type.

Agreed.
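
Something along these lines would do it - a sketch only, with a made-up
helper name rather than the real nova image-cache code - so the cache key
always reflects the filesystem actually used:

  # Sketch: key ephemeral backing files by the filesystem actually used, so
  # changing the default can never silently reuse a file built with the old
  # one. Hypothetical helper, not the real nova code.
  def ephemeral_cache_name(size_gb, fs_type):
      # e.g. 'ephemeral_20_ext4' rather than 'ephemeral_20_default'
      return 'ephemeral_%s_%s' % (size_gb, fs_type)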

>> I'm extremely skeptical of 'wait longer' and 'use blueprints' as tools to get 'big
>> impact noticed': blueprints will get the PTL to see the change, but not all
>> reviewers. Waiting longer likewise: One could wait
>> 3 weeks and not get reviewed. If the existing system isn't working, doing
>> more of it will just not work more :).
>>
>
> But in this case the existing system was working - otherwise you'd have filed this as a bug right ;-)
> It may not have been working the way you wanted it to, but it was still working.

By working I meant 'achieving the goals' - and the goals you're
raising of a) being more careful with backwards incompatible changes
and b)  catching problems that affect guest OS support - are not being
met by the existing system.

>> We could talk about a strict FIFO approach to reviews, or a drive-WIP-to-zero
>> approach, but I think that such a discussion is best separated out from the
>> changes-like-this discussion, because there will also be backwards
>> incompatible changes which *are* urgent or whatever, and so these are
>> clearly different things to ensure our systems cope with. IMNSHO.
>>
>> -Rob
>
> Agree that there are always going to be backwards incompatible changes which *are* urgent.  This wasn't one of them - and IMELHO I don't buy the argument that just because they exist means we shouldn't have a different cadence for non-urgent changes.

I'm not talking about a different cadence though, I'm saying that if
we need to solve the problem for reviews that do need to move quickly,
then there is no need to solve the problem *differently* for reviews
that don't.

-Rob


--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


