[openstack-dev] [nova] minimum review period for functional changes that break backwards compatibility

Day, Phil philip.day at hp.com
Sun Dec 29 08:06:23 UTC 2013


> -----Original Message-----
> From: Robert Collins [mailto:robertc at robertcollins.net]
> Sent: 29 December 2013 06:50
> To: OpenStack Development Mailing List (not for usage questions)
> Subject: Re: [openstack-dev] [nova] minimum review period for functional
> changes that break backwards compatibility
> 
> On 29 December 2013 04:21, Day, Phil <philip.day at hp.com> wrote:
> > Hi Folks,
> >
> >
> >
> > I know it may seem odd to be arguing for slowing down a part of the
> > review process, but I'd like to float the idea that there should be a
> > minimum review period for patches that change existing functionality
> > in a way that isn't backwards compatible.
> 
> What is the minimum review period intended to accomplish? I mean:
> everyone that reviewed this *knew* it changed a default, and that guest
> OS's that did support ext3 but don't support ext4 would be broken. 

My point is that for some type of non-urgent change (i.e. those that change existing behaviour) there needs to be a longer period to make sure that more views and opinions can surface and be taken into account.   Maybe all the reviewers in this case did realise the full impact of this change, but that's still not the same thing as getting a wide range of input.   This is a change which has some significant impact, and there was no prior discussion as far as I know in the form of a BP or thread in the mailing list.   There was also no real urgency in getting the change merged.

> Would  you like to have seen a different judgement call - e.g.
> 'Because this is a backward breaking change, it has to go through one release
> of deprecation warning, and *then* can be made' ?
> 

 Yep, I think that would be appropriate in this case.   There is an impact beyond just the GuestOS support that occurred to me since, but I don't want to get this thread bogged down in this specific change so I'll start a new thread for that.  My point is that where changes are proposed that affect the behaviour of the system, and especially where they are not urgent (i.e not high priority bug fixes) then we need to slow down the reviews and not assume that all possible views / impacts will surface in a few days. 

As I said, there seems to me to be something wrong with the priority around changes when non urgent but behaviour changes go though in a few days but we have bug fixes sitting with many +1's for over a month.

> One possible reason to want a different judgment call is that the logic about
> impacted OS's was wrong - I claimed (correctly) that every OS has support for
> ext4, but neglected to consider the 13 year lifespan of RHEL...
> https://access.redhat.com/site/support/policy/updates/errata/ shows that
> RHEL 3 and 4 are both still supported, and neither support ext4.
> So folk that are running apps in those legacy environments indeed cannot
> move.
> 

Yep - that's part of my concern for this specific change.   Its an example of the kind of detail that I think would have emerged from a longer review cycle (at least I know I would have flagged it if I'd had the chance to ;-)

> Another possible reason is that we should have a strict no-exceptions-by-
> default approach to backwards incompatible changes, even when there are
> config settings to override them. Whatever the nub is - lets surface that and
> target it.
> 

Yep - I think we should have a very clear policy around how and when we make changes to default behaviour.   That's really the point I'm trying to surface.

> Basically, I'm not sure what problem you're trying to solve - lets tease that
> out, and then talk about how to solve it. "Backwards incompatible change
> landed" might be the problem - but since every reviewer knew it, having a
> longer review period is clearly not connected to solving the problem :).
> 

That assumes that a longer review period wouldn't of allowed more reviewers to provide input - and I'm arguing the opposite.   I also think that some clear guidelines might have led to the core reviewers holding this up for longer.   As I said in my original post, the intent to get more input was clear in the reviews, but the period wasn't IMO long enough to make sure all the folks who may have something to contribute could.   I'd rather see some established guidelines than have to be constantly on the lookout for changes every day or so and hoping to catch them in time.   

> 
> > The specific change that got me thinking about this is
> > https://review.openstack.org/#/c/63209/ which changes the default fs
> type
> > from ext3 to ext4.    I agree with the comments in the commit message that
> > ext4 is a much better filesystem, and it probably does make sense to
> > move to that as the new default at some point, however there are some
> > old OS's that may still be in use that don't support ext4.  By making
> > this change to the
> 
> Per above, these seem to be solely RHEL3 and RHEL4.

And SLES.

It also causes inconsistent behaviour in the system, as any existing "default" backing files will have ext3 in them, so a VM will now get either ext3 or 3ext4 depending on whether the host they get created on already has a backing file of the required size or not.  

I don't think the existing design ever considered the default FS changing - maybe we shouldn't have files that include "default" as the file system type if it can change over time, and the name should always reflect the FS type. 

> 
> > default without any significant notification period this change has
> > the potential to brake existing images and snapshots.  It was already
> > possible to use ext4 via existing configuration values, so there was
> > no urgency to this change (and no urgency implied in the commit
> > messages, which is neither a bug or blueprint).
> 
> Indeed - the reason for putting up the change was the positive reception on
> the list. If the change was requested to wait, we would have ensured there
> was a bug (because running non-default for no good reason is a bug in
> TripleO+the component whose default is wrong) used the config setting, and
> moved on.
> 
> > I'm not trying to pick out the folks involved in this change in
> > particular,
> 
> I don't feel picked out :).

Good ;-)


> 
> > it just happened to serve as a good and convenient example of
> > something that I think we need to be more aware of and think about
> > having some specific policy around.  On the plus side the reviewers
> > did say they would wait 24 hours to see if anyone objected, and the
> > actual review went over 4 days - but I'd suggest that is still far too
> > quick even in a non-holiday period for something which is low priority
> > (the functionality could already be achieved via existing
> > configuration options) and which is a change in default behaviour.  (In the
> period around a major holiday there probable needs to be
> > an even longer wait).     I know there are those that don't want to see
> > blueprints for every minor functional change to the system, but maybe
> > this is a case where a blueprint being proposed and reviewed may have
> caught the
> > impact of the change.    With a number of people now using a continual
> 
> I'm extremely skeptical of 'wait longer' and 'use blueprints' as tools to get 'big
> impact noticed': blueprints will get the PTL to see the change, but not all
> reviewers. Waiting longer likewise: One could wait
> 3 weeks and not get reviewed. If the existing system isn't working, doing
> more of it will just not work more :).
> 

But in this case the existing system was working - otherwise you'd have filed this as a bug right ;-)
It may not have been working the way you wanted it to, but it was still working.


> > deployment approach any change in default behaviour needs to be
> > considered not just  for the benefits it brings but what it might
> > break.  The advantage we have as a community is that there are lot of
> > different perspectives that can be brought to bear on the impact of
> > functional changes, but we equally have to make sure there is sufficient
> time for those perspectives to emerge.
> 
> Sure. What does this break though? Specifically, from this mail and the
> research it's prompted me to do I can see that RHEL3 and RHEL4 users would
> stop having ephemeral work *by default*. They can still work by requesting
> an ext3 drive from their cloud provider.

It introduces a new requirement for GuestOS's to have to support ext4 in order to work with the default configuration.   I think that's a significant enough change that it needs to be flagged, discussed, and planned.


> 
> > Somehow it feels that we're getting the priorities on reviews wrong
> > when a low priority changes like this which can  go through in a
> > matter of days, when there are bug fixes such as
> > https://review.openstack.org/#/c/57708/
> > which have been sitting for over a month with a number of +1's which
> > don't seem to be making any progress.
> 
> We could talk about a strict FIFO approach to reviews, or a drive-WIP-to-zero
> approach, but I think that such a discussion is best separated out from the
> changes-like-this discussion, because there will also be backwards
> incompatible changes which *are* urgent or whatever, and so these are
> clearly different things to ensure our systems cope with. IMNSHO.
> 
> -Rob

Agree that there are always going to be backwards incompatible changes which *are* urgent.  This wasn't one on them - and IMELHO I don't buy the argument that just because they exist means we shouldn't have a different cadence for non-urgent changes.  

Phil




More information about the OpenStack-dev mailing list