[openstack-dev] [Nova][Keystone] The unbearable lightness of specs

John Garbutt john at johngarbutt.com
Thu Jun 25 09:43:34 UTC 2015

On 25 June 2015 at 10:19, Daniel P. Berrange <berrange at redhat.com> wrote:
> On Wed, Jun 24, 2015 at 06:30:06PM -0700, James Bottomley wrote:
>> On Wed, 2015-06-24 at 17:25 +0100, Daniel P. Berrange wrote:
>> > On Wed, Jun 24, 2015 at 11:52:37AM -0400, Adam Young wrote:
>> > > On 06/24/2015 06:28 AM, Nikola Đipanov wrote:
>> > > >Gerrit and our spec template are a horrible tool for
>> > > >discussing design.
>> > > This is the heart of the problem.
>> > >
>> > >
>> > > I think that a proper RFE description in the bug tracker is the best place
>> > > to start.  Not a design of the solution, but a statement of the problem.
>> > >
>> > > Then, the rest of the discussion should take place in the code. Keystoen has
>> > > the Docs right in the code, as do, I think, every other project.  Don't sign
>> > > off on a patch for a major feature unless the docs have been updated to
>> > > explain that feature.  It will keep us from bike shedding about Database
>> > > schemas.
>> >
>> > What you are describing is sounds like the situation that existed before
>> > the specs concept was introduced. We had a statement of problem in the
>> > blueprint, and then people argued over the design in the code reviews.
>> > It really didn't work at all - code reviews are too late in the workflow
>> > to start discussions around the design, as people are already invested
>> > in dev work at that point and get very upset when you then tell them
>> > to throw away their work. Which happened repeatedly. You could say that
>> > the first patch submitted to the code repository should simply be a doc
>> > file addition, that describes the feature proposal and we should discuss
>> > that before then submitting code patches, but then that's essentially
>> > just the specs again, but with the spec doc in the main nova.git instead
>> > of nova-specs.git.
>> I don't think you meant this as a don't do it, just an experience point:
>> you're saying *we* couldn't make the process work, but that doesn't mean
>> that the process itself (specless code reviews) can't be made to work.
>> It works fine for a lot of open source projects, so the process must be
>> workable.  However, what the experience has shown is that there's a
>> bottleneck which isn't removed simply by removing the spec process.
>> On the spec question, the reason projects like the Linux Kernel are so
>> code driven is that with code it's harder to block a submission on
>> esoteric grounds, whereas with no code it is easier to argue endlessly
>> about minutiae.  I think this might be the reason for the "lightness"
>> Nikola complains about: the less you put in a spec, the less reason
>> people have to weigh in on it and delay its approval.  Perhaps an
>> appropriate question is: "is that fear well founded or just anecdotal?"
> While I agree with the broad sentiment, I feel Linux has some flexibility
> in code design that Nova & openstack projects in general do not. When it
> comes to additions to the userspace API, the Linux design debates are just
> as long and detailed as I've seen in Nova, for the reason that once you
> decide in a userspace ABI you commit to it forever. Now in Nova, there
> are quite alot of areas where we have such considerations - the RPC
> interface between Nova services is a stable ABI, the definition of
> the data structures passed over RPC is a stable ABI, the rest interface
> is a stable ABI, the controls (image props & flavor extra specs) are
> long term APIs.
> In general with Linux, as long as you are not touching userspace ABI,
> you have alot more flexibility to accept design mistakes, as they can
> be corrected at will later without back compatibility issues in general.
> With Nova, a very very large portion (more than the the majority IME)
> of new features are impacting on the public API or ABIs that we have
> to maintain long term. We've done a pretty awful job at getting those
> right in the past, and this is one of the things that the specs process
> has allowed us to significantly improve on.  This has certainly resulted
> in long drawn out design decisions on specs which has delayed code, but
> we have to be careful to balance the need for correct ABI design, with
> the need to not artificially delay developers.


We "release" every commit, in terms of REST API, and "upgradability".
Thats not a small cost. Although pretending we would go back and fix
it later seems much worse to me.

> I don't think we have that balance quite right, as we do tend to delay
> stuff more than is needed, partly due to review bandwidth constraints.


We are attempting to be more upfront with folks about what has no hope
of getting merged in a particular cycle, due to dependencies, or pure
review bandwidth.

This was in response to complaints that we blocked people after they
spent six weeks of keeping their code rebased, saying there is some
priority thing that is blocking them.

Is this working? Well, not really.

>> If I look at what the above says about the main issue that doesn't get
>> solved by removing the spec process, it's review pressure: how do you
>> increase the throughput of approval/rejection.  Note I didn't advocate
>> more reviews: The metric for success should be time to resolution
>> (accept/reject), not the number of reviews, if we ask everyone to double
>> their time spent reviewing and the net result of this is that the
>> average number of reviews to resolution doubles, nothing is gained.  So
>> perhaps it's time to actually measure the number of reviews to
>> resolution and look at ways to reduce this metric.  That will make the
>> available current review bandwidth go further and reduce the time to
>> actual resolution without anyone having to do more work.  The low
>> hanging fruit in this might be the obvious patches: obvious accept
>> (simple bug fix) or obvious reject (bogus code) should go through after
>> one review: do they?  Perhaps one other is wasting core time: after a
>> couple of failed reviews by people who could +2 the thing, is it time to
>> reject?  And perhaps, finally, there should be a maximum number of
>> reviews before an automatic reject?
>> Sorry for derailing the argument, but I do still see review bandwidth as
>> a key issue behind all the problems.
> Not answering your questions, but on a slightly related note.
> There has also been alot of work in infrastructure to help us when
> dealing with API design decisions. For example, in the past, most
> internal objects were just undocumented dicts, so virt drivers often
> added / used random stuff without most people realizing we had just
> committed to a new ABI constraint. With the Nova versioned objects
> framework, we've gained alot of clarity over object design, which
> has taken some of the pressure off review of virt driver code - any
> ABI changes are centralized in nova/objects/*.py so we can easily
> identify them. We've done similar formalization of REST API versioning
> and trying to formalize other areas of the code. Ultimately this should
> have a benefit on our ability to delegate reviews in areas of the code,
> and finally enable a goal of mine which is to allow virt driver teams
> independance in dealing reviewing & merging with code in their drivers.
> This should also let us be more pragmatic in deciding which features
> will need detailed review of specs upfront, vs which code be be done
> via a more light weight process with little/no specs


Also, this restricting work is causing may "conflicts" and
dependencies in the features we are working on.

We now have RPC versions, almost there with versioning the data over
RPC. The REST API is now getting to a place where we have a more
efficient method to add new features (not creating a whole new
extension for every extra attribute). We are making massive steps
forward here, although right now, its quite tough going.


More information about the OpenStack-dev mailing list