[openstack-dev] [nova] An analysis of code review in Nova
John Garbutt
john at johngarbutt.com
Thu Mar 13 11:53:52 UTC 2014
On 13 March 2014 10:09, Matthew Booth <mbooth at redhat.com> wrote:
> On 12/03/14 18:28, Matt Riedemann wrote:
>> On 2/25/2014 6:36 AM, Matthew Booth wrote:
>>> I'm new to Nova. After some frustration with the review process,
>>> specifically in the VMware driver, I decided to try to visualise how the
>>> review process is working across Nova. To that end, I've created 2
>>> graphs, both attached to this mail.
>>>
>>> Both graphs show a nova directory tree pruned at the point that a
>>> directory contains less than 2% of total LOCs. Additionally, /tests and
>>> /locale are pruned as they make the resulting graph much busier without
>>> adding a great deal of useful information. The data for both graphs was
>>> generated from the most recent 1000 changes in gerrit on Monday 24th Feb
>>> 2014. This includes all pending changes, just over 500, and just under
>>> 500 recently merged changes.
>>>
>>> pending.svg shows the percentage of LOCs which have an outstanding
>>> change against them. This is one measure of how hard it is to write new
>>> code in Nova.
>>>
>>> merged.svg shows the average length of time between the
>>> ultimately-accepted version of a change being pushed and being approved.
>>>
>>> Note that there are inaccuracies in these graphs, but they should be
>>> mostly good. Details of generation here:
>>> https://github.com/mdbooth/heatmap. This code is obviously
>>> single-purpose, but is free for re-use if anyone feels so inclined.
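(For anyone who wants to reproduce something like this without reading
the heatmap code, a rough sketch of pulling the raw data from Gerrit's
REST API is below. This is not Matt's implementation, see his repo above
for that, and the query options are only what I would expect to work
against review.openstack.org anonymously.)

    # Rough sketch, not the heatmap tool itself: fetch recent open nova
    # changes from Gerrit and count touched lines per top-level directory.
    # Assumes anonymous access to the standard Gerrit REST API.

    import collections
    import json

    import requests

    URL = 'https://review.openstack.org/changes/'
    PARAMS = {'q': 'project:openstack/nova status:open', 'n': '500',
              'o': ['CURRENT_REVISION', 'CURRENT_FILES']}

    resp = requests.get(URL, params=PARAMS)
    # Gerrit prefixes its JSON responses with ")]}'" to defeat XSSI.
    changes = json.loads(resp.text.split('\n', 1)[1])

    pending = collections.Counter()
    for change in changes:
        rev = change['revisions'][change['current_revision']]
        for path, stats in rev.get('files', {}).items():
            top = path.split('/')[1] if path.startswith('nova/') else path
            pending[top] += (stats.get('lines_inserted', 0) +
                             stats.get('lines_deleted', 0))

    for directory, lines in pending.most_common(10):
        print(directory, lines)

Dividing those counts by the LOCs in each directory of the current tree
would then give a pending percentage along the lines of pending.svg.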
>>>
>>> The first graph above (pending.svg) is the one I was most interested in,
>>> and shows exactly what I expected it to. Note the size of 'vmwareapi'.
>>> If you check out Nova master, 24% of the vmwareapi driver has an
>>> outstanding change against it. It is practically impossible to write new
>>> code in vmwareapi without stomping on an outstanding patch. Compare that
>>> to the libvirt driver at a much healthier 3%.
>>>
>>> The second graph (merged.svg) is an attempt to look at why that is.
>>> Again comparing the VMware driver with the libvirt driver, we can see
>>> that at 12 days, it takes much longer for a change to be approved in the
>>> VMware driver. I suspect this isn't the whole story: the backlog is
>>> likely the product of a much longer review time combined with very
>>> active development.
>>>
>>> What's the impact of this? As I said above, it obviously makes it very
>>> hard to come in as a new developer of the VMware driver when almost a
>>> quarter of it has been rewritten, but you can't see it. I am very new to
>>> this and others should validate my conclusions, but I also believe this
>>> is having a detrimental impact on code quality. Remember that the above
>>> 12-day figure is only the time for the final version to be approved.
>>> If a change goes through multiple versions, each of those also has an
>>> increased review period, meaning that the time from first submission to
>>> final inclusion is typically very, very protracted. The VMware driver
>>> has its fair share of high-priority issues and functionality gaps, and
>>> the developers are motivated to get it in the best possible shape as
>>> quickly as possible. However, it is my impression that when problems
>>> stem from structural issues, the developers choose to add metaphorical
>>> gaffer tape rather than fix them, because fixing the structure as well
>>> creates a dependency chain which pushes the user-visible fix months into
>>> the future. In this respect the review process is dysfunctional, and is
>>> actively detrimental to code quality.
>>>
>>> Unfortunately I'm not yet sufficiently familiar with the project to
>>> offer a solution. A core reviewer who regularly looks at it is an
>>> obvious fix. A less obvious fix might involve a process which allows
>>> developers to work on a fork which is periodically merged, rather like
>>> the kernel.
>>>
>>> Matt
>>>
>>
>> When I originally read this I had some ideas in mind for a response
>> regarding review latency with the vmware driver patches, but felt like
>> anything I said, however honest, would sound bad/offensive in some way,
>> and I didn't want to convey that.
>>
>> But this came up in IRC today:
>>
>> https://blueprints.launchpad.net/nova/+spec/vmware-spawn-refactor
>>
>> That spurred some discussion around this same topic and I think
>> highlights one of the major issues, which is code quality and the design
>> of the driver.
>>
>> For example, the driver's spawn method is huge and there are a lot of
>> nested methods within it. There are a lot of vmware patches and a lot
>> of blueprints, and a lot of them touch spawn. When I'm reviewing them,
>> I'm looking for new conditions and checking to see if those are unit
>> tested (positive/error cases), and a lot of the time I find it very
>> difficult to tell whether they are or not. I think a lot of that has to
>> do with how the vmware tests are scaffolded with fakes to simulate a
>> vcenter backend, but unless you're extremely familiar with that code it
>> is very difficult to know whether something is covered or not.
>>
>> And I've actually asked in bp reviews before, 'you have this new/changed
>> condition, where is it tested' and the response is more or less 'we plan
>> on refactoring this later and will add unit tests then' or 'it's hard to
>> test this path without a lot of changes to how the tests are working'.
>> That's unacceptable to me, and I generally give up on the review after
>> that.
>
> I think part of this is a (hopefully temporary) catch-22. Major
> refactors are put off because review process latency would complicate
> and delay user visible features, and user visible features are delayed
> in review because they highlight the urgent need for major refactors. I
> think Juno will be an excellent opportunity to jump out of this cycle.
+1
Just to answer this point: despite the review latency, please don't be
tempted to think one big change will get in quicker than a series of
small, easy-to-review changes. Not all changes are equal. A large change
often scares me away towards easier-to-review patches.
Seems like, for Juno-1, it would be worth cancelling all non-urgent bug
fixes and doing the refactoring we need.
I think the aim here should be better (and easier to understand) unit
test coverage. That's a great way to drive good code structure.
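To illustrate the sort of shape I mean, here is a sketch only, with
made-up names rather than the real vmwareapi driver API: if spawn() were
split into small single-purpose steps, each step could be unit tested
directly with mocks instead of through the full fake vcenter scaffolding.

    # Sketch only: class and method names are made up, not the real
    # vmwareapi driver API. The point is that small, single-purpose
    # steps can be tested directly with mocks, without a fake vcenter.

    import unittest
    from unittest import mock


    class SpawnSteps(object):
        """Hypothetical extraction of steps from a monolithic spawn()."""

        def __init__(self, session):
            self._session = session

        def build_config_spec(self, instance_name, disk_path):
            # Pure data transformation: trivially testable.
            return {'name': instance_name, 'disk_path': disk_path}

        def create_vm(self, config_spec):
            # Single backend call: easy to assert on with a mock session.
            return self._session.call('CreateVM_Task', config_spec)


    class SpawnStepsTestCase(unittest.TestCase):
        def test_build_config_spec(self):
            steps = SpawnSteps(session=mock.Mock())
            spec = steps.build_config_spec('inst-1', '[ds1] inst-1/disk.vmdk')
            self.assertEqual('[ds1] inst-1/disk.vmdk', spec['disk_path'])

        def test_create_vm_calls_session(self):
            session = mock.Mock()
            SpawnSteps(session).create_vm({'name': 'inst-1'})
            session.call.assert_called_once_with('CreateVM_Task',
                                                 {'name': 'inst-1'})


    if __name__ == '__main__':
        unittest.main()

Each new positive/error case then has an obvious home, which is exactly
the coverage reviewers are asking to see.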
>> So to move this all forward, I think that bp above should be the top
>> priority for the vmware team in Juno if bp patches are to keep moving at
>> the pace they do, because the features and refactoring just keep coming
>> and, at least for me, it's very easy to burn out looking at those reviews.
>
> As I think has been mentioned elsewhere, the other big one is the move
> to the common vSphere API code in oslo; the driver-local copy of that
> code is a huge source of cruft. This is going to require mechanical
> changes across the whole driver, so it is going to be quite disruptive.
> As this work is largely done, I would put it before even the spawn()
> refactor.
+1. Moving to oslo.vmware sounds like a good first step.
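To give a feel for what that buys us, a very rough sketch of the shared
session oslo.vmware provides is below (signatures are from memory and
should be checked against the library itself):

    # Approximate sketch of the shared session from oslo.vmware (the
    # package namespace was oslo.vmware at the time, oslo_vmware in
    # later releases); check exact signatures against the library.

    from oslo_vmware import api
    from oslo_vmware import vim_util

    session = api.VMwareAPISession(
        'vcenter.example.com',      # vCenter host
        'administrator',            # API user
        'secret',                   # API password
        api_retry_count=10,
        task_poll_interval=0.5)

    # Driver code then funnels every vSphere call through the one
    # session, e.g. listing the first 100 VMs:
    result = session.invoke_api(vim_util, 'get_objects',
                                session.vim, 'VirtualMachine', 100)

All the session establishment, re-login and retry logic then lives in
one place instead of being duplicated in the driver.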
A big step forward towards the end of Juno is the driver-specific CI.
That really does help give reviewers more confidence that a change
doesn't break everything.
John