[openstack-dev] [nova][vmware] retrospective on IceHouse and a call to action for Juno

Shawn Hartsock hartsock at acm.org
Wed Mar 26 23:07:55 UTC 2014


Next week during the VMwareAPI subteam meeting I would like to discuss
blueprint priority order and tentative scheduling for Juno. I have a
proposal for the order that I would like to conduct a formal vote on
and I hope that we as a community can abide by the vote's results.

In short, we currently have a number of blueprints in flight that were
IceHouse near-misses, and new features are already going to be starved
for reviewer attention. Adding *more* features is likely to make the
problem worse.

I am advocating for refactorings-first and features later.

If you've not read:
http://lists.openstack.org/pipermail/openstack-dev/2014-February/028077.html

Please do. It's good background and dovetails with this topic.

There is a tl;dr at the end.

== Summary ==

I used to send out weekly blueprint, bug, and review tracking emails
focused on VMware related changes. I've stopped doing that because I
have not seen a return on the investment of making those updates to
the community. In this public retrospective on IceHouse, I hope to
shed light on which practices were working and which were not.

== A description of the problem ==

We can't get features merged upstream. Many people are expending
effort, that effort is not being rewarded, and the driver's
evolution is suffering for it.

I have been observing the VMware drivers' development since Havana
opened for accepting submissions back in early 2013 and I think we
have a pattern that we as a community need to address. By community, I
mean those of us committing to the vmwareapi drivers in Nova.

I recall working with developers in the broader community (not VMware
employees) to get new features into the Nova driver for vCenter. And I
recall intimately that we just missed merging in Havana-1. In fact, of
the blueprints I had been tracking back then, no blueprints merged and
they were all slid to Havana-2. We worked very hard and most
blueprints missed Havana-3 with only a handful of exceptions.

During Havana I refrained from large change suggestions because I was
new to the community and any such change risked "blowing up" other
developers' work. Big changes can be very disruptive even if they are
for good causes. So no major refactoring work occurred.

In IceHouse I started tracking things much more thoroughly. This was
the first time we had a significant number of developers to coordinate
and we had in the neighborhood of a dozen blueprints to suggest adding
features to the Nova driver for vCenter in IceHouse. A significant
number of these were ready (by our group standards) for IceHouse-2.
These all slipped to IceHouse-3 in the same manner all blueprints for
H2 had slipped. Finally, I3 followed the same pattern as H3 with only
a small set of features surviving the gauntlet.

In IceHouse, only two of the dozen blueprints we as a driver sub-team
had in flight managed to land. In the linked retrospective detail
paste I've managed to consolidate notes I made throughout IceHouse on
blueprint progress. Snapshots of these notes are publicly available on
the IRC logs for the VMwareAPI sub-team if anyone would like to verify
my summary of events.

IceHouse retrospective detail:
  http://paste.openstack.org/raw/74393/

VMwareAPI team meeting details:
https://wiki.openstack.org/wiki/Meetings/VMwareAPI#Next_Meeting

== Learning from Successes ==

Of the thousands of person hours spent by VMware staff and non-staff
working on the VMwareAPI drivers only a handful of feature patches
merged. Why is that?

I have listed all the feature patches that merged that I was able to
find quickly in the previous link on retrospective detail. One
particularly difficult merge was
https://review.openstack.org/#/c/56416/ standing at an astonishing 74
revisions and four months of concentrated effort to achieve a change
of 744 lines in a driver with a total line count on the order of
13,000 lines (including the tests). This is a 5.6% change in the
driver's code base, costing 4 months of effort and thousands of person
hours across multiple companies. That is not to mention the developer's
personal sacrifice as they worked nights and weekends to make those
744 lines happen.

In that time we see the code in review come into conflict with
another high-priority feature:
* https://review.openstack.org/#/c/56416/60/nova/virt/vmwareapi/vmops.py

This causes both blueprints to be revised:
* https://review.openstack.org/#/c/63084/23/nova/virt/vmwareapi/vmops.py

March 6th becomes a very busy and confusing day as the two
attention-starved BPs are wrestled into the code base. I'll leave
parsing the details to the reader as an exercise. The interaction
between these two patches is interesting enough to be worth closer
examination.

== A common complaint ==

Common complaints about the Nova vmware driver that you will find
elsewhere on this mailing list include (paraphrased):

* I can't tell where something is tested or how
* The code is hard to follow so I hate reviewing that code-base
* I can't propose a change because so much is in conflict
* Who is working on what?

We can't really say any particular failed BP is a prime example. In
short, all of these misses are 'misses' because they were all starved
of core-reviewer attention. The lesson here is that expending great
amounts of effort does not mean you will see successful merges.

== A call to action ==

Considering the pattern (for the vmware driver):
H1 - 0 new features
H2 - 0 new features
I1 - 0 new features
I2 - 0 new features

If J1 were also to see 0 new features in the VMware driver, this would
not be out of the ordinary. In fact, if J1 were to merge *anything* not
bug related for the vmware drivers, that would be a *significant*
improvement in the state of affairs for this driver. So any proposal
that I might make, even one that jeopardizes J1's blueprint deadlines,
would really not be anything drastic if we account for history.

== Step 1: address complaints about testing and "following" the code ==

A cadre of developers and I are currently executing on:
  https://blueprints.launchpad.net/nova/+spec/vmware-spawn-refactor

This is a zero new feature blueprint. It is a coordinated refactoring
of some badly abused and malformed code. We expect to have the easiest
half of this work completed by Friday and the more difficult portion to
follow in very short order. This comprises a 500+ line refactor that,
with core-reviewer support, we expect to be able to land successfully
*before* the Atlanta design summit.

I am drafting this blueprint as well:
  https://blueprints.launchpad.net/nova/+spec/vmware-vm-ref-refactor

... which identifies the root cause of 3 Critical or High priority
bugs that forced developers into late nights and long workdays as we
tried to close down IceHouse RC1. This is also a pain point in the
code base that is very difficult to understand. And as many have
pointed out, there are multiple implementations of how to accomplish
the same thing in this driver, many methods that say one thing and do
another, and many strange and hard to understand quirks. If we were to
consolidate these, at minimum we would only have to fix bugs in one
location.

There is also a merge effort with oslo.vmware, which is the start of a
major refactoring of all the vmware drivers across OpenStack.
Once again, it's an attempt to at least establish "how to do things"
in the driver.

== Step 2: address the 'in flight' problem ==

To deal with the "I can't propose changes" problem I want the
VMwareAPI subteam to vote on priority order of blueprints &
refactorings. That means that if a blueprint conflicts with another
and it is voted lower priority, then as a group we accept that any
parallel work on that feature will have to be redone when it hits a
conflict. We will need to develop a dependency order and more or less
agree to work to that. That doesn't mean people sit idle. It means
people don't work night and day for something that won't see light.
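
One way to picture combining a voted priority with a dependency order
is the minimal sketch below. It is only an illustration: the vote
numbers, the dependencies, and the names "feature-a" and "feature-b"
are hypothetical and do not reflect any actual subteam decision. It
assumes Python 3.9+ for the standard library's graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical vote results: a lower number means a higher priority.
priority = {
    "vmware-spawn-refactor": 1,
    "feature-b": 2,
    "vmware-vm-ref-refactor": 3,
    "feature-a": 4,
}

# Hypothetical dependencies: each key lands only after everything in its set.
depends_on = {
    "vmware-vm-ref-refactor": {"vmware-spawn-refactor"},
    "feature-a": {"vmware-spawn-refactor"},
    "feature-b": {"vmware-vm-ref-refactor"},
}


def merge_order(priority, depends_on):
    """Order blueprints so dependencies come first, breaking ties by vote."""
    sorter = TopologicalSorter(depends_on)
    for name in priority:
        sorter.add(name)  # include blueprints that have no dependencies
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies

    ready, order = [], []
    while sorter.is_active():
        ready.extend(sorter.get_ready())  # newly unblocked blueprints
        ready.sort(key=lambda name: priority[name])
        chosen = ready.pop(0)  # highest-priority unblocked blueprint
        order.append(chosen)
        sorter.done(chosen)
    return order


for position, name in enumerate(merge_order(priority, depends_on), start=1):
    print(position, name)
```

Note that "feature-b" was voted second in this made-up example but
still lands after both refactors it sits on top of. Every blueprint
named in depends_on must also have an entry in priority.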

As a general rule of thumb, a new feature should probably live in a
new mini-module (an object or similar), if not its own module, since
that will minimize merge conflicts. When it's time to include the new
feature, it should be "wired in" to the main flow of the driver. That
means refactors and changes underneath new features should have
minimal impact on the feature developer.

(Hopefully these are not radical concepts to the majority of readers
but if they are I'm more than willing to discuss in detail the ideas
of Structured Programming and Object Oriented Programming as they
pertain to these types of issues.)
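
The rule of thumb above can be sketched roughly as follows. This is
not code from the Nova driver: SpawnFlow, ExtraDiskFeature and
after_create are hypothetical names invented for illustration, and the
"steps" are plain strings standing in for real work.

```python
class ExtraDiskFeature:
    """Hypothetical new feature, kept in its own small object."""

    def after_create(self, instance):
        # Return the extra steps this feature contributes to a spawn.
        return [f"attach_disk:{instance}"]


class SpawnFlow:
    """Hypothetical stand-in for the driver's main flow."""

    def __init__(self, features=()):
        # Features are "wired in" here; the flow knows only their interface.
        self._features = list(features)

    def spawn(self, instance):
        steps = [f"create_vm:{instance}"]
        for feature in self._features:
            steps.extend(feature.after_create(instance))
        steps.append(f"power_on:{instance}")
        return steps


print(SpawnFlow().spawn("vm-1"))
print(SpawnFlow([ExtraDiskFeature()]).spawn("vm-1"))
```

In this sketch the feature touches the main flow at a single point, so
a refactor of SpawnFlow only conflicts with the feature if it changes
that one interface.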

== Step 3: who is working on what? (and what priority) ==

As I've alluded to before, in the VMwareAPI subteam we track who is
working on what in etherpad. These pads are part of the public record
and you can see their evolution through time as we coordinate with
other developers. I have not been writing these etherpads down to the
line level of who is working on what method and what line. I
sincerely hope we don't need to coordinate at that level. We do have a
general idea of who is working in what module and what their goal is.

== In Conclusion ==

I am asserting that the reason the VMwareAPI sub-team has had a
difficult time upstreaming is that core-reviewers cannot be guided
through why the change submitted is of high quality. This is due to
the fact that the driver itself is hard to understand in ways that are
non-meaningful to the problem domain. This is called "accidental
complexity" and unless we tame this problem history will repeat
itself.

If we do not refactor the driver, we can already expect 0 new features
in the vmware driver for Juno-1. Mandating 'refactors first' seems
drastic as it destroys feature progress... but that progress *is a
lie* anyway. Historically, we have seen 0 feature progress upstream in
Nova on each milestone no matter how stable or mature the feature may
be.

* If we are successful in this effort I may ask that we mark each
milestone-1 a refactoring milestone. Let's see how this goes first.

* Comments and open discussion on order/priority of refactor blueprints
start at the next IRC meeting, next Wednesday (see the wiki for
details). All refactors must have documents ready to go for corporate
discussion over the next few days. We will vote/decide collectively
the following Wednesday.

* I would advocate for all refactor efforts' initial development cycles
to be completed no later than the Juno design summit. That is an
artificial deadline of May 13th. This should give 3 weeks for reviews
and merging. It also means if you don't make it... your refactor
should be moved to the K time-frame. That means if you want something
refactored it has to happen in April or you lose out.

While I'm making demands...

I also would like the 'K' release to end up named 'Kodiak' for
personal reasons. The Kodiak is a majestic animal and I spent a great
deal of my youth amongst them. I really think that's a fine name for a
majestic product. But, I digress.

== and if you don't like reading, a little video for you to watch ==

**tl;dr** pay back the technical debt first, then charge up the
development credit card

Technical Debt for those who've never heard the term before...
    https://www.youtube.com/watch?v=pqeJFYwnkjE

-- 
# Shawn.Hartsock - twitter: @hartsock - plus.google.com/+ShawnHartsock