[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

Dan Prince dprince at redhat.com
Thu Sep 11 18:02:00 UTC 2014


On Thu, 2014-09-04 at 11:24 +0100, Daniel P. Berrange wrote:
> Position statement
> ==================
> 
> Over the past year I've increasingly come to the conclusion that
> Nova is heading for (or probably already at) a major crisis. If
> steps are not taken to avert this, the project is likely to lose
> a non-trivial amount of talent, both regular code contributors and
> core team members. That includes myself. This is not good for
> Nova's long term health and so should be of concern to anyone
> involved in Nova and OpenStack.
> 
> For those who don't want to read the whole mail, the executive
> summary is that the nova-core team is an unfixable bottleneck
> in our development process with our current project structure.
> The only way I see to remove the bottleneck is to split the virt
> drivers out of tree and let them all have their own core teams
> in their area of code, leaving current nova core to focus on
> all the common code outside the virt driver impls. I nonetheless
> urge people to read the whole mail.
> 


I've always referred to the virt/driver.py API as an internal API,
meaning there are no guarantees about it being preserved across
releases. I'm not saying this is correct... just that it is what we've
got. While OpenStack attempts to do a good job of stabilizing its
public APIs, we haven't done the same for internal APIs. It is actually
quite painful to be out of tree at this point, as I've seen with the
Ironic driver being out of the Nova tree. (Really glad that is back in
now!)

So because we haven't designed things to be split out in this regard, we
can't just go and do it.

I tinkered with some numbers... not sure if this helps or hurts my
stance but here goes. By my calculation this is the number of commits
we've made that touched each virt driver tree for the last 4 releases
plus stuff done to-date in Juno.

Created using a command like this in each virt directory for each
release:

    git log origin/stable/havana..origin/stable/icehouse \
        --no-merges --pretty=oneline . | wc -l
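
For anyone who wants to re-run the counts, here is a rough sketch of
how the command above could be looped over drivers and releases. It is
only an illustration, not the script actually used: only the havana and
icehouse ref names come from the command above, the other ref names and
the nova/virt/<driver> paths are assumptions, and the helper names are
made up.

```python
import subprocess

# Only the havana and icehouse refs appear in the original command; the
# other names are assumed to follow the same pattern and may not exist.
RELEASE_REFS = [
    "origin/stable/essex",
    "origin/stable/folsom",
    "origin/stable/grizzly",
    "origin/stable/havana",
    "origin/stable/icehouse",
    "origin/master",
]
DRIVERS = ["baremetal", "hyperv", "libvirt", "vmwareapi", "xenapi"]


def git_log_command(old_ref, new_ref, path):
    """Build the same git log invocation as the one-liner above."""
    return ["git", "log", f"{old_ref}..{new_ref}",
            "--no-merges", "--pretty=oneline", "--", path]


def count_commits(old_ref, new_ref, path, repo="."):
    """Count non-merge commits touching path (the `| wc -l` step)."""
    result = subprocess.run(git_log_command(old_ref, new_ref, path),
                            cwd=repo, check=True,
                            capture_output=True, text=True)
    return len(result.stdout.splitlines())


def release_pairs(refs):
    """Pair each ref with its successor: (essex, folsom), ..."""
    return list(zip(refs, refs[1:]))


def report(repo="."):
    """Print per-driver counts and a total for every release interval."""
    for old_ref, new_ref in release_pairs(RELEASE_REFS):
        print(f"{old_ref} => {new_ref}:")
        total = 0
        for driver in DRIVERS:
            # Assumed layout: each driver lives under nova/virt/<driver>.
            n = count_commits(old_ref, new_ref, f"nova/virt/{driver}", repo)
            total += n
            print(f" {driver}: {n}")
        print(f"   * total for above: {total}")
```

Calling report("/path/to/nova") from a Nova checkout that still has all
of those refs should print tables in the same shape as the ones below.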

essex => folsom:

 baremetal: 26
 hyperv: 9
 libvirt: 222
 vmwareapi: 18
 xenapi: 164
   * total for above: 439

folsom => grizzly:

 baremetal: 83
 hyperv: 58
 libvirt: 254
 vmwareapi: 59
 xenapi: 126
   * total for above: 580

grizzly => havana:

 baremetal: 48
 hyperv: 55
 libvirt: 157
 vmwareapi: 105
 xenapi: 123
   * total for above: 488

havana => icehouse:

 baremetal: 45
 hyperv: 42
 libvirt: 212
 vmwareapi: 121
 xenapi: 100
   * total for above: 520

icehouse => master:

 baremetal: 26
 hyperv: 32
 libvirt: 188
 vmwareapi: 121
 xenapi: 71
   * total for above: 438

-------

A couple of things jump out at me from the numbers:

 - drivers that are being deprecated (baremetal) still have lots of
changes. Some of these changes are valid bug fixes for the driver but a
majority of them are actually related to internal cleanups and interface
changes. This supports the point that Nova isn't mature enough to do
a split like this yet.

 - the number of commits landed isn't growing *that* much across releases
in the virt driver trees. Presumably we think we were doing a better job
2 years ago? But the number of changes in the virt trees is largely the
same... perhaps this is because people aren't submitting stuff because
they are frustrated, though?

-------

For comparison, here is the total number of commits for each Nova
release (includes the above commits):

essex => folsom: 1708
folsom => grizzly: 2131
grizzly => havana: 2188
havana => icehouse: 1696
icehouse => master: 1493

-------

So say around 30% of the commits for a given release (22% to 31% by the
numbers above) touch the virt drivers themselves, and many of those
aren't specifically related to the virt drivers. Rather, they are just
general Nova internal cleanups made because the interfaces aren't
stable.
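
The arithmetic behind that percentage can be checked with a few lines
of Python. This is just a sketch using the totals quoted in this mail;
the helper name virt_share is made up for illustration.

```python
# (interval, commits touching virt driver trees, all Nova commits),
# copied from the tables in this mail.
INTERVALS = [
    ("essex => folsom", 439, 1708),
    ("folsom => grizzly", 580, 2131),
    ("grizzly => havana", 488, 2188),
    ("havana => icehouse", 520, 1696),
    ("icehouse => master", 438, 1493),
]


def virt_share(virt, total):
    """Percentage of all commits that touched a virt driver tree."""
    return 100.0 * virt / total


for label, virt, total in INTERVALS:
    print(f"{label:20s} {virt_share(virt, total):5.1f}%")

# Share across all five intervals combined.
overall_virt = sum(virt for _, virt, _ in INTERVALS)
overall_total = sum(total for _, _, total in INTERVALS)
print(f"{'overall':20s} {virt_share(overall_virt, overall_total):5.1f}%")
```

By these numbers the share runs from about 22% (grizzly => havana) to
about 31% (havana => icehouse), and about 27% over the whole period.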

And while splitting Nova virt drivers might help out some, I'm not sure
it helps the general Nova issue, which is that we have more reviews with
fewer of the good ones landing. Nova is a weird beast at the moment and
just splitting things like this is probably going to harm as much as it
helps (like we saw with Ironic) unless we stabilize the APIs... and even
then I'm skeptical of death by a million tiny sub-projects. I'm just not
convinced this is the #1 pain point around Nova reviews. What about the
other 70%?

For me a lot of the frustration with reviews is around test/gate time,
pushing things through, rechecks, etc... and if we break something it
takes just as much time to get the revert in. The last point (the
ability to revert code quickly) is a really important one, as it
sometimes takes days to get a simple (obvious) revert landed. This
leaves groups like TripleO, who have their own CI and 3rd party testing
systems which are also capable of finding many critical issues, in the
difficult position of having to revert/cherry-pick critical changes for
days at a time in order to keep things running.

Maybe I'm impatient (I totally am!) but I see much of the review
slowdown as a result of the feedback loop times increasing over the
years. OpenStack has some really great CI and testing but I think our
focus on not breaking things actually has us painted into a corner. We
are losing our agility and the review process is paying the price. At
this point I think splitting out the virt drivers would be more of a
distraction than a help.

Dan






