[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

Daniel P. Berrange berrange at redhat.com
Fri Sep 12 09:37:26 UTC 2014


On Thu, Sep 11, 2014 at 02:02:00PM -0400, Dan Prince wrote:
> I've always referred to the virt/driver.py API as an internal API
> meaning there are no guarantees about it being preserved across
> releases. I'm not saying this is correct... just that it is what we've
> got.  While OpenStack attempts to do a good job at stabilizing its
> public APIs we haven't done the same for internal APIs. It is actually
> quite painful to be out of tree at this point as I've seen with the
> Ironic driver being out of the Nova tree. (really glad that is back in
> now!)

Oh absolutely, I've always insisted that virt/driver.py is unstable
and that as a result out of tree drivers get to keep both pieces when
it breaks.

> So because we haven't designed things to be split out in this regard we
> can't just go and do it. 

I don't think that conclusion follows directly. We certainly need to
do some prep work to firm up our virt driver interface, as outlined
in my original mail, but if we agreed to push forward in this I think
it is practical to get that done in Kilo and split in Lxxxx. It is
mostly a matter of having the will to do it IMHO.

> I tinkered with some numbers... not sure if this helps or hurts my
> stance but here goes. By my calculation this is the number of commits
> we've made that touched each virt driver tree for the last 3 releases
> plus stuff done to-date in Juno.
> 
> Created using a command like this in each virt directory for each
> release: git log origin/stable/havana..origin/stable/icehouse
> --no-merges --pretty=oneline . | wc -l
> 
> essex => folsom:
> 
>  baremetal: 26
>  hyperv: 9
>  libvirt: 222
>  vmwareapi: 18
>  xenapi: 164
>    * total for above: 439
> 
> folsom => grizzly:
> 
>  baremetal: 83
>  hyperv: 58
>  libvirt: 254
>  vmwareapi: 59
>  xenapi: 126
>    * total for above: 580
> 
> grizzly => havana:
> 
>  baremetal: 48
>  hyperv: 55
>  libvirt: 157
>  vmwareapi: 105
>  xenapi: 123
>    * total for above: 488
> 
> havana => icehouse:
> 
>  baremetal: 45
>  hyperv: 42
>  libvirt: 212
>  vmwareapi: 121
>  xenapi: 100
>    * total for above: 520
> 
> icehouse => master:
> 
>  baremetal: 26
>  hyperv: 32
>  libvirt: 188
>  vmwareapi: 121
>  xenapi: 71
>    * total for above: 438
> 
> -------
> 
> A couple of things jump out at me from the numbers:
> 
>  -drivers that are being deprecated (baremetal) still have lots of
> changes. Some of these changes are valid bug fixes for the driver but a
> majority of them are actually related to internal cleanups and interface
> changes. This goes towards the fact that Nova isn't mature enough to do
> a split like this yet.

Our position that the virt driver is internal only has permitted us
to make backwards incompatible changes to it at will. Given that freedom,
people inevitably take that route since it is the least-effort option.
If our position had been that the virt driver needed to be forwards
compatible, people would have been forced to make the same changes without
breaking existing drivers.  IOW, the fact that we've made lots of changes
to baremetal historically, doesn't imply that we can't decide to make the
virt driver API stable henceforth & thus avoid further changes of that
kind.
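
To make the forwards-compatibility point concrete, here is a minimal
sketch of extending a driver interface without breaking drivers written
against the older version. All class and method names are hypothetical
and chosen for illustration; this is not the actual virt/driver.py API.

```python
# Hypothetical names throughout; NOT the real nova virt driver interface.


class BaseDriver:
    """Stand-in for a stable driver base class."""

    def spawn(self, instance):
        raise NotImplementedError

    # Assume this method is added in a later release. Because the base
    # class supplies a default, drivers written earlier inherit it and
    # keep working; they just report the feature as unsupported.
    def snapshot(self, instance):
        raise NotImplementedError


class OldDriver(BaseDriver):
    """An out-of-tree driver written before snapshot() existed."""

    def spawn(self, instance):
        return "spawned %s" % instance


class NewDriver(OldDriver):
    """A driver updated to implement the newer method."""

    def snapshot(self, instance):
        return "snapshot of %s" % instance


def try_snapshot(driver, instance):
    """Call the newer method, degrading gracefully on older drivers."""
    try:
        return driver.snapshot(instance)
    except NotImplementedError:
        return None
```

The caller never assumes the newer method is implemented, so OldDriver
needs no changes when the interface grows. The cost is that the core
code has to carry the fallback path.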

>  -the number of commits landed isn't growing *that* much across releases
> in the virt driver trees. Presumably we think we were doing a better job
> 2 years ago? But the number of changes in the virt trees is largely the
> same... perhaps this is because people aren't submitting stuff because
> they are frustrated though?

Our core team size & thus review bandwidth has been fairly static over
that time, so the only way virt driver commits could have risen is if
core reviewers increased their focus on virt drivers at the expense of
other parts of nova. I actually read those numbers as showing that as
we've put more effort into reviewing vmware contributions, we've lost
resource going into libvirt contributions.
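
As a rough check on that reading, the per-driver counts quoted above can
be compared between two complete cycles. This is only a sketch: the
numbers are copied from the quoted mail, the choice of cycles to compare
is an assumption, and the icehouse => master figures are left out since
Juno is not finished.

```python
# Commit counts per virt driver tree, copied from the quoted figures.
COMMITS = {
    "folsom => grizzly": {"libvirt": 254, "vmwareapi": 59},
    "grizzly => havana": {"libvirt": 157, "vmwareapi": 105},
    "havana => icehouse": {"libvirt": 212, "vmwareapi": 121},
}


def delta(driver, start, end):
    """Change in commit count for one driver between two cycles."""
    return COMMITS[end][driver] - COMMITS[start][driver]


if __name__ == "__main__":
    for driver in ("libvirt", "vmwareapi"):
        change = delta(driver, "folsom => grizzly", "havana => icehouse")
        print("%-10s %+d" % (driver, change))
```

Between those two cycles vmwareapi gains 62 commits while libvirt loses
42, which is consistent with review effort shifting rather than growing.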

In addition we're of course missing out on capturing the changes that
we've never had submitted, or submitted but abandoned, or submitted but
slipped across multiple releases waiting for merge. Overall I think
the figures paint a pretty depressing picture of no overall growth,
perhaps even a decline.


> 
> For comparison here are the total number of commits for each Nova
> release (includes the above commits):
> 
> essex -> folsom: 1708
> folsom -> grizzly: 2131
> grizzly -> havana: 2188
> havana -> icehouse: 1696
> icehouse -> master: 1493
> 
> -------

So we've still a way to go for the Juno cycle, but I'd be surprised if we
got beyond the havana numbers given where we are today. Again I think
those numbers show a plateau or even decline, which just reinforces
my point that our model is not scaling today.

> So say around 30% of the commits for a given release touch the virt
> drivers themselves.. many of them aren't specifically related to the
> virt drivers. Rather just general Nova internal cleanups because the
> interfaces aren't stable.
> 
> And while splitting Nova virt drivers might help out some I'm not sure
> it helps the general Nova issue in that we have more reviews with less
> of the good ones landing. Nova is a weird beast at the moment and just
> splitting things like this is probably going to harm as much as it helps
> (like we saw with Ironic) unless we stabilize the APIs... and even then

Stabilizing the API is an absolute pre-requisite I mentioned in my
original mail. I'm not suggesting a split with an unstable API.

> I'm skeptical of death by a million tiny sub-projects. I'm just not
> convinced this is the #1 pain point around Nova reviews. What
> about the other 70%?

I'm not claiming this will solve all our problems, but I believe it
will free up some more time for people to focus on the other 70%.
Most importantly it will allow the people working on the different
parts to avoid conflicting with each other for resources to such a
large extent and give those subsystem teams control over their own
destiny as subject matter experts. E.g. a feature approved by the libvirt
team won't mean that the vmware team has less opportunity to accept
their own features, which is a serious problem today.
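
For reference, the "around 30%" figure can be recomputed from the two
sets of quoted totals. This is a sketch under one stated assumption:
that each commit touches only one driver tree. A commit touching several
trees would be counted once per tree in the virt totals, so the real
share may be somewhat lower.

```python
# (virt driver total, whole-of-Nova total) per cycle, from the quoted mail.
TOTALS = {
    "essex -> folsom": (439, 1708),
    "folsom -> grizzly": (580, 2131),
    "grizzly -> havana": (488, 2188),
    "havana -> icehouse": (520, 1696),
    "icehouse -> master": (438, 1493),
}


def virt_share(cycle):
    """Percentage of a cycle's commits that touched a virt driver tree."""
    virt, total = TOTALS[cycle]
    return round(100.0 * virt / total, 1)


if __name__ == "__main__":
    for cycle in TOTALS:
        print("%-20s %5.1f%%" % (cycle, virt_share(cycle)))
```

On these figures the share ranges from roughly 22% to 31% per cycle,
with the two most recent cycles closest to the 30% mark.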

> For me a lot of the frustration with reviews is around test/gate time,
> pushing things through, rechecks, etc... and if we break something it
> takes just as much time to get the revert in. The last point (the
> ability to revert code quickly) is a really important one as it
> sometimes takes days to get a simple (obvious) revert landed. This
> leaves groups like TripleO who have their own CI and 3rd party testing
> systems which are also capable of finding many critical issues in the
> difficult position of having to revert/cherry pick critical changes for
> days at a time in order to keep things running.

Yes, the test/gate time is a major problem on its own which is
seriously harming the efficiency of our contributors. I think
that is something that needs major action to address, but I did
not want to bring it into this thread too much, since I think it
is a largely independent problem to be solved.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
