[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

Daniel P. Berrange berrange at redhat.com
Fri Sep 5 11:26:49 UTC 2014


On Fri, Sep 05, 2014 at 07:00:44AM -0400, Sean Dague wrote:
> On 09/05/2014 06:22 AM, Daniel P. Berrange wrote:
> > On Fri, Sep 05, 2014 at 07:31:50PM +0930, Christopher Yeoh wrote:
> >> On Thu, 4 Sep 2014 11:24:29 +0100
> >> "Daniel P. Berrange" <berrange at redhat.com> wrote:
> >>>
> >>>  - A fairly significant amount of nova code would need to be
> >>>    considered semi-stable API. Certainly everything under nova/virt
> >>>    and any object which is passed in/out of the virt driver API.
> >>>    Changes to such APIs would have to be done in a backwards
> >>>    compatible manner, since it is no longer possible to lock-step
> >>>    change all the virt driver impls. In some ways I think this would
> >>>    be a good thing as it will encourage people to put more thought
> >>>    into the long term maintainability of nova internal code instead
> >>>    of relying on being able to rip it apart later, at will.
> >>>
> >>>  - The nova/virt/driver.py class would need to be much better
> >>>    specified. All parameters / return values which are opaque dicts
> >>>    must be replaced with objects + attributes. Completion of the
> >>>    objectification work is mandatory, so there is cleaner separation
> >>>    between virt driver impls & the rest of Nova.
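To make the objectification point concrete, here is a minimal Python sketch. It assumes a hypothetical `InstanceInfo` type whose field names are invented for illustration and are not Nova's actual object model. An opaque dict returned by a driver is replaced with a typed object that carries an explicit version. A new field is added with a default, so existing callers stay compatible.

```python
from dataclasses import dataclass


# Hypothetical typed object replacing an opaque dict passed across the
# virt driver API. Names and fields are illustrative, not Nova's real ones.
@dataclass(frozen=True)
class InstanceInfo:
    # Unannotated class attribute, so it is not a dataclass field.
    VERSION = "1.1"

    state: str
    max_mem_kb: int
    num_cpu: int
    # Added in 1.1 with a default, so code written against 1.0 still
    # constructs the object unchanged (a backwards compatible bump).
    cpu_time_ns: int = 0

    @classmethod
    def from_legacy_dict(cls, info: dict) -> "InstanceInfo":
        """Adapter for drivers that still return the old opaque dict."""
        return cls(
            state=info["state"],
            max_mem_kb=info["max_mem"],
            num_cpu=info["num_cpu"],
            cpu_time_ns=info.get("cpu_time", 0),
        )


legacy = {"state": "running", "max_mem": 2097152, "num_cpu": 2}
info = InstanceInfo.from_legacy_dict(legacy)
print(info.num_cpu, info.cpu_time_ns, InstanceInfo.VERSION)
```

With a typed object, a misspelled field name fails at construction time with a `TypeError` rather than silently producing a dict with the wrong key. The legacy adapter can be deleted once every driver returns the object directly.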
> >>
> >> I think for this to work well with multiple repositories and drivers
> >> having different priorities over implementing changes in the API it
> >> would not just need to be semi-stable, but stable with versioning built
> >> in from the start to allow for backwards incompatible changes. And
> >> the interface would have to be very well documented including things
> >> such as what exceptions are allowed to be raised through the API.
> >> Hopefully this would be enforced through code as well. But as long as
> >> driver maintainers are willing to commit to this extra overhead I can
> >> see it working. 
> > 
> > With our primary REST or RPC APIs we're under quite strict rules about
> > what we can & can't change - almost impossible to remove an existing
> > API from the REST API for example. With the internal virt driver API
> > we would probably have a little more freedom. For example, I think
> > if we found an existing virt driver API that was insufficient for a
> > new bit of work, we could add a new API in parallel with it, give the
> > virt drivers 1 dev cycle to convert, and then permanently delete the
> > original virt driver API. So a combination of that kind of API
> > replacement, versioning for some data structures/objects, and use of
> > the capabilities flags would probably be sufficient. That's what I mean
> > by semi-stable here - no need to maintain existing virt driver APIs
> > indefinitely - we can remove & replace them in reasonably short time
> > scales as long as we avoid any lock-step updates.
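The replacement scheme described above might look roughly like the following minimal Python sketch. It adds the new API in parallel, gates it on a capability flag, and deletes the old API after one cycle. `BaseDriver`, `attach_volume_v2`, the `supports_attach_v2` flag and `call_attach` are all hypothetical names chosen for illustration, not Nova's real driver interface.

```python
import logging

LOG = logging.getLogger(__name__)


class BaseDriver:
    """Hypothetical stand-in for the virt driver base class."""

    # Capability flags: a driver opts in to the replacement API here.
    capabilities = {"supports_attach_v2": False}

    def attach_volume(self, instance, device):
        """Original API; kept for one dev cycle, then deleted."""
        raise NotImplementedError()

    def attach_volume_v2(self, instance, device, options):
        """Replacement API, added in parallel with the original."""
        raise NotImplementedError()


def call_attach(driver, instance, device, options):
    """Caller-side dispatch, so drivers need no lock-step update."""
    if driver.capabilities.get("supports_attach_v2"):
        return driver.attach_volume_v2(instance, device, options)
    LOG.warning("%s still uses attach_volume, which is scheduled for removal",
                type(driver).__name__)
    return driver.attach_volume(instance, device)


class UnconvertedDriver(BaseDriver):
    def attach_volume(self, instance, device):
        return f"old:{instance}:{device}"


class ConvertedDriver(BaseDriver):
    capabilities = {"supports_attach_v2": True}

    def attach_volume_v2(self, instance, device, options):
        return f"new:{instance}:{device}:{sorted(options)}"


print(call_attach(UnconvertedDriver(), "vm1", "/dev/vdb", {}))
print(call_attach(ConvertedDriver(), "vm1", "/dev/vdb", {"read_only": True}))
```

Once every driver has set the flag, the old method and the fallback branch can be removed in the following cycle. No single change ever has to touch all driver trees at once.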
> 
> I have spent a lot of time over the last year working on things that
> require coordinated code lands between projects... it's much more
> friction than you give it credit for.
> 
> Every added git tree adds a non-linear cost to mental overhead, and a
> non-linear integration cost. Realistically the reason the gate is in the
> state it is has a ton to do with the fact that it's integrating 40 git
> trees. Because virt drivers run in the process space of Nova Compute,
> they can pretty much do whatever, and the impacts are going to be
> somewhat hard to figure out.
> 
> Also, if spinning these out seems like the right idea, I think nova-core
> needs to retain core rights over the drivers as well, because there does
> need to be veto authority over some of the worst craziness.

If they want to do crazy stuff, let them live or die with the
consequences.

> If the VMware team stopped trying to build a distributed lock manager
> inside their compute driver, or the Hyper-V team didn't wait until J2 to
> start pushing patches, I think there would be more trust in some of
> these teams. But I am seriously concerned in both those cases, and the
> slow review there is a function of a historic lack of trust in judgment.
> I also personally went on a moratorium a year ago on reviewing either
> driver because entities at both places were complaining to my
> management chain through back channels that I was -1ing their code...

I venture to suggest that the reason we care so much about those kinds
of things is precisely because of our policy of pulling them into the
tree. Having them in tree means their quality (or lack of it) reflects
directly on the project as a whole. Separate them from Nova and give
them control of their own destiny, and they can deal with the consequences
of their actions, and people can judge the results for themselves.

We don't have the time or resources to continue baby-sitting them
ourselves - attempting to do so has just resulted in a scenario where
they end up getting largely ignored, as you admit here. This ultimately
makes their quality even worse, because the lack of reviewer availability
means they stand little chance of pushing through the work to fix the
problems they have. We've seen this first hand with the major refactoring
that the VMware driver team has been trying to do. Our current setup, where
we retain veto and try to control what other people do, has directly
resulted in the VMware driver suffering poor quality for even longer. If
VMware had been out of tree, the major refactoring they've been trying to
merge would have been done 6 months ago, to everyone's benefit. The same is
true for the libvirt driver - there's plenty of work I'd like to do to
improve it, but cannot even contemplate because there's little to no
chance of ever getting it past our fundamental core reviewer bottleneck.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|