[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

Daniel P. Berrange berrange at redhat.com
Fri Sep 5 10:45:50 UTC 2014


On Thu, Sep 04, 2014 at 06:22:18PM -0500, Michael Still wrote:
> On Thu, Sep 4, 2014 at 5:24 AM, Daniel P. Berrange <berrange at redhat.com> wrote:
> 
> [Heavy snipping because of length]
> 
> > The radical (?) solution to the nova core team bottleneck is thus to
> > follow this lead and split the nova virt drivers out into separate
> > projects and delegate their maintenance to new dedicated teams.
> >
> >  - Nova becomes the home for the public APIs, RPC system, database
> >    persistent and the glue that ties all this together with the
> >    virt driver API.
> >
> >  - Each virt driver project gets its own core team and is responsible
> >    for dealing with review, merge & release of their codebase.
> 
> I think this is the crux of the matter. We're not doing a great job of
> landing code at the moment, because we can't keep up with the review
> workload.
> 
> So far we've had two proposals mooted:
> 
>  - slots / runways, where we try to rate limit the number of things
> we're trying to review at once to maintain focus

FWIW, I'm not really seeing that as a long term solution. In its
essence it is just a more effective way for us to say 'no' to our
potential contributors. While it could no doubt relieve pressure
on the core team by reducing the flow of the pipe, I don't think
it is helpful for our contributors overall.

>  - splitting all the virt drivers out of the nova tree
> 
> Splitting the drivers out of the nova tree does come at a cost -- we'd
> need to stabilise and probably version the hypervisor driver
> interface, and that will encourage more "out of tree" drivers, which
> are things we haven't historically wanted to do. If we did this split,
> I think we need to acknowledge that we are changing policy there. It
> also means that nova-core wouldn't be the ones holding the quality bar
> for hypervisor drivers any more, I guess this would open the door for
> drivers to more actively compete on the quality of their
> implementations, which might be a good thing.

There are already a number of drivers out of tree such as Docker,
Ironic (though soon to be in tree), and IIUC there's something IBM
have done for Power hypervisor, and work Oracle have done for the
Solaris virt/container technologies. Probably the distinction I'd
make is between things that are actively part of the OpenStack
community (eg on our gerrit infrastructure and/or stackforge, etc),
vs things that are developed in complete isolation from the OpenStack
community.

I'm unclear what the state of play is wrt discussions on OpenStack
technology compatibility certification & trademark usage, but perhaps
that is a partial counterweight to your concern? I'd certainly like
to see a focus on out of tree drivers remaining a strong part of the
openstack community, and not go off into their own completely isolated
world outside the community.

But yes, I am clearly proposing a change to our integration policy here
and so we need to carefully consider what that means and take
any necessary steps to mitigate risks.

In some respects I think the split repos could allow us to raise the
bar in terms of quality. For example, with a single repo, I don't
see it ever being practical to make VMware/HyperV/XenAPI CI systems
gating on changes, because it would push up the level of pain from
false job failures in the gate even further than today. With a separate
repo each virt driver would only need to run jobs directly related to
it, so the VMware CI could easily be made gating on the VMware driver
git repo.

On testing in general, I think we need to look at the granularity
at which we run tests, in order to let us scale up the number of tests
we run. For example, it is suggested that each feature, like disk
encryption, disk discard support, each vif driver, and so on,
requires a new tempest job with appropriate settings. If we look at
the number of possible tunable knobs like that, it easily implies
hundreds more tempest jobs with varying configs. I don't think it is
practical to consider doing that with our setup today. With separate
virt driver repos we'd have more headroom to add a larger number of
jobs since the volume of changes being tested overall would be smaller.
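
To give a rough feel for how quickly the job count grows, here is a
minimal sketch. The knob names and their values are purely
hypothetical placeholders, not a real tempest job matrix; the point
is only the arithmetic of one-job-per-setting versus testing every
combination of settings.

```python
from itertools import product

# Hypothetical tunables, for illustration only. These are NOT real
# tempest or nova config option names.
KNOBS = {
    "disk_encryption": ["off", "on"],
    "disk_discard": ["off", "on"],
    "vif_driver": ["vif_a", "vif_b", "vif_c"],
}


def one_job_per_setting(knobs):
    """Count extra jobs if each non-default setting gets its own job.

    The first value of each knob is treated as the default that the
    existing job already covers.
    """
    return sum(len(values) - 1 for values in knobs.values())


def job_matrix(knobs):
    """Return one settings dict per distinct combination of knob values."""
    names = sorted(knobs)
    combos = product(*(knobs[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


print(one_job_per_setting(KNOBS))  # extra jobs, one per non-default setting
print(len(job_matrix(KNOBS)))      # jobs needed to cover every combination
```

Even with only three made-up knobs the full matrix is three times the
size of the one-job-per-setting count, and every additional knob
multiplies it again.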

> Both of these have interesting aspects, and I agree we need to do
> _something_. I do wonder if there is a hybrid approach as well though.
> For example, could we implement some sort of more formal lieutenant
> system for drivers? We've talked about it in the past but never been
> able to express how it would work in practise.

Gerrit makes it hard to express that formally due to the lack of
path based permissioning. If we do go for the virt driver split,
it would nonetheless be useful if we trialled a lieutenant or
sub-team model during Kilo, as a way to prepare for an eventual
driver split in Lxxxx. So this is worth talking about regardless
I reckon.
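
To make concrete what path based permissioning would mean for a
lieutenant model, here is a purely hypothetical sketch. It is not
something Gerrit provides, and the team names and the fallback rule
are my own assumptions for illustration: each directory prefix maps
to a sub-team, and a change needs sign-off from every team whose
paths it touches.

```python
# Hypothetical mapping of path prefixes to sub-teams. The team names
# are invented for illustration and do not exist as Gerrit groups.
OWNERS = {
    "nova/virt/libvirt/": "libvirt-team",
    "nova/virt/vmwareapi/": "vmware-team",
    "nova/virt/xenapi/": "xenapi-team",
}

# Assumption: anything outside a driver directory stays with nova-core.
DEFAULT_TEAM = "nova-core"


def required_teams(changed_files):
    """Return the set of teams that would need to approve a change."""
    teams = set()
    for path in changed_files:
        for prefix, team in OWNERS.items():
            if path.startswith(prefix):
                teams.add(team)
                break
        else:
            # No driver prefix matched, so fall back to the default team.
            teams.add(DEFAULT_TEAM)
    return teams


print(required_teams(["nova/virt/libvirt/driver.py", "nova/compute/manager.py"]))
```

A change confined to one driver directory would only need that
driver's sub-team, while anything touching shared code would still
pull in nova-core.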

I still think on balance a virt driver split is beneficial since
it brings benefits beyond just the review team.

> The last few days have been interesting as I watch FFEs come through.
> People post explaining their feature, its importance, and the risk
> associated with it. Three cores sign on for review. All of the ones
> I've looked at have received active review since being posted. Would
> it be bonkers to declare nova to be in "permanent feature freeze"? If
> we could maintain the level of focus we see now, then we'd be getting
> heaps more done than before.

For the vast majority of the FFEs I've signed up for at least, there
has been pretty much no further review work required (at least on my
part). It is all stuff that I'd already ACKd and mostly just lost the
race to merge in time. So I'd not view these past few days of activity
as representative of how we're able to work in general.

> These issues should very definitely be on the agenda for the design
> summit, probably early in the week.

Absolutely. I added this topic to the etherpad for summit discussion
ideas you linked to last week.

Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


