[openstack-dev] [Nova] What's holding nova development back?

Daniel P. Berrange berrange at redhat.com
Mon Sep 15 09:42:32 UTC 2014


On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
> Just an observation from the last week or so...
> 
> The biggest problem nova faces at the moment isn't code review latency. Our
> biggest problem is failing to fix our bugs so that the gate is reliable.
> The number of rechecks we've done in the last week to try and land code is
> truly startling.

I consider both problems to be pretty much equally important. I don't
think solving review latency or test reliability in isolation is enough to
save Nova. We need to tackle both problems as a priority. I tried to avoid
getting into my concerns about testing in my mail on review team bottlenecks
since I think we should address the problems independently / in parallel.

> I know that some people are focused by their employers on feature work, but
> those features aren't going to land in a world in which we have to hand
> walk everything through the gate.

Unfortunately the unreliability of the gate systems has its highest negative
impact on productivity at exactly the point in the dev cycle where we need
it to have the least impact.

If we're going to continue to raise the bar in terms of testing coverage
then we need to have a serious look at the overall approach we use for
testing because what we do today isn't going to scale, even if it is
100% reliable. We can't keep adding new CI jobs for each new nova.conf
setting that introduces a new code path, because each job has major
implications for resource consumption (number of test nodes, log storage),
not to mention reliability. I think we need to figure out a way to get
more targeted testing of features, so we can keep the overall number
of jobs lower and the tests shorter.

Instead of having a single tempest run that exercises all the Nova
functionality in one run, we need to figure out how to split it up
into independent functional areas. For example, if we could isolate
the tests which are affected by the choice of cinder storage backend, then
we could run that subset of tests multiple times, once for each
supported cinder backend. Without this, the combinatorial explosion
of test jobs is going to kill us.
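
To make the arithmetic concrete, here is a rough sketch of the difference
between running a full tempest pass for every combination of settings and
running only the affected subset of tests per setting. The configuration
axes, the option names and the mapping from test area to axis are all
hypothetical, chosen purely for illustration; they are not taken from any
real job definition or from tempest itself.

```python
from itertools import product

# Hypothetical configuration axes; option names are illustrative only.
AXES = {
    "cinder_backend": ["lvm", "ceph", "nfs"],
    "virt_driver": ["libvirt-kvm", "xen"],
    "network": ["nova-network", "neutron"],
}

# Hypothetical mapping from a functional test area to the axes that
# actually change its code path.
AREA_AXES = {
    "volumes": ["cinder_backend"],
    "compute": ["virt_driver"],
    "networking": ["network"],
}


def full_matrix(axes):
    """One full test run for every combination of every option."""
    names = sorted(axes)
    combos = product(*(axes[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def targeted_matrix(axes, area_axes):
    """One short run per test area per relevant option combination."""
    jobs = []
    for area, names in sorted(area_axes.items()):
        for combo in product(*(axes[name] for name in names)):
            jobs.append((area, dict(zip(names, combo))))
    return jobs


full = full_matrix(AXES)
targeted = targeted_matrix(AXES, AREA_AXES)
print(len(full), "full runs vs", len(targeted), "targeted subset runs")
```

With these made-up numbers the full matrix grows as the product of the
option counts (3 * 2 * 2), while the targeted approach grows as their sum
(3 + 2 + 2), and each targeted run only executes one area's tests. It does
assume each test area depends on a single axis, which real tests will not
always satisfy.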


Regards,
Daniel
-- 
|: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org              -o-             http://virt-manager.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|


