[openstack-dev] [Nova] What's holding nova development back?

Sean Dague sean at dague.net
Mon Sep 15 13:42:00 UTC 2014


On 09/15/2014 05:42 AM, Daniel P. Berrange wrote:
> On Sun, Sep 14, 2014 at 07:07:13AM +1000, Michael Still wrote:
>> Just an observation from the last week or so...
>>
>> The biggest problem nova faces at the moment isn't code review latency. Our
>> biggest problem is failing to fix our bugs so that the gate is reliable.
>> The number of rechecks we've done in the last week to try and land code is
>> truly startling.
> 
> I consider both problems to be pretty much equally important. I don't
> think solving review latency or test reliability in isolation is enough to
> save Nova. We need to tackle both problems as a priority. I tried to avoid
> getting into my concerns about testing in my mail on review team bottlenecks
> since I think we should address the problems independently / in parallel.
> 
>> I know that some people are focused by their employers on feature work, but
>> those features aren't going to land in a world in which we have to hand
>> walk everything through the gate.
> 
> Unfortunately the reliability of the gate systems has the highest negative
> impact on productivity right at the point in the dev cycle where we need
> it to have the least impact too.
> 
> If we're going to continue to raise the bar in terms of testing coverage
> then we need to have a serious look at the overall approach we use for
> testing because what we do today isn't going to scale, even if it is
> 100% reliable. We can't keep adding new CI jobs for each new nova.conf
> setting that introduces a new code path, because each job has major
> implications for resource consumption (number of test nodes, log storage),
> not to mention reliability. I think we need to figure out a way to get
> more targeted testing of features, so we can keep the overall number
> of jobs lower and the tests shorter.
> 
> Instead of having a single tempest run that exercises all the Nova
> functionality in one run, we need to figure out how to split it up
> into independent functional areas. For example if we could isolate
> tests which are affected by choice of cinder storage backend, then
> we could run that subset of tests multiple times, once for each
> supported cinder backend. Without this, the combinatorial explosion
> of test jobs is going to kill us.
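
To put rough numbers on the combinatorial point quoted above, here is
a small sketch. The axes and values are made up for illustration and
are not a list of what the gate actually runs; the function names are
hypothetical too. It only compares "one full run per combination of
settings" with "one targeted subset run per value of each setting".

```python
from itertools import product

# Hypothetical configuration axes, for illustration only. They are not
# meant to reflect the real set of gate jobs or supported backends.
AXES = {
    "cinder_backend": ["lvm", "ceph", "nfs"],
    "virt_driver": ["libvirt", "xenapi"],
    "network": ["nova-network", "neutron"],
}


def full_matrix_jobs(axes):
    """One full tempest run for every combination of every axis."""
    return list(product(*axes.values()))


def targeted_jobs(axes):
    """One run of the axis-specific test subset per value of that axis."""
    return [(axis, value) for axis, values in axes.items() for value in values]


if __name__ == "__main__":
    print("full matrix jobs:", len(full_matrix_jobs(AXES)))
    print("targeted jobs:   ", len(targeted_jobs(AXES)))
```

With these made-up axes the full matrix needs 3 * 2 * 2 = 12 jobs,
while the targeted split needs 3 + 2 + 2 = 7 shorter ones. The gap
grows with every axis added, but only if the tests really can be
isolated per axis.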

One of the top issues killing Nova patches last week was a unit test
race (the wsgi worker one). There is no one to blame but Nova for that.
Jay was really the only team member digging into it.

I don't disagree on the disaggregation problem. However, lots of Nova
devs are ignoring unit test failures at this point, and unless that
changes no other disaggregation is going to make anything better.

	-Sean

-- 
Sean Dague
http://dague.net


