[openstack-dev] [nova] Averting the Nova crisis by splitting out virt drivers

Sean Dague sean at dague.net
Fri Sep 5 11:12:37 UTC 2014


On 09/05/2014 06:40 AM, Nikola Đipanov wrote:
> On 09/04/2014 12:24 PM, Daniel P. Berrange wrote:
>> Position statement
>> ==================
>>
>> Over the past year I've increasingly come to the conclusion that
>> Nova is heading for (or probably already at) a major crisis. If
>> steps are not taken to avert this, the project is likely to lose
>> a non-trivial amount of talent, both regular code contributors and
>> core team members. That includes myself. This is not good for
>> Nova's long term health and so should be of concern to anyone
>> involved in Nova and OpenStack.
>>
>> For those who don't want to read the whole mail, the executive
>> summary is that the nova-core team is an unfixable bottleneck
>> in our development process with our current project structure.
>> The only way I see to remove the bottleneck is to split the virt
>> drivers out of tree and let them all have their own core teams
>> in their area of code, leaving current nova core to focus on
>> all the common code outside the virt driver impls. I nonetheless
>> urge people to read the whole mail.
>>
>>
>> Background information
>> ======================
>>
>> I see many factors coming together to form the crisis
>>
>>  - Burn out of core team members from over work 
>>  - Difficulty bringing new talent into the core team
>>  - Long delay in getting code reviewed & merged
>>  - Marginalization of code areas which aren't popular
>>  - Increasing size of nova code through new drivers
>>  - Exclusion of developers without corporate backing
>>
>> Each item on its own may not seem too bad, but combined they
>> add up to a big problem.
>>
> 
> Like many others - I cannot +1 this enough. Some technical comments below
> that we may want to consider first, but to sum them up - this will be a
> TON OF WORK! We had better make sure we really want to do this before we start.
> 
> However - please don't read this as FUD, but rather as pointing out that
> the devil is in the details, and maybe getting ahead of myself with too
> deep of a dive.
> 
>>
>>  - A fairly significant amount of nova code would need to be
>>    considered semi-stable API. Certainly everything under nova/virt
>>    and any object which is passed in/out of the virt driver API.
>>    Changes to such APIs would have to be done in a backwards
>>    compatible manner, since it is no longer possible to lock-step
>>    change all the virt driver impls. In some ways I think this would
>>    be a good thing as it will encourage people to put more thought
>>    into the long term maintainability of nova internal code instead
>>    of relying on being able to rip it apart later, at will.
>>
> 
> I think we should not underestimate how big of a job this will be. We
> have been treating that API as internal for a long time and a lot of
> abstractions are just broken and need to be redesigned and then
> refactored. A lot of the stuff is implementation specific (live
> migrations is a good example of this). What makes it more difficult is
> that we need to get this as right as possible before we do the split.
> 
> Now I am not saying this cannot be done or that we shouldn't do it,
> however I _am_ saying that we should not take lightly how much work
> there will be and how fiddly the work itself is.
> 
> On top of that - there are some other serious issues with nova common
> code that we need to take care of ASAP, and this will definitely
> increase the churn and make that more difficult. We should take this
> into account and make sure we are focusing efforts on the right things.
> Making sure we do is the biggest challenge nova core faces in addition
> to all the others mentioned above.
> 
>>  - The nova/virt/driver.py class would need to be much better
>>    specified. All parameters / return values which are opaque dicts
>>    must be replaced with objects + attributes. Completion of the
>>    objectification work is mandatory, so there is cleaner separation
>>    between virt driver impls & the rest of Nova.
>>
> 
> Not only that - currently nova-objects do their versioning magic only
> over RPC, while they would have to do it over library boundaries. This
> in itself will require work, and is likely going to influence how we
> stabilize the API.
> 
> However - splitting out the scheduler is likely to require objects to be
> able to do similar things, and there are other things that we may want
> to do (e.g. using properly versioned data for the extensible resources)
> that will benefit from this.
> 
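To make the two points above concrete (typed objects instead of opaque dicts, and version negotiation across a library boundary rather than only over RPC), here is a minimal sketch. It uses plain Python dataclasses and is not the nova.objects API; ConsoleInfo, parse_version and to_primitive are hypothetical names invented for illustration, as is the idea that version 1.1 added a "protocol" field.

```python
from dataclasses import dataclass, asdict
from typing import Optional


def parse_version(version):
    """Turn a string such as '1.1' into (1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


@dataclass
class ConsoleInfo:
    """Hypothetical typed replacement for an opaque dict crossing the driver API."""

    VERSION = "1.1"  # class attribute, not a dataclass field; 1.1 added 'protocol'

    host: str
    port: int
    protocol: Optional[str] = None

    def to_primitive(self, target_version=None):
        """Serialize in a form that a consumer pinned to target_version understands."""
        version = target_version or self.VERSION
        if parse_version(version) > parse_version(self.VERSION):
            raise ValueError("cannot produce a newer version than %s" % self.VERSION)
        data = asdict(self)
        if parse_version(version) < (1, 1):
            # An out-of-tree driver built against 1.0 never saw this field.
            data.pop("protocol")
        return {"version": version, "data": data}


info = ConsoleInfo(host="10.0.0.5", port=10000, protocol="tcp")
current = info.to_primitive()        # full 1.1 payload
legacy = info.to_primitive("1.0")    # same object, downgraded for an older driver
```

The design choice in this sketch is that the producer downgrades the payload, so a driver in its own repo can lag behind common code by a version without lock-step changes. Whether the real objects code should negotiate versions this way is exactly the open question raised above.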
>>  - If changes are required to common code, the virt driver developer
>>    would first have to get the necessary pieces merged into Nova
>>    common. Then the follow up virt driver specific changes could be
>>    proposed to their repo. This implies that some changes to virt
>>    drivers will still contend for resource in the common nova repo 
>>    and team. This contention should be lower than it is today though
>>    since the current nova core team should have less code to look 
>>    after per-person on aggregate.
>>
> 
> A handy example of this I can think of is the currently granted FFE for
> serial consoles - consider how much of the code went into the common
> part vs. the libvirt specific part, I would say the ratio is very close
> to 1 if not even in favour of the common part (current 4 outstanding
> patches are all for core, and out of the 5 merged - only one of them was
> purely libvirt specific, assuming virt/ will live in nova-common).
> 
> Joe asked a similar question elsewhere on the thread.
> 
> Once again - I am not against doing it - what I am saying is that we
> need to look into this closer as it may not be as big of a win from the
> number of changes needed per feature as we may think.
> 
> Just some things to think about with regards to the whole idea, by no
> means exhaustive.

So maybe the better question is: what are the top sources of technical
debt in Nova that we need to address? And if we did, everyone would be
more sane, and feel less burnt.

Maybe the drivers are the worst debt, and jettisoning them makes them
someone else's problem, so that helps some. I'm not entirely convinced
right now.

I think Cells represents a lot of debt right now. It doesn't fully work
with the rest of Nova, and produces a ton of extra code paths special
cased for the cells path.

The Scheduler has a ton of debt, as has been pointed out by the efforts
in and around Gantt. The focus has been on the split, but realistically
I'm with Jay that we should focus on the debt, and on exposing a REST
interface in Nova.

What about the Nova objects transition? That continues to be slow
because it's basically Dan (with a few other helpers from time to time).
Would it be helpful if we did an all hands on deck transition of the
rest of Nova for K1 and just get it done? Would be nice to have the bulk
of Nova core working on one thing like this and actually be in shared
context with everyone else for a while.

	-Sean

-- 
Sean Dague
http://dague.net


