[openstack-dev] [nova][neutron][cinder] Averting the Nova crisis by splitting out virt drivers

Sean Dague sean at dague.net
Thu Sep 11 11:36:02 UTC 2014


On 09/11/2014 05:18 AM, Daniel P. Berrange wrote:
> On Thu, Sep 11, 2014 at 09:23:34AM +1000, Michael Still wrote:
>> On Thu, Sep 11, 2014 at 8:11 AM, Jay Pipes <jaypipes at gmail.com> wrote:
>>
>>> a) Sorting out the common code is already accounted for in Dan B's original
>>> proposal -- it's a prerequisite for the split.
>>
>> It's a big prerequisite though. I think we're talking about a release's
>> worth of work to get that right. I don't object to us doing that work,
>> but I think we need to be honest about how long it's going to take. It
>> will also make the core of nova less agile, as we'll find it hard to
>> change the hypervisor driver interface over time. Do we really think
>> it's ready to be stable?
> 
> Yes, in my proposal I explicitly said we'd need to use Kilo
> for all the prep work to clean up the virt API, and only
> do the split in Lxxxxx.
> 
> The actual nova/virt/driver.py has been more stable over the
> past few releases than I thought it would be. In terms of APIs
> we've not really modified existing APIs, mostly added new ones.
> Where we did modify existing APIs, we could have easily taken
> the approach of adding a new API in parallel and deprecating
> the old entry point to maintain compat.
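
As a rough illustration of the "add a new API in parallel and deprecate
the old entry point" approach, here is a minimal sketch. The class and
method names (ExampleDriver, plug_device, plug_device_v2) are made up for
this example and are not part of the real nova virt driver interface.

```python
import warnings


class ExampleDriver:
    """Hypothetical driver class, used only to illustrate the pattern."""

    def plug_device_v2(self, instance, device, options=None):
        # New entry point with the extended signature.
        return {"instance": instance, "device": device,
                "options": options or {}}

    def plug_device(self, instance, device):
        # Old entry point kept for compatibility: warn, then delegate
        # to the new method so both code paths behave the same.
        warnings.warn(
            "plug_device is deprecated; use plug_device_v2",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.plug_device_v2(instance, device)
```

Callers of the old method keep working for a deprecation period, and the
old entry point can be dropped once nothing depends on it.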
> 
> The big change which isn't visible directly is the conversion
> of internal nova code to use objects. Finishing this conversion
> is clearly a pre-requisite to any such split, since we'd need
> to make sure all data passed into the nova virt APIs as parameters
> is stable & well defined. 
> 
>> As an alternative approach...
>>
>> What if we pushed most of the code for a driver into a library?
>> Imagine a library which controls the low level operations of a
>> hypervisor -- create a vm, attach a NIC, etc. Then the driver would
>> become a shim around that which was relatively thin, but owned the
>> interface into the nova core. The driver handles the nova specific
>> things like knowing how to create a config drive, or how to
>> orchestrate with cinder, but hands over all the hypervisor operations
>> to the library. If we found a bug in the library we just pin our
>> dependency on the version we know works whilst we fix things.
>>
>> In fact, the driver inside nova could be a relatively generic "library
>> driver", and we could have multiple implementations of the library,
>> one for each hypervisor.
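
To make the "library driver" idea concrete, here is a minimal sketch under
stated assumptions. All names (HypervisorLibrary, FakeHypervisorLibrary,
LibraryDriver, spawn) are hypothetical, and the instance is a plain dict
rather than a real Nova object.

```python
from typing import Protocol


class HypervisorLibrary(Protocol):
    """Hypothetical low-level interface, one implementation per hypervisor."""

    def create_vm(self, name: str, vcpus: int, memory_mb: int) -> str: ...

    def attach_nic(self, vm_id: str, mac: str) -> None: ...


class FakeHypervisorLibrary:
    """In-memory stand-in for a real hypervisor library."""

    def __init__(self):
        self.vms = {}  # vm_id -> list of attached MAC addresses

    def create_vm(self, name, vcpus, memory_mb):
        vm_id = "vm-%d" % (len(self.vms) + 1)
        self.vms[vm_id] = []
        return vm_id

    def attach_nic(self, vm_id, mac):
        self.vms[vm_id].append(mac)


class LibraryDriver:
    """Generic shim: owns the nova-facing call, delegates hypervisor work."""

    def __init__(self, library: HypervisorLibrary):
        self._lib = library

    def spawn(self, instance: dict) -> str:
        # Translate the nova-side instance into plain library arguments.
        vm_id = self._lib.create_vm(
            instance["name"], instance["vcpus"], instance["memory_mb"])
        for mac in instance.get("macs", []):
            self._lib.attach_nic(vm_id, mac)
        return vm_id
```

Note that the shim still has to translate the instance into plain
arguments before calling the library, so the question of what a stable
interface looks like does not go away.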
> 
> I don't think that really solves the problem, especially the
> one you are most concerned about above: API stability. The
> naive impl of any "library" for the virt driver would pretty much
> mirror the nova virt API. The virt driver impls would thus have to
> do the job of taking the Nova objects passed in as parameters and
> turning them into something "stable" to pass to the library. Except
> now instead of us only having to figure out a stable API in one
> place, every single driver has to reinvent the wheel defining their
> own stable interface & objects. I'd also be concerned that ongoing
> work on drivers is still going to require a lot of patches to Nova
> to update the shims all the time, so we're still going to contend
> for resources fairly heavily.
> 
>>> b) The conflict Dan is speaking of is around the current situation where we
>>> have a limited core review team bandwidth and we have to pick and choose
>>> which virt driver-specific features we will review. This leads to bad
>>> feelings and conflict.
>>
>> The way this worked in the past is we had cores who were subject
>> matter experts in various parts of the code -- there is a clear set of
>> cores who "get" xen or libvirt for example and I feel like those
>> drivers get reasonable review times. What's happened though is that
>> we've added a bunch of drivers without adding subject matter experts
>> to core to cover those drivers. Those newer drivers therefore have a
>> harder time getting things reviewed and approved.
> 
> FYI, for Juno at least I really don't consider that even the libvirt
> driver got acceptable review times in any sense. The pain of waiting
> for reviews in libvirt code I've submitted this cycle is what prompted
> me to start this thread. All the virt drivers are suffering way more
> than they should be, but those without core team representation suffer
> to an even greater degree.  And this is ignoring the point Jay & I
> were making about how the use of a single team means that there is
> always contention for feature approval, so much work gets cut right
> at the start even if maintainers of that area felt it was valuable
> and worth taking.

I continue to not understand how N non-overlapping teams make this any
better. You have to pay the integration cost somewhere. Right now we're
trying to pay it 1 patch at a time. This model means the integration
units get much bigger, and with less common ground.

Look at how much active work across core teams we've had to do to
make any real progress on the neutron replacing nova-network front, and
how slow that process is. I think you'll see that show up here in a big way.

>>> c) It's the impact to the CI and testing load that I see being the biggest
>>> benefit to the split-out driver repos. Patches proposed to the XenAPI driver
>>> shouldn't have the Hyper-V CI tests run against the patch. Likewise, running
>>> libvirt unit tests in the VMware driver repo doesn't make a whole lot of
>>> sense, and all of these tests add a not-insignificant load to the overall
>>> upstream and external CI systems. The long wait time for tests to come back
>>> means contributors get frustrated, since many reviewers tend to wait until
>>> Jenkins returns some result before they review. All of this leads to
>>> increased conflict that would be somewhat ameliorated by having separate
>>> code repos for the virt drivers.
>>
>> It is already possible to filter CI runs to specific paths in the
>> code. We just didn't choose to do that for policy reasons. We could
>> change that right now with a trivial tweak to each CI system's zuul
>> config.
> 
> We have to jump through far more hoops to do so, even as developers
> running things locally. e.g. want to run pep8 locally to test your
> work? You have to wait 3 minutes while it checks the entirety of
> the nova codebase. So we had to invent a mode where it only checks
> the files in the current git HEAD.
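
For readers unfamiliar with the "only check the files in HEAD" idea, a
minimal sketch follows. It is not the actual mode in nova's tooling; the
function names are hypothetical, and it assumes a non-merge HEAD commit in
a git checkout.

```python
import subprocess


def python_files(paths):
    """Keep only the Python source files from a list of paths."""
    return [p for p in paths if p.endswith(".py")]


def files_changed_in_head(repo_dir="."):
    """List files added, copied, modified or renamed by the HEAD commit."""
    out = subprocess.check_output(
        ["git", "diff-tree", "--no-commit-id", "--name-only", "-r",
         "--diff-filter=ACMR", "HEAD"],
        cwd=repo_dir,
        text=True,
    )
    return [line for line in out.splitlines() if line]
```

The result of python_files(files_changed_in_head()) could then be handed
to a style checker instead of the whole tree.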

This time is almost entirely eaten by the module import checks in
hacking. If that's seen as a huge impediment to development, we could
turn them off. It would also massively reduce the tox env setup time (as
we wouldn't need to install all the project requirements). I was headed
down this path at one point, but no one seemed to care much, so I left
it behind.

I've actually got the beginnings of scripts that run hacking inline in
emacs for pyflakes using the local hacking ignores, so you don't even
need to run it locally; it shows you what's wrong in the file
interactively. Happy to get that into a better state to make it available.

> Likewise for unit tests - if you
> invoke them you have to pass args to filter to just the area of the
> repo you are working on. These kinds of problems simply go away
> completely if we have separate repos without having to do special
> setup tasks. Smaller modules would be far less daunting for new
> contributors looking to get involved in Nova development too, which
> I think is an important factor.

Personally, I don't consider 'tox -epy27 libvirt' to be an unreasonable
invocation, but we can create some additional targets in tox to make
life simpler.
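
The path-based selection discussed above (limiting CI runs or unit tests
to the part of the tree a patch touches) could be sketched roughly as
below. The mapping and the function name are hypothetical, and the
patterns are illustrative rather than taken from any real zuul or tox
configuration.

```python
import re

# Hypothetical mapping from driver name to the paths that belong to it.
DRIVER_PATTERNS = {
    "libvirt": re.compile(r"^nova/(tests/)?virt/libvirt/"),
    "xenapi": re.compile(r"^nova/(tests/)?virt/xenapi/"),
    "hyperv": re.compile(r"^nova/(tests/)?virt/hyperv/"),
    "vmwareapi": re.compile(r"^nova/(tests/)?virt/vmwareapi/"),
}


def drivers_to_test(changed_paths):
    """Return the driver names whose tests are relevant to a change.

    Any path outside the known driver directories is treated as common
    code, which selects every driver.
    """
    selected = set()
    for path in changed_paths:
        matches = [name for name, pattern in DRIVER_PATTERNS.items()
                   if pattern.search(path)]
        if not matches:
            return sorted(DRIVER_PATTERNS)
        selected.update(matches)
    return sorted(selected)
```

A patch touching only nova/virt/xenapi/ would select just "xenapi", while
a patch to common code would still run everything.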

Anyway, at the end of the day, my feeling is:

 * Pay down debt in virt layer abstraction: +1
 * Speed up tests in common ways (especially pep8): +1
 * Don't have 3rd party teams run irrelevant tests: +1
 * Split out the virt drivers into separate repos: -1

That being said, that split is at least 7 months (if not 12 or 15) away.
So I don't think we even need to make that decision at this point. There
is a ton of stuff that needs to get done regardless. Paying down that
debt in the virt layer will make the whole thing easier to review, and in
6 months' time we can assess whether the biggest inhibitor to
Nova is really that split.

	-Sean

-- 
Sean Dague
http://dague.net


