Open Stack

Tue Oct 15 23:36:03 UTC 2013

On 10/15/2013 04:54 PM, Vishvananda Ishaya wrote:
> Hi Everyone,
>
> I've been following this conversation and weighing the different sides. This is a tricky issue but I think it is important to decouple further and extend our circle of trust.
>
> When nova started it was very easy to do feature development. As it has matured the pace has slowed. This is expected and necessary, but we periodically must make decoupling decisions or we will become mired in overhead. We did this already with cinder and neutron, and we have discussed doing this with virt drivers in the past.
>
> We have a large number of people attempting to contribute to small sections of nova and getting frustrated with the process.  The perception of developers is much more important than the actual numbers here. If people are frustrated they are disincentivized to help and it hurts everyone. Suggesting that these contributors need to learn all of nova and help with the review queue is silly and makes us seem elitist. We should make it as easy as possible for new contributors to help.
>
> I think our current model is breaking down at our current size and we need to adopt something more similar to the linux model when dealing with subsystems. The hyper-v team is the only one suggesting changes, but there have been similar concerns from the vmware team. I have no doubt that there are similar issues with the PowerVM, Xen, Docker, lxc and even kvm driver contributors.

The Linux kernel process works for a couple of reasons...

1) The subsystem maintainers have known each other for a solid decade 
(i.e. 3x the lifespan of the OpenStack project). Over a 10-year history 
of people doing the right things, you build trust in their judgment.

*no one* in the Linux tree was given trust first, under the hope that it 
would work out. They had to earn it, hard, by doing community work, and 
not just playing in their corner of the world.

2) This 
http://www.wired.com/wiredenterprise/2012/06/torvalds-nvidia-linux/ is 
completely acceptable behavior. So when someone has bad code, they are 
flamed to within an inch of their life, repeatedly, until they never 
ever do that again. This is actually a time-saving measure in code 
review. It's a lot faster to just call people idiots than to help them 
with line-by-line improvements in their code over 10, 20, 30, or 40 
iterations in gerrit.

We, as a community, have decided, I think rightly, that #2 really isn't 
in our culture. But you can't start cherry-picking parts of the Linux 
kernel community without considering how all the parts work together. 
The good and the bad are part of why the whole system works.

> In my opinion, nova-core needs to be willing to trust the subsystem developers and let go of a little bit of control. I frankly don't see the drawbacks.

I actually see huge drawbacks. Culture matters. Having people active 
and willing to work on real core issues matters. The long-term health of 
Nova matters.

As the QA PTL, I can tell you that when you look at Nova vs. Cinder vs. 
Neutron, you'll see some very clear lines about how long it takes to get 
to the bottom of a race condition, and how many deep races are in each 
of them. I find this directly related to the stance each project has 
taken on whether it's socially acceptable to only work on your own 
vendor code. Nova's insistence, up until this point, that if you only 
play in your corner you don't get the same attention is an important 
incentive for people to integrate and work beyond their own boundaries. 
I think diluting this part of the culture would be hugely detrimental 
to Nova.

Let's take an example that came up today, the compute_diagnostics API. 
This is an area where we've left it completely to the virt drivers to 
vomit up a random dictionary of the day for debugging reasons, and 
stamped it as an API. With a model where we let virt driver authors go 
hide in a corner, that's never going to become an API with any kind of 
contract, and given how much effort we've spent on ensuring RPC 
versioning and message formats, the idea that we are exposing a public 
REST endpoint that returns randomly fluctuating data depending on the 
date and the underlying implementation is a bit saddening.

> I'm leaning towards giving control of the subtree to the team as the best option because it is simple and works with our current QA system. Alternatively, we could split out the driver into a nova subproject (2 below) or we could allow them to have a separate branch and do a trusted merge of all changes at the end of the cycle (similar to the linux model).
>
> I hope we can come to a solution to the summit that makes all of our contributors want to participate more. I believe that giving people more responsibility inspires them to participate more fully.

I would like nothing more than all our contributors to participate more. 
But more has to mean caring about not only your stuff.

I was called out today in the hyper-v meeting because I had the audacity 
to -1 a hyper-v patch. I wanted a reference somewhere in the code to the 
format documentation, so that people down the road would understand why 
we had some new random seek call - 
http://eavesdrop.openstack.org/meetings/hyper_v/2013/hyper_v.2013-10-15-16.03.log.html

As OpenStack grows, the single biggest factor in its success isn't 
going to be a feature in a driver; it's going to be whether this crazy 
complicated system holds together. That means whether or not we've got 
a handle on the emergent behavior that happens in an asynchronous, 
message-based system with tens of integrated projects and many dozens 
of daemons cross-talking with each other.

I mean seriously, one of the only reasons we made it through to the 
Havana RC phase is that we built a search-engine-based system to do 
statistical frequency analysis of unique failures on our gate resets. 
That fully exposed the top race conditions, which had gotten so bad the 
gate basically locked up. And a bunch of people went all hands on deck to 
drive these out. People jumped across normal project lines to help on 
some of these top bugs, because that's what makes OpenStack a whole system.

Things actually looked *really* bleak for release for a while. All the 
people that helped out and got us through this deserve a huge pat on the 
back. That's what OpenStack is about.

So I feel pretty strongly that optimizing the contribution process for 
people who aren't helping with that larger problem is the tragedy of 
the commons, and I think entirely the wrong optimization to be made.

     -Sean

-- 
Sean Dague
http://dague.net
