[openstack-dev] Thoughts on OpenStack Layers and a Big Tent model

Zane Bitter zbitter at redhat.com
Wed Sep 24 17:55:57 UTC 2014


On 18/09/14 14:53, Monty Taylor wrote:
> Hey all,
>
> I've recently been thinking a lot about Sean's Layers stuff. So I wrote
> a blog post which Jim Blair and Devananda were kind enough to help me edit.
>
> http://inaugust.com/post/108

Thanks Monty, I think there are some very interesting ideas in here.

I'm particularly glad to see the 'big tent' camp reasserting itself, 
because I have no sympathy with anyone who wants to join the OpenStack 
community and then bolt the door behind them. Anyone who contributes to 
a project that is related to OpenStack's goals, is willing to do things 
the OpenStack way, and submits itself to the scrutiny of the TC deserves 
to be treated as a member of our community with voting rights, entry to 
the Design Summit and so on.

I'm curious how you're suggesting we decide which projects satisfy those 
criteria though. Up until now, we've done it through the incubation 
process (or technically, the new program approval process... but in 
practice we've never added a project that was targeted for eventual 
inclusion in the integrated release to a program without incubating it). 
Would the TC continue to judge whether a project is doing things the 
OpenStack way prior to inclusion, or would we let projects self-certify? 
What does it mean for a project to submit itself to TC scrutiny if it 
knows that realistically the TC will never have time to actually 
scrutinise it? Or are you not suggesting a change to the current 
incubation process, just a willingness to incubate multiple projects in 
the same problem space?

I feel like I need to play devil's advocate here, because overall I'm 
just not sure I understand the purpose of arbitrarily - and it *is* 
arbitrary - declaring "Layer #1" to be anything required to run 
Wordpress. To anyone whose goal is not to run Wordpress, how is that 
relevant?

Speaking of arbitrary, I had to laugh a little at this bit:

  Also, please someone notice that the above is too many steps and should be:

   openstack boot gentoo on-a 2G-VM with-a publicIP with-a 10G-volume call-it blog.inaugust.com

That's kinda sorta exactly what Heat does ;) Minus the part about 
assuming there is only one kind of application, obviously.


I think there are a number of unjustified assumptions behind this 
arrangement of things. I'm going to list some here, but I don't want 
anyone to interpret this as a personal criticism of Monty. The point is 
that we all suffer from biases - not for any questionable reasons but 
purely as a result of our own experiences, who we spend our time talking 
to and what we spend our time thinking about - and therefore we should 
all be extremely circumspect about trying to bake our own mental models 
of what OpenStack should be into the organisational structure of the 
project itself.

* Assumption #1: The purpose of OpenStack is to provide a Compute cloud

This assumption is front-and-centre throughout everything Monty wrote. 
Yet this wasn't how the OpenStack project started. In fact there are now 
at least three services - Swift, Nova, Zaqar - that could each make 
sense as the core of a standalone product.

Yes, it's true that Nova effectively depends on Glance and Neutron (and 
everything depends on Keystone). We should definitely document that 
somewhere. But why does it make Nova special?

* Assumption #2: Yawnoc's Law

Don't bother Googling that, I just made it up. It's the reverse of 
Conway's Law:

   Infra engineers who design governance structures for OpenStack are
   constrained to produce designs that are copies of the structure of
   Tempest.

I just don't understand why that needs to be the case. Currently, for 
understandable historic reasons, every project gates against every other 
project. That makes no sense any more, completely independently of the 
project governance structure. We should just change it! There is no 
organisational obstacle to changing how gating works.

Even this proposal doesn't entirely make sense on this front - e.g. 
Designate requires only Neutron and Keystone... why should Nova, Glance 
and every other project in "Layer 1" gate against it, and vice-versa?

I suggested in another thread[1] a model where each project would 
publish a set of tests, each project would decide which sets of tests to 
pull in and gate on, and Tempest would just be a shell for setting up 
the environment and running the selected tests. Maybe that idea is crazy 
or at least needs more work (it certainly met with only crickets and 
tumbleweeds on the mailing list), but implementing it wouldn't require 
TC intervention and certainly not by-laws changes. It just requires... 
implementing it.
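
To make that model a bit more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the functions publish(), select() and run_gate() and the placeholder tests are hypothetical names invented for this example, and none of it is an existing Tempest interface. The project names come from this thread.

```python
from typing import Callable, Dict, List

TestSet = List[Callable[[], None]]

# Hypothetical registry: each project publishes its own set of tests.
PUBLISHED: Dict[str, TestSet] = {}
# Hypothetical registry: each project chooses which published sets it gates on.
GATES_ON: Dict[str, List[str]] = {}


def publish(project: str, tests: TestSet) -> None:
    """Record the test set that a project offers to others."""
    PUBLISHED[project] = list(tests)


def select(project: str, sources: List[str]) -> None:
    """Record which projects' test sets this project has decided to gate on."""
    GATES_ON[project] = list(sources)


def run_gate(project: str) -> Dict[str, bool]:
    """The 'shell': run only the selected test sets and report pass/fail."""
    results: Dict[str, bool] = {}
    for source in GATES_ON.get(project, []):
        for test in PUBLISHED.get(source, []):
            name = f"{source}.{test.__name__}"
            try:
                test()
                results[name] = True
            except AssertionError:
                results[name] = False
    return results


# Placeholder tests, invented purely for illustration.
def test_stack_create() -> None:
    assert True


def test_dashboard_loads() -> None:
    assert True


publish("heat", [test_stack_create])
publish("horizon", [test_dashboard_loads])
# A client library opts in to the Heat and Horizon test sets; Nova opts out.
select("python-novaclient", ["heat", "horizon"])
select("nova", [])
```

In this sketch, run_gate("python-novaclient") runs the Heat and Horizon tests, while run_gate("nova") runs nothing from either. The decision lives with each project rather than in a central list.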

Perhaps the idea here is that by designating "Layer 1" the TC is 
indicating to projects which other projects they should accept gate test 
jobs from (a function previously fulfilled by Incubation). I'd argue 
that this is a very bad way to do it, because (a) it says nothing to 
projects outside of "Layer 1" about how they should decide, and (b) it jumps 
straight to the TC mandating the result without even letting the 
projects try to sort it out amongst themselves.

For example, I would actually prefer that Nova not gate against Heat 
because Nova is pretty unlikely to break us and the trade-off of putting 
us in a position to accidentally break them is not worth it. No edict 
from the TC required. On the other hand, I would push very strongly for 
all of the python-*client libraries to gate against both Heat and 
Horizon, because they can easily break us - and if they break us, 
they're probably breaking other users out there too, so I'm confident I 
could convince people that this would be mutually beneficial. (It could 
potentially even extend so far as running the unit tests of Heat and 
Horizon in the client gates, to avoid issues like [2].)

[1] 
http://lists.openstack.org/pipermail/openstack-dev/2014-September/045446.html
[2] 
http://lists.openstack.org/pipermail/openstack-dev/2014-September/046686.html

* Assumption #3: The world is static

This is a giant red flag:

   "the set of things in Layer #1 should never change -- unless we
    refactor something already in Layer #1 into a new project."

There is no greater act of hubris than to stick a stake in the ground 
and declare that "we will never know more than we do at this moment; 
we'll only get dumber from here, so we must precommit to all of our 
future decisions based on the information we have at present".

What if, for example, Nova wanted to add a dependency on Zaqar? They'd 
be prevented from doing so because Zaqar is not used by Wordpress. How 
is that relevant? A rigid ban on dependencies is a death knell for 
innovation.

Can you really never imagine a time where it might be better to run 
Wordpress on a container service rather than a full-fledged VM? I guess 
that's OK but only as long as it starts in Nova and then gets split out? 
Because... nova-core don't have enough to do?

And none of this is any help at all to projects outside of "Layer 1", 
because they get no guidance at all on what makes sense to depend on. 
This is already hurting with our current system (for example, Mistral is 
implementing a bunch of notification stuff that should properly be 
delegated to Zaqar, and in fact as of 6 months ago it was the 
centrepiece of the design), and the TC abdicating all interest in the 
subject will make it even worse.

* Assumption #4: The sky is falling

From reading openstack-dev, it's pretty clear that both the QA and Nova 
programs are facing a scaling crisis of sorts. It's easy to see why 
anybody deeply involved with either or both of those two would indeed 
think that radical change is required. I'm not sure, however, that the 
same sense of crisis pervades all of the other projects. We all have a 
lot of work to do, but I suspect that most projects would say that they 
are trucking along nicely. Meanwhile, the proposal is to change pretty 
much everything about how OpenStack is organised *except* QA and Nova 
(in fact, it creates incentives to stick even more stuff inside Nova), 
which remain sacrosanct. That doesn't seem like attacking the problem at 
its source.


So we've identified the minimum set of OpenStack services required to 
sensibly run Wordpress. Awesome! Somebody should totally write a blog 
post about that. But officially and permanently baking that in as the 
structure of the OpenStack project? I hate to use the c-word, but the 
bottom line is that "Layer 1" just resurrects Core with a pretext to 
finally kick Swift out. That seems particularly ironic, because I would 
pay good money to be a fly on the wall in a board meeting where anyone 
but Monty proposed such a thing in those terms, just to watch his 
reaction. Given that the TC informed the DefCore committee that it 
regarded everything that has graduated to the integrated release as the 
"designated sections" for DefCore purposes and told them to go do their 
own dirty work, you can bet your last dollar that this will be 
interpreted as a TC endorsement for permanently excluding Swift - and 
all the other non-"Layer 1" projects - from the designated sections. In 
fact, by removing only those tests from Tempest it's likely to have the 
side-effect of eliminating them from RefStack altogether.


Let's sum up, first by looking at a list of questions that developers, 
distributors, operators and users might ask about a project:

1) Are they "one of us"?
2) Should I gate against it?
3) Can I add a dependency on it?
4) Should this be widely distributed as part of OpenStack?
5) Can I use this knowing that the API will be somewhat stable?
6) Should this be used at scale in production?


Here's how the TC is answering those questions at the moment:

1) New program acceptance + incubation or adoption processes
2) Incubation process
3) Graduation process
4) Graduation process
5) Graduation process
6) You're on your own

Here are Monty's answers:

1) ???
2) No
3) No
4) You're on your own
5) You're on your own?
6) "CERN test"

Both of those feel unsatisfactory in different ways. Monty's suggestions 
seem like an overly radical change to me; I would like to try something 
a bit more incremental to give us the chance to see how the community 
adapts:

1) Incubation process (much lower bar)
2) Do your own cost/benefit analysis
3) Graduation process
4) Graduation process (maintain high bar, but less capricious)
5) Graduation process
6) TC/UC production-readiness review


Finally, since the motivation for change is that we think the current 
structure isn't scaling, let's examine the individual things that are 
currently pain points:

* Continuous Integration

We all agree that the gate doesn't scale. I submit that it doesn't scale 
because it tests every project against every other project, and that 
kicking projects out of the gate not only fails to solve the problem in 
the long term (since the projects that _are_ in will continue to grow), 
but also ignores the actual risks that the gate is meant to guard 
against in favour of an arbitrary designation.

We should scale the gate by only gating projects against other projects 
where the benefit in reduced risk outweighs the cost in increased risk 
of false negatives. For projects that don't depend on each other at all, 
the benefit is precisely zero (beyond the install-only gate suggested by 
Monty, which I support). We should apply the same cost-benefit 
calculation regardless of how involved the projects in question are with 
running Wordpress, and we should let projects themselves decide what to 
gate against in the first instance, with the TC only stepping in in the 
event that consensus can't be reached by other means.
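
As a rough illustration of the "benefit is precisely zero" case, the sketch below checks whether two projects are related by a dependency at all. It is a simplified model, not a real tool: the DEPENDS_ON table only encodes the dependencies mentioned earlier in this mail (Nova on Glance and Neutron, Designate on Neutron, everything on Keystone), and the function names are hypothetical. A True result only means the benefit of cross-gating is non-zero; the cost still has to be weighed separately.

```python
from typing import Dict, Set

# Assumed, simplified dependency table based only on what this mail states.
DEPENDS_ON: Dict[str, Set[str]] = {
    "keystone": set(),
    "glance": {"keystone"},
    "neutron": {"keystone"},
    "nova": {"glance", "neutron", "keystone"},
    "designate": {"neutron", "keystone"},
}


def transitive_deps(project: str) -> Set[str]:
    """Return everything the project depends on, directly or indirectly."""
    seen: Set[str] = set()
    stack = list(DEPENDS_ON.get(project, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON.get(dep, ()))
    return seen


def cross_gate_has_benefit(a: str, b: str) -> bool:
    """True if one project depends on the other, so one can break the other."""
    return a in transitive_deps(b) or b in transitive_deps(a)
```

Under these assumptions, cross_gate_has_benefit("nova", "designate") is False because neither depends on the other, while cross_gate_has_benefit("nova", "glance") is True.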

* Documentation

This is a tricky one, and not an area of OpenStack that I am an expert 
on. It does seem to me that the only real solution is to make projects 
more responsible for their own documentation. Arbitrarily splitting 
projects into a category where they're not responsible at all and a 
category where they're completely on their own doesn't seem like a good 
solution.

* Release Management

This is something we have not really even attempted to scale beyond 
Thierry. As a first step, there is no real organisational obstacle to 
having a different release manager for incubated projects than for 
integrated projects, it's more a matter of making it known to either the 
Foundation or the various companies who employ contributors that we need 
one. I don't want to make that process sound trivial, but I'm confident 
that the release management program could handle it, and I think we 
should at least give them a chance to try before pre-emptively kicking 
anything non-Wordpress-related out of the release forever.

* Technical Committee

It is inevitable that we will reach a point where the Technical 
Committee itself does not scale. I'm surprised, because I thought that 
was a ways off, but after watching the latest Zaqar fiasco I think we 
have to consider the possibility that we have reached that point already.

Perhaps we should consider having subcommittees, maybe based on the 
groupings identified by John (Dickinson), possibly composed of the 
relevant PTLs plus a representative of the TC. These subcommittees would 
do the legwork of investigating new projects making their way through 
the incubation/graduation process and report summaries and 
recommendations to the TC.

cheers,
Zane.


