[openstack-dev] Thoughts on OpenStack Layers and a Big Tent model
Zane Bitter
zbitter at redhat.com
Wed Sep 24 17:55:57 UTC 2014
On 18/09/14 14:53, Monty Taylor wrote:
> Hey all,
>
> I've recently been thinking a lot about Sean's Layers stuff. So I wrote
> a blog post which Jim Blair and Devananda were kind enough to help me edit.
>
> http://inaugust.com/post/108
Thanks Monty, I think there are some very interesting ideas in here.
I'm particularly glad to see the 'big tent' camp reasserting itself,
because I have no sympathy with anyone who wants to join the OpenStack
community and then bolt the door behind them. Anyone who contributes to
a project that is related to OpenStack's goals, is willing to do things
the OpenStack way, and submits itself to the scrutiny of the TC deserves
to be treated as a member of our community with voting rights, entry to
the Design Summit and so on.
I'm curious how you're suggesting we decide which projects satisfy those
criteria though. Up until now, we've done it through the incubation
process (or technically, the new program approval process... but in
practice we've never added a project that was targeted for eventual
inclusion in the integrated release to a program without incubating it).
Would the TC continue to judge whether a project is doing things the
OpenStack way prior to inclusion, or would we let projects self-certify?
What does it mean for a project to submit itself to TC scrutiny if it
knows that realistically the TC will never have time to actually
scrutinise it? Or are you not suggesting a change to the current
incubation process, just a willingness to incubate multiple projects in
the same problem space?
I feel like I need to play devil's advocate here, because overall I'm
just not sure I understand the purpose of arbitrarily - and it *is*
arbitrary - declaring "Layer #1" to be anything required to run
Wordpress. To anyone whose goal is not to run Wordpress, how is that
relevant?
Speaking of arbitrary, I had to laugh a little at this bit:
    Also, please someone notice that the above is too many steps and
    should be:
    openstack boot gentoo on-a 2G-VM with-a publicIP with-a 10G-volume
    call-it blog.inaugust.com
That's kinda sorta exactly what Heat does ;) Minus the part about
assuming there is only one kind of application, obviously.
I think there are a number of unjustified assumptions behind this
arrangement of things. I'm going to list some here, but I don't want
anyone to interpret this as a personal criticism of Monty. The point is
that we all suffer from biases - not for any questionable reasons but
purely as a result of our own experiences, who we spend our time talking
to and what we spend our time thinking about - and therefore we should
all be extremely circumspect about trying to bake our own mental models
of what OpenStack should be into the organisational structure of the
project itself.
* Assumption #1: The purpose of OpenStack is to provide a Compute cloud
This assumption is front-and-centre throughout everything Monty wrote.
Yet this wasn't how the OpenStack project started. In fact there are now
at least three services - Swift, Nova, Zaqar - that could each make
sense as the core of a standalone product.
Yes, it's true that Nova effectively depends on Glance and Neutron (and
everything depends on Keystone). We should definitely document that
somewhere. But why does it make Nova special?
* Assumption #2: Yawnoc's Law
Don't bother Googling that, I just made it up. It's the reverse of
Conway's Law:
    Infra engineers who design governance structures for OpenStack are
    constrained to produce designs that are copies of the structure of
    Tempest.
I just don't understand why that needs to be the case. Currently, for
understandable historic reasons, every project gates against every other
project. That makes no sense any more, completely independently of the
project governance structure. We should just change it! There is no
organisational obstacle to changing how gating works.
Even this proposal doesn't entirely make sense on this front - e.g.
Designate requires only Neutron and Keystone... why should Nova, Glance
and every other project in "Layer 1" gate against it, and vice-versa?
I suggested in another thread[1] a model where each project would
publish a set of tests, each project would decide which sets of tests to
pull in and gate on, and Tempest would just be a shell for setting up
the environment and running the selected tests. Maybe that idea is crazy
or at least needs more work (it certainly met with only crickets and
tumbleweeds on the mailing list), but implementing it wouldn't require
TC intervention and certainly not by-laws changes. It just requires...
implementing it.
Perhaps the idea here is that by designating "Layer 1" the TC is
indicating to projects which other projects they should accept gate test
jobs from (a function previously fulfilled by Incubation). I'd argue
that this is a very bad way to do it, because (a) it tells projects
outside of "Layer 1" nothing about how they should decide, and (b) it jumps
straight to the TC mandating the result without even letting the
projects try to sort it out amongst themselves.
For example, I would actually prefer that Nova not gate against Heat
because Nova is pretty unlikely to break us and the trade-off of putting
us in a position to accidentally break them is not worth it. No edict
from the TC required. On the other hand, I would push very strongly for
all of the python-*client libraries to gate against both Heat and
Horizon, because they can easily break us - and if they break us,
they're probably breaking other users out there too, so I'm confident I
could convince people that this would be mutually beneficial. (It could
potentially even extend so far as running the unit tests of Heat and
Horizon in the client gates, to avoid issues like [2].)
[1]
http://lists.openstack.org/pipermail/openstack-dev/2014-September/045446.html
[2]
http://lists.openstack.org/pipermail/openstack-dev/2014-September/046686.html
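The opt-in gating model described above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
Project class, the select_gate_tests() helper and all of the test names
are hypothetical and do not correspond to anything in Tempest or the
existing gate infrastructure. Each project publishes a set of tests and
chooses which other projects' sets to gate on, and the runner simply
collects the selected tests.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    """A project that publishes tests and opts in to other projects' tests."""
    name: str
    # Test identifiers this project publishes (made-up names).
    published_tests: list = field(default_factory=list)
    # Names of other projects whose published tests this project gates on.
    gates_on: set = field(default_factory=set)


def select_gate_tests(project_name, registry):
    """Return the tests a change to `project_name` must pass before merging."""
    project = registry[project_name]
    # A project always runs its own published tests.
    selected = list(project.published_tests)
    # Then it pulls in the sets it has chosen to gate on, in a stable order.
    for other in sorted(project.gates_on):
        selected.extend(registry[other].published_tests)
    return selected


# Illustrative registry only: Nova opts out of Heat's tests, while a
# client library opts in to both Heat's and Horizon's.
registry = {
    p.name: p
    for p in [
        Project("nova", ["nova.api"]),
        Project("heat", ["heat.stacks"]),
        Project("horizon", ["horizon.dashboard"]),
        Project("python-novaclient", ["novaclient.cli"],
                gates_on={"heat", "horizon"}),
    ]
}

print(select_gate_tests("nova", registry))
print(select_gate_tests("python-novaclient", registry))
```

In this sketch the decision about what to gate on lives entirely in each
project's own gates_on set, so no central body has to mandate the result.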
* Assumption #3: The world is static
This is a giant red flag:
    "the set of things in Layer #1 should never change -- unless we
    refactor something already in Layer #1 into a new project."
There is no greater act of hubris than to stick a stake in the ground
and declare that "we will never know more than we do at this moment;
we'll only get dumber from here, so we must precommit to all of our
future decisions based on the information we have at present".
What if, for example, Nova wanted to add a dependency on Zaqar? They'd
be prevented from doing so because Zaqar is not used by Wordpress. How
is that relevant? A rigid ban on dependencies is a death knell for
innovation.
Can you really never imagine a time where it might be better to run
Wordpress on a container service rather than a full-fledged VM? I guess
that's OK but only as long as it starts in Nova and then gets split out?
Because... nova-core don't have enough to do?
And none of this is any help at all to projects outside of "Layer 1",
because they get no guidance at all on what makes sense to depend on.
This is already hurting with our current system (for example, Mistral is
implementing a bunch of notification stuff that should properly be
delegated to Zaqar, and in fact as of 6 months ago it was the
centrepiece of the design), and the TC abdicating all interest in the
subject will make it even worse.
* Assumption #4: The sky is falling
From reading openstack-dev, it's pretty clear that both the QA and Nova
programs are facing a scaling crisis of sorts. It's easy to see why
anybody deeply involved with either or both of those two would indeed
think that radical change is required. I'm not sure, however, that the
same sense of crisis pervades all of the other projects. We all have a
lot of work to do, but I suspect that most projects would say that they
are trucking along nicely. Meanwhile, the proposal is to change pretty
much everything about how OpenStack is organised *except* QA and Nova
(in fact, it creates incentives to stick even more stuff inside Nova),
which remain sacrosanct. That doesn't seem like attacking the problem at
its source.
So we've identified the minimum set of OpenStack services required to
sensibly run Wordpress. Awesome! Somebody should totally write a blog
post about that. But officially and permanently baking that in as the
structure of the OpenStack project? I hate to use the c-word, but the
bottom line is that "Layer 1" just resurrects Core with a pretext to
finally kick Swift out. That seems particularly ironic, because I would
pay good money to be a fly on the wall in a board meeting where anyone
but Monty proposed such a thing in those terms, just to watch his
reaction. Given that the TC informed the DefCore committee that it
regarded everything that has graduated to the integrated release as the
"designated sections" for DefCore purposes and told them to go do their
own dirty work, you can bet your last dollar that this will be
interpreted as a TC endorsement for permanently excluding Swift - and
all the other non-"Layer 1" projects - from the designated sections. In
fact, by removing only those tests from Tempest it's likely to have the
side-effect of eliminating them from RefStack altogether.
Let's sum up, first by looking at a list of questions that developers,
distributors, operators and users might ask about a project:
1) Are they "one of us"?
2) Should I gate against it?
3) Can I add a dependency on it?
4) Should this be widely distributed as part of OpenStack?
5) Can I use this knowing that the API will be somewhat stable?
6) Should this be used at scale in production?
Here's how the TC is answering those questions at the moment:
1) New program acceptance + incubation or adoption processes
2) Incubation process
3) Graduation process
4) Graduation process
5) Graduation process
6) You're on your own
Here are Monty's answers:
1) ???
2) No
3) No
4) You're on your own
5) You're on your own?
6) "CERN test"
Both of those feel unsatisfactory in different ways. Monty's suggestions
seem like an overly radical change to me; I would like to try something
a bit more incremental to give us the chance to see how the community
adapts:
1) Incubation process (much lower bar)
2) Do your own cost/benefit analysis
3) Graduation process
4) Graduation process (maintain high bar, but less capricious)
5) Graduation process
6) TC/UC production-readiness review
Finally, since the motivation for change is that we think the current
structure isn't scaling, let's examine the individual things that are
currently pain points:
* Continuous Integration
We all agree that the gate doesn't scale. I submit that it doesn't scale
because it tests every project against every other project, and that
kicking projects out of the gate not only fails to solve the problem in
the long term (since the projects that _are_ in will continue to grow),
but also ignores the actual risks that the gate is meant to guard
against in favour of an arbitrary designation.
We should scale the gate by only gating projects against other projects
where the benefit in reduced risk outweighs the cost in increased risk
of false negatives. For projects that don't depend on each other at all,
the benefit is precisely zero (beyond the install-only gate suggested by
Monty, which I support). We should apply the same cost-benefit
calculation regardless of how involved the projects in question are with
running Wordpress, and we should let projects themselves decide what to
gate against in the first instance, with the TC only stepping in in the
event that consensus can't be reached by other means.
* Documentation
This is a tricky one, and not an area of OpenStack that I am an expert
on. It does seem to me that the only real solution is to make projects
more responsible for their own documentation. Arbitrarily splitting
projects into a category where they're not responsible at all and a
category where they're completely on their own doesn't seem like a good
solution.
* Release Management
This is something we have not really even attempted to scale beyond
Thierry. As a first step, there is no real organisational obstacle to
having a different release manager for incubated projects than for
integrated projects; it's more a matter of making it known to either the
Foundation or the various companies who employ contributors that we need
one. I don't want to make that process sound trivial, but I'm confident
that the release management program could handle it, and I think we
should at least give them a chance to try before pre-emptively kicking
anything non-Wordpress-related out of the release forever.
* Technical Committee
It is inevitable that we will reach a point where the Technical
Committee itself does not scale. I'm surprised, because I thought that
was a ways off, but after watching the latest Zaqar fiasco I think we
have to consider the possibility that we have reached that point already.
Perhaps we should consider having subcommittees, maybe based on the
groupings identified by John (Dickinson), possibly composed of the
relevant PTLs plus a representative of the TC. These subcommittees would
do the legwork of investigating new projects making their way through
the incubation/graduation process and report summaries and
recommendations to the TC.
cheers,
Zane.