[openstack-dev] [chef] Making the Kitchen Great Again: A Retrospective on OpenStack & Chef
s at cassiba.com
Wed Feb 15 04:44:30 UTC 2017
The HTML version is here:
This was influenced by Graham Hayes' State of the Project for Designate:
I have been asked recently "what is going on with the OpenStack-Chef project?",
"how is the state of the cookbooks?", and "hey sc, how are those integration
tests coming?". Having been the PTL for the Newton and Ocata cycles, yet
having not shipped a release, is the unthinkable, and deserves at least a
sentence or two.
It goes without saying, this is disheartening and depressing to me and
everybody that has devoted their time to making the cookbooks a solid
and viable method for deploying OpenStack. OpenStack-Chef is among the
oldest and most mature solutions for deploying OpenStack, though it is
not the most feature-rich.
*TL;DR* if you don't want to keep going -
OpenStack-Chef is not in a good place and is not sustainable.
OpenStack-Chef has always been a small project with a big responsibility.
The Chef approach to OpenStack historically has required a level of
investment within the Chef ecosystem, which is a hard enough sell when you
started out with Puppet or Ansible. Despite the unicorns and rainbows of
being Chef cookbooks, OpenStack-Chef always asserted itself as an OpenStack
project first, up to and including joining the Big Tent, whatever it takes.
To beat that drum, we are OpenStack.
There is no *cool* factor from deploying and managing OpenStack using Chef,
unless you've been running Chef, because insert Xzibit meme here and jokes
about turtles. Unless you break something with automation, then it's
applause or facepalm. Usually both. At the same time.
As with any kitchen, it must be stocked and well maintained, and
OpenStack-Chef is no exception. Starting out, there was a vibrant community
producing organic, free-range code. Automation is invisible, assumed to be
there in the background. Once it's in place, it isn't touched again unless
it breaks. Upgrades in complex deployments can be fraught with error, even
in an automated fashion.
As has been seen in previous surveys, once an OpenStack release has been chosen
by an operator, some tend to not upgrade for the next cycle or three, to get
the immediate bugs worked out. Though there are now multinode and upgrade
scenarios supported with the Puppet OpenStack and TripleO projects, they do
not use Chef, so Chef deployers do not directly benefit from any of this.
Being a deployment project, we are responsible for not one aspect of
the OpenStack project but as many as can be reasonably supported.
We were very fortunate in the beginning, having support from public cloud
providers, as well as large private cloud providers. Stackalytics shows a
vibrant history, a veritable who's-who of OpenStack contributors, too many to
name. They've all moved on, working on other things.
As a previous PTL for the project once joked, the Chef approach to OpenStack
was the "other deployment tool that nobody uses". As time has gone by, that has
become more of a true statement.
There are a few of us still cooking away, creating new recipes and cookbooks. The
pilot lights are still lit and there's usually something simmering away on the
back burner, but there is no shouting of orders, and not every dish gets tasted.
We think there might be rats, too, but we’re too shorthanded to maintain the traps.
We have yet to see many (meaningful) contributions from the community, however.
We have some amazing deployers that file bugs, and if they can, push up a patch.
It delights me when someone other than a core weighs in on a review. They are
highly appreciated and incredibly valuable, but they are very tactical
contributions. A project cannot live on such contributions.
Where does that leave OpenStack-Chef? Let's take a look at the numbers:
| Cycle | Commits |
|-------|---------|
| Havana | 557 |
| Icehouse | 692 |
| Juno | 424 |
| Kilo | 474 |
| Liberty | 259 |
| Mitaka | 85 |
| Newton | 112 |
| Ocata | 78 |
As of the time of this writing, Newton has not yet branched. Yes, you read that
correctly. This means the Ocata cycle has gone to ensuring that Newton *just
functions*. In a virtual quasi-vacuum, without input from larger scale
deployments, who are running releases older than Newton, reporting bugs we've
fixed in master. Supporting Newton required implementing support for Ubuntu
16.04, as well as client and underlying cookbook changes, due to deprecations
that started prior to Newton. Here is the output from *berks viz* for a top-down
view into the complexity on just the Chef side.
For the Pike cycle, Jan Klare will be reprising the role of PTL. I do not
intend to speak for him, but there are a few paths forward in the Big Tent:
* Branching stable/newton and stable/ocata with the quickness.
* Improve OpenStack CI to the point of being able to trust it again for
testing patches, as well as extend testing scenarios (including multinode).
For branching stable/newton, the external CI has been proving useful in overall
confidence in cutting a release. We're way behind schedule, but nearly there. I
have begun working on implementing some basic multinode gates, as our allinone
no longer fits within the confines of the 8GB instances. But, it’s Chef, so
triangle wheels, yo. Some of the cross-project efforts translate to Chef, but
not all. With square spinners.
So... how did this happen?
As was the case with Designate, so it is with OpenStack-Chef. There is
no single reason or cause that brought us to this point.
The main catalyst was internal support shifting, which impacted the sponsored
developers and contributors. OpenStack-Chef became less and less a priority,
and one by one they shifted to other focuses. At the Austin 2016 Summit, we said
farewell to all but the PTL and one core. This put OpenStack-Chef in a bad place
given its mission and scope, but onward we go.
Due to the volume of work done by this small group and the lack of feedback
during development, it became more and more difficult to tell when a release
could be considered "done". We could no longer trust our CI framework, as the
developers with intrinsic knowledge had been refocused, with little more than
commit history to go on.
Users were okay with leaving us work, which we added to the heap. This, with
the departure of contributors, resulted in the majority of the development
being funded by just two companies, which left the project at risk from changes
in direction by those companies. Without regular feedback or guidance beyond
release notes and the occasional chat in another project's channel, the focus
shifted away from features to just ship it, as long as it passes allinone and/or
multinode locally, if there’s time. Does it pass lint/unit/style? Fuck it. Ship
it, deal with the fallout. Yeah. This is bad on *so many* levels.
The Big Tent really did not do as much as advertised for OpenStack-Chef, as harsh
of an opinion as that sounds. Larger, more well-funded projects have since created
processes, frameworks and test suites that were developed for their own use cases,
not necessarily taking into account Chef's own blend of automation. That left
us having to go and discover how to make fire on our own to make the cookbooks
work on each supported platform and release of OpenStack. In the Big Tent, we
were effectively left to our own devices. Just another OpenStack project. We
numbered nine cores when we moved from StackForge to the Big Tent. Developer
peak, though we did not know it yet.
Initially, the cookbooks had a very heavy dependency: Chef Server. If not Chef
Server, Chef Solo, which still had its own quirks, and nobody liked Chef Solo
anyway. Not even Chef Solo liked itself. During the Juno cycle, we switched to
the Chef Development Kit, which gave us chef-provisioning. This decreased
turnaround time for testing patches being submitted, and boosted confidence all
around. Until Juno, it was difficult to run functional tests against the
cookbooks. That's when we discovered how to create fire. We could run
OpenStack! In virtual machines! On our laptops! OMyG you guys! Suddenly,
OpenStack on the laptop became easy, push button, single command, automated. We
could test a patch without a long spin-up. With that, came integration gates,
and periodic jobs. From days to minutes. We were cooking with gas! But... let’s
not make those integration gates voting... *yet*.
Mitaka brought a significant overhaul and simplification, with the introduction
of a multinode chef-provisioning recipe and more modular cookbooks. The pieces
finally existed, but the damage had been done, and unfortunately, this momentum
did not last. Internal priorities changed within companies sponsoring
developers, many of whom could not be fully committed in the first place, and
we started shedding contributors, which happens. At this point is usually where
*someone* comes in to either play Grim Reaper or lifesaver. By Austin, our
numbers waned until just two cores remained, Jan and myself. We could not be in
worse locations to communicate, he in Germany and I in California.
In the Newton and Ocata cycles, development progressed in a lurching capacity
without a team. Due to the overhaul in Mitaka, patches slowed in frequency from
the outside community, many of whom continued deploying and running on older,
EOL branches, or got frustrated enough to switch to other automation flavors.
The remaining team had little overlapping time to communicate, being on
different continents in conflicting time zones. What was difficult to do with a
larger team spread across three continents and five time zones became
impossible with just two. Day jobs increasingly took priority over
OpenStack-Chef care and feeding, and some cookbooks started to go rancid
(sorry, Ironic, Sahara, Swift and Trove. Nobody was able to support a
deployment with you). Interaction within the development team was limited to an
hour or two a day, eventually down to once or twice a week if we had time. Day
jobs proceeded to consume the development team, with sporadic development as
the months ticked on.
Over the Newton cycle, one cookbook was offered, EC2, with inadequate coverage
for our support matrix. The most desired integration API. In the end, it was
not integrated due to the time and commitment needed to support such a feature, having
inadequate resources at our disposal. During Ocata, the project had one
cookbook contributed from the community, Murano, that could be integrated, and
grew an appendage in the form of the client cookbook. It is the closest anyone
has gotten to new features since the Mitaka cycle. We added one core reviewer
during this time.
Communication is a big part of any project, particularly a geographically
diverse effort like OpenStack. Prior to the Big Tent, we held weekly meetings
using Hangouts, which were open and publicized for the mailing list subscribers
and channel denizens. Upon joining the Big Tent, we gave up the regular
face-time in favor of text-based IRC meetings, per governance. Without the high
bandwidth requirement of a video call once a week, one by one, cores had day
job meetings do what they do, and take priority. In the Newton cycle, we
relinquished our weekly time slot after it became apparent that neither of us
could make the meetings. We have not held a scheduled meeting since then, as it
is next to impossible to carve out adequate overlapping time.
We still have many of these problems to this day. Documentation is a mess or
nonexistent. Despite the flexibility of the tooling, users have but two
representative deployment examples: allinone or a rudimentary multinode.
OpenStack-Chef gained modularity at the expense of features, and there was an
overwhelming non-reaction to the deprecation of those features.
All of this results in a project that is not very friendly to new users, and
Chef does not look as attractive a deployment option as other, more
feature-rich flavors. This has real business decisions behind it. One only need
look at the steady usage decline in the surveys to see how negatively things
appear to existing and new users. This, for a project that has roots in the
very cubicles where OpenStack was born.
But it's not all bad
For all the negativity, this story has upsides. I call to the people who
actually use this software in their deployments, in whatever shape it's in, to
not abandon OpenStack-Chef, or retire it to bitrot. We need help, not funeral
arrangements. Share your pain, so that we may find a way forward together, not
In my time with the project, I’ve gotten to know that there are some pretty big
names that leverage Chef in their deployments, and some of them even use it for
OpenStack. Some cookbook forks also exist, all serving to solve the same
problems that face OpenStack-Chef. Without feedback from real-world
deployments, OpenStack-Chef will continue to wither on the vine. This
fragmentation harms more than it solves.
What do we need? It’s easier to list what we don’t need. Developers, tooling,
documentation, testing, any and all are welcome and greatly appreciated. We
don’t need much, but what we are able to do is limited by our size and time.
We need developers with time and funding budgeted for contributing within the
OpenStack ecosystem. We need better representation at events such as the PTG.
For the first PTG, no OpenStack-Chef members will be attending, though we
intend to meet up in Boston, maybe. Given our physical locations in proximity
to Atlanta, it made sense to stay home and communicate over IRC/code review. It
doesn’t mean our development has ceased, we’re just too few and far away for it
to make sense.
We also need help from cross-project teams to understand and assimilate the
work done to solve the problems that we are working to solve. We are all
working toward the same goal, though with a different dialect and coat of arms.
OpenStack needs choice in *how* to OpenStack. Limiting to a few dialects of
automation makes things look less like an ecosystem and more like a distro,
which is fine, if that’s how people want to go about it. We can talk about that,
I am happy to talk to anyone about how they can help. OpenStack-Chef has roots
in the pioneers of OpenStack, and, in my (not so humble) opinion, is way too
nifty to just let fall to the wayside.
I do not have team visuals to represent how the team has grown and shrunk over
the years, so let me leave you with some mental visuals. At the end of Mitaka,
we numbered nine. At the start of Ocata, we numbered just two. In Boston, we
anticipate there will be, hopefully, all three of us, representing the sixteen
active subprojects that make up OpenStack-Chef.
: [openstack-compute::default commit history](https://github.com/openstack/cookbook-openstack-compute/commits/eol-grizzly/recipes/default.rb)
: [April 2016 OpenStack User Survey: greater than 50% of production deployments were still on releases circa Kilo or older](https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf)
: [OpenStack Survey Report](https://www.openstack.org/analytics)