Open Stack

Wed Feb 15 05:37:10 UTC 2017

Ops who use Chef - FYI

-------- Forwarded Message --------
Subject: 	[openstack-dev] [chef] Making the Kitchen Great Again: A Retrospective on OpenStack & Chef
Date: 	Tue, 14 Feb 2017 20:44:30 -0800
From: 	Samuel Cassiba <s at cassiba.com>
Reply-To: 	OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
To: 	openstack-dev <openstack-dev at lists.openstack.org>

The HTML version is here:
https://s.cassiba.com/2017/02/14/making-the-kitchen-great-again-a-retrospective-on-openstack-chef

This was influenced by Graham Hayes' State of the Project for Designate:
http://graham.hayes.ie/posts/openstack-designate-where-we-are/

I have been asked recently "what is going on with the OpenStack-Chef
project?", "how is the state of the cookbooks?", and "hey sc, how are those
integration tests coming?". Having been the PTL for the Newton and Ocata
cycles without having shipped a release is unthinkable, and deserves at
least a sentence or two.

It goes without saying, this is disheartening and depressing to me and
everybody that has devoted their time to making the cookbooks a solid
and viable method for deploying OpenStack. OpenStack-Chef is among the
oldest[1] and most mature solutions for deploying OpenStack, though it is
not the most feature-rich.

*TL;DR* if you don't want to keep going -
OpenStack-Chef is not in a good place and is not sustainable.

OpenStack-Chef has always been a small project with a big responsibility.
The Chef approach to OpenStack historically has required a level of
investment within the Chef ecosystem, which is a hard enough sell when you
started out with Puppet or Ansible. Despite the unicorns and rainbows of
being Chef cookbooks, OpenStack-Chef always asserted itself as an OpenStack
project first, up to and including joining the Big Tent, whatever it takes.
To beat that drum, we are OpenStack.

There is no *cool* factor from deploying and managing OpenStack using Chef,
unless you've been running Chef, because insert Xzibit meme here and jokes
about turtles. Unless you break something with automation, then it's
applause or facepalm. Usually both. At the same time.

As with any kitchen, it must be stocked and well maintained, and
OpenStack-Chef is no exception. Starting out, there was a vibrant community
producing organic, free-range code. Automation is invisible, assumed to be
there in the background. Once it's in place, it isn't touched again unless
it breaks. Upgrades in complex deployments can be fraught with error, even
in an automated fashion.

As has been seen in previous surveys[2], once an OpenStack release has been
chosen by an operator, some tend to not upgrade for the next cycle or three,
to get the immediate bugs worked out. Though there are now multinode and
upgrade scenarios supported with the Puppet OpenStack and TripleO projects,
they do not use Chef, so Chef deployers do not directly benefit from any of
this testing.

Being a deployment project, we are responsible for not one aspect of
the OpenStack project but as many as can be reasonably supported.

We were very fortunate in the beginning, having support from public cloud
providers, as well as large private cloud providers. Stackalytics shows a
vibrant history, a veritable who's-who of OpenStack contributors, too many
to name. They've all moved on, working on other things.

As a previous PTL for the project once joked, the Chef approach to OpenStack
was the "other deployment tool that nobody uses". As time has gone by, that
has become more of a true statement.

There are a few of us still cooking away, creating new recipes and
cookbooks. The pilot lights are still lit and there's usually something
simmering away on the back burner, but there is no shouting of orders, and
not every dish gets tasted. We think there might be rats, too, but we’re too
shorthanded to maintain the traps.

We have yet to see many (meaningful) contributions from the community,
however. We have some amazing deployers that file bugs, and if they can,
push up a patch. It delights me when someone other than a core weighs in on
a review. They are highly appreciated and incredibly valuable, but they are
very tactical contributions. A project cannot live on such contributions.

October 2015

https://s.cassiba.com/images/oct-2015-deployment-decisions.png

Where does that leave OpenStack-Chef? Let's take a look at the numbers:

      +----------+---------+
      | Cycle    | Commits |
      +----------+---------+
      | Havana   | 557     |
      +----------+---------+
      | Icehouse | 692     |
      +----------+---------+
      | Juno     | 424     |
      +----------+---------+
      | Kilo     | 474     |
      +----------+---------+
      | Liberty  | 259     |
      +----------+---------+
      | Mitaka   | 85      |
      +----------+---------+
      | Newton   | 112     |
      +----------+---------+
      | Ocata    | 78      |
      +----------+---------+
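
To put those numbers in perspective, here is a minimal Python sketch that
works out the change from one cycle to the next and each cycle's share of
the peak. It assumes only the counts in the table above; the names
`COMMITS` and `summarize` and the output layout are illustrative, not part
of any OpenStack-Chef tooling.

```python
# Commit counts per release cycle, copied from the table above.
COMMITS = {
    "Havana": 557,
    "Icehouse": 692,
    "Juno": 424,
    "Kilo": 474,
    "Liberty": 259,
    "Mitaka": 85,
    "Newton": 112,
    "Ocata": 78,
}


def summarize(commits):
    """Return (peak_cycle, rows).

    Each row is (cycle, count, change_pct, pct_of_peak), where change_pct is
    the percentage change from the previous cycle (None for the first cycle).
    """
    peak_cycle = max(commits, key=commits.get)
    peak = commits[peak_cycle]
    rows = []
    previous = None
    for cycle, count in commits.items():  # dicts keep insertion order
        if previous is None:
            change = None
        else:
            change = round(100 * (count - previous) / previous, 1)
        rows.append((cycle, count, change, round(100 * count / peak, 1)))
        previous = count
    return peak_cycle, rows


if __name__ == "__main__":
    peak_cycle, rows = summarize(COMMITS)
    print(f"Peak cycle: {peak_cycle}")
    for cycle, count, change, of_peak in rows:
        delta = "n/a" if change is None else f"{change:+.1f}%"
        print(f"{cycle:<9} {count:>4}  {delta:>8}  {of_peak:>5.1f}% of peak")
```

Run as a script, this prints one line per cycle. By this arithmetic, Ocata
comes in at roughly 11% of the Icehouse peak.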

As of the time of this writing, Newton has not yet branched. Yes, you read
that correctly. This means the Ocata cycle has gone to ensuring that Newton
*just functions*, in a virtual quasi-vacuum, without input from larger scale
deployments, which are running releases older than Newton and reporting bugs
we've fixed in master. Supporting Newton required implementing support for
Ubuntu 16.04, as well as client and underlying cookbook changes, due to
deprecations that started prior to Newton. Here is the output from
*berks viz* for a top-down view into the complexity on just the Chef side.

https://s.cassiba.com/images/openstack-chef-dependency-graph.png

For the Pike cycle, Jan Klare will be reprising the role of PTL. I do not
intend to speak for him, but there are a few paths forward in the Big Tent:

* Branching stable/newton and stable/ocata with the quickness.
* Improving OpenStack CI to the point of being able to trust it again for
    testing patches, as well as extending testing scenarios (including
    multinode).

For branching stable/newton, the external CI has been proving useful in
overall confidence in cutting a release. We're way behind schedule, but
nearly there. I have begun working on implementing some basic multinode
gates, as our allinone no longer fits within the confines of the 8GB
instances. But, it’s Chef, so triangle wheels, yo. Some of the cross-project
efforts translate to Chef, but not all. With square spinners.

So... how did this happen?
--------------------------

As was the case with Designate, so it is with OpenStack-Chef. There is no
single reason or cause that brought us to this point.

The main catalyst was internal support shifting, which impacted the
sponsored developers and contributors. OpenStack-Chef became less and less a
priority, and one by one they shifted to other focuses. At the Austin 2016
Summit, we said farewell to all but the PTL and one core. This put
OpenStack-Chef in a bad place given its mission and scope, but onward we go.

Due to the volume of work done by this small group and the lack of feedback
during development, it became more and more difficult to tell when a release
could be considered "done". We could no longer trust our CI framework, as
the developers with intrinsic knowledge had been refocused, with little more
than commit history to go on.

Users were okay with leaving us work, which we added to the heap. This, with
the departure of contributors, resulted in the majority of the development
being funded by just two companies, which left the project at risk from
changes in direction by those companies. Without regular feedback or
guidance beyond release notes and the occasional chat in another project's
channel, the focus shifted away from features to "just ship it", as long as
it passes allinone and/or multinode locally, if there’s time. Does it pass
lint/unit/style? Fuck it. Ship it, deal with the fallout. Yeah. This is bad
on *so many* levels.

The Big Tent really did not do as much as advertised for OpenStack-Chef, as
harsh of an opinion as that sounds. Larger, more well-funded projects have
since created processes, frameworks and test suites that were developed for
their own use cases, not necessarily taking into account Chef's own blend of
automation. That left us having to go and discover how to make fire on our
own to make the cookbooks work on each supported platform and release of
OpenStack. In the Big Tent, we were effectively left to our own devices.
Just another OpenStack project. We numbered nine cores when we moved from
StackForge to the Big Tent. Developer peak, though we did not know it yet.

Initially, the cookbooks had a very heavy dependency: Chef Server. If not
Chef Server, Chef Solo, which still had its own quirks, and nobody liked
Chef Solo anyway. Not even Chef Solo liked itself. During the Juno cycle, we
switched to the Chef Development Kit, which gave us chef-provisioning. This
decreased turnaround time for testing patches being submitted, and boosted
confidence all around. Until Juno, it was difficult to run functional tests
against the cookbooks. That's when we discovered how to create fire. We
could run OpenStack! In virtual machines! On our laptops! OMyG you guys!
Suddenly, OpenStack on the laptop became easy, push button, single command,
automated. We could test a patch without a long spin-up. With that, came
integration gates, and periodic jobs. From days to minutes. We were cooking
with gas! But... let’s not make those integration gates voting... *yet*.

Mitaka brought a significant overhaul and simplification, with the
introduction of a multinode chef-provisioning recipe and more modular
cookbooks. The pieces finally existed, but the damage had been done, and
unfortunately, this momentum did not last. Internal priorities changed
within companies sponsoring developers, many of whom could not be fully
committed in the first place, and we started shedding contributors, which
happens. This is usually the point where *someone* comes in to either play
Grim Reaper or lifesaver. By Austin, our numbers had waned until just two
cores remained, Jan and myself. We could hardly be in worse locations to
communicate, he in Germany and I in California.

In the Newton and Ocata cycles, development progressed in a lurching
capacity without a team. Due to the overhaul in Mitaka, patches slowed in
frequency from the outside community, many of whom continued deploying and
running on older, EOL branches, or got frustrated enough to switch to other
automation flavors. The remaining team had little overlapping time to
communicate, being on different continents in conflicting time zones. What
was difficult to do with a larger team spread across three continents and
five time zones became impossible with just two. Day jobs increasingly took
priority over OpenStack-Chef care and feeding, and some cookbooks started to
go rancid (sorry, Ironic, Sahara, Swift and Trove. Nobody was able to
support a deployment with you). Interaction within the development team was
limited to an hour or two a day, eventually down to once or twice a week if
we had time. Day jobs proceeded to consume the development team, with
sporadic development as the months ticked on.

Over the Newton cycle, one cookbook was offered, EC2, with inadequate
coverage for our support matrix. The most desired integration API. In the
end, it was not integrated due to the time and commitment needed to support
such a feature, given the inadequate resources at our disposal. During
Ocata, the project had one cookbook contributed from the community, Murano,
that could be integrated, and grew an appendage in the form of the client
cookbook. It is the closest anyone has gotten to new features since the
Mitaka cycle. We added one core reviewer during this time.

Communication is a big part of any project, particularly a geographically
diverse effort like OpenStack. Prior to the Big Tent, we held weekly
meetings using Hangouts, which were open and publicized for the mailing list
subscribers and channel denizens. Upon joining the Big Tent, we gave up the
regular face-time in favor of text-based IRC meetings, per governance.
Without the high bandwidth requirement of a video call once a week, one by
one, cores saw day job meetings do what they do and take priority. In the
Newton cycle, we relinquished our weekly time slot after it became apparent
that neither of us could make the meetings. We have not held a scheduled
meeting since then, as it is next to impossible to carve out adequate
overlapping time.

We still have many of these problems to this day. Documentation is a mess or
nonexistent. Despite the flexibility of the tooling, users have but two
representative deployment examples: allinone or a rudimentary multinode.
OpenStack-Chef gained modularity at the expense of features, and there was
an overwhelming non-reaction to the deprecation of those features.

All of this results in a project that is not very friendly to new users, and
Chef does not look as attractive as a deployment option as other, more
feature-rich flavors. This has real business decisions behind it. One need
only look at the steady usage decline in the surveys to see how negatively
things appear to existing and new users. This, for a project that has roots
in the very cubicles where OpenStack was born.[1]

But it's not all bad
--------------------

For all the negativity, this story has upsides. I call to the people who
actually use this software in their deployments, in whatever shape it's in,
to not abandon OpenStack-Chef, or retire it to bitrot. We need help, not
funeral arrangements. Share your pain, so that we may find a way forward
together, not alone.

October 2016

https://s.cassiba.com/images/oct-2016-deployment-decisions.png

In my time with the project, I’ve gotten to know that there are some pretty
big names that leverage Chef in their deployments, and some of them even use
it for OpenStack. Some cookbook forks also exist, all serving to solve the
same problems that face OpenStack-Chef. Without feedback from real-world
deployments, OpenStack-Chef will continue to wither on the vine. This
fragmentation harms more than it solves.

What do we need? It’s easier to list what we don’t need. Developers,
tooling, documentation, testing, any and all are welcome and greatly
appreciated. We don’t need much, but what we are able to do is limited by
our size and time available.

We need developers with time and funding budgeted for contributing within
the OpenStack ecosystem. We need better representation at events such as the
PTG. For the first PTG, no OpenStack-Chef members will be attending, though
we intend to meet up in Boston, maybe. Given our physical locations in
proximity to Atlanta, it made sense to stay home and communicate over
IRC/code review. It doesn’t mean our development has ceased; we’re just too
few and far away for it to make sense.

We also need help from cross-project teams to understand and assimilate the
work done to solve the problems that we are working to solve. We are all
working toward the same goal, though with a different dialect and coat of
arms.

OpenStack needs choice in *how* to OpenStack. Limiting to a few dialects of
automation makes things look less like an ecosystem and more like a distro,
which is fine, if that’s how people want to go about it. We can talk about
that, too.

I am happy to talk to anyone about how they can help. OpenStack-Chef has
roots in the pioneers of OpenStack, and, in my (not so humble) opinion, is
way too nifty to just let fall to the wayside.

I do not have team visuals to represent how the team has grown and shrunk
over the years, so let me leave you with some mental visuals. At the end of
Mitaka, we numbered nine. At the start of Ocata, we numbered just two. In
Boston, we anticipate there will be, hopefully, all three of us,
representing the sixteen active subprojects that make up OpenStack-Chef.

[1]: [openstack-compute::default commit history](https://github.com/openstack/cookbook-openstack-compute/commits/eol-grizzly/recipes/default.rb)
[2]: [April 2016 OpenStack User Survey: greater than 50% of production deployments were still on releases circa Kilo or older](https://www.openstack.org/assets/survey/April-2016-User-Survey-Report.pdf)
[3]: [OpenStack Survey Report](https://www.openstack.org/analytics)
