[openstack-dev] [tc][infra][release][security][stable][kolla][loci][tripleo][docker][kubernetes] do we want to be publishing binary container images?
doug at doughellmann.com
Wed May 17 18:04:53 UTC 2017
Excerpts from Michał Jastrzębski's message of 2017-05-17 07:47:31 -0700:
> On 17 May 2017 at 04:14, Chris Dent <cdent+os at anticdent.org> wrote:
> > On Wed, 17 May 2017, Thierry Carrez wrote:
> >> Back to container image world, if we refresh those images daily and they
> >> are not versioned or archived (basically you can only use the latest and
> >> can't really access past dailies), I think we'd be in a similar situation
> >> ?
> > Yes, this.
> I think it's not a bad idea to message "you are responsible for
> archiving your containers". Do that, combine it with a good toolset that
> helps users determine versions of packages and other metadata and
> we'll end up with something that itself would be greatly appreciated.
> Few potential user stories.
> I have OpenStack <100 nodes and need every single one of them, hence
> no CI. At the same time I want to have fresh packages to avoid CVEs. I
> deploy kolla with tip-of-the-stable-branch and set up a cron job that will
> upgrade it every week. Because my scenario is quite typical and
> containers already ran through gates that test my scenario, I'm good.
> Another one:
> I have 300+ node cloud, heavy CI and security team examining every
> container. While I could build containers locally, downloading them is
> just simpler and effectively the same (after all, it's containers
> being tested, not the build process). On every download our security team
> scrutinizes the containers and uses the toolset Kolla provides to help them.
> Additional benefit is that on top of our CI these images went through
> Kolla CI which is nice, more testing is always good.
> And another one
> We are Kolla community. We want to provide testing for full release
> upgrades every day in gates, to make sure OpenStack and Kolla is
> upgradable and improve general user experience of upgrades. Because
> infra is resource constrained, we cannot afford building 2 sets of
> containers (stable and master) and doing deploy->test->upgrade->test.
> However because we have these cached containers, that are fresh and
> passed CI for deploy, we can just use them! Now effectively we're not
> only testing Kolla's correctness of upgrade procedure but also all the
> other project team upgrades! Oh, it seems Nova merged something that
> negatively affects upgrades, let's make sure they are aware!
> And last one, which should not be underestimated
> I am CTO of some company and I've heard OpenStack is no longer hard to
> deploy, I'll just download kolla-ansible and try. I'll follow this
> guide that deploys simple OpenStack with 2 commands and few small
> configs, and it's done! Super simple! We're moving to OpenStack and
> start contributing tomorrow!
> Please, let's solve messaging problems, put burden of archiving on
> users, whatever it takes to protect our community from wrong
> expectations, but not kill this effort. There are very real and
> immediate benefits to OpenStack as a whole if we do this.
You've presented some positive scenarios. Here's a worst case
situation that I'm worried about.
Suppose in a few months the top several companies contributing to
kolla decide to pull out of or reduce their contributions to
OpenStack. IBM, Intel, Oracle, and Cisco either lay folks off or
redirect their efforts to other projects. Maybe they start
contributing directly to kubernetes. The kolla team is hit badly,
and all of the people from that team who know how the container
publishing jobs work are gone.
The day after everyone says goodbye, the build breaks. Maybe a bad
patch lands, or maybe some upstream assumption changes. The issue
isn't with the infra jobs themselves. The break means no new container
images are being published. Since there's not much of a kolla team
any more, it looks like it will be a while before anyone has time
to figure out how to fix the problem.
Later that same day, a new zero-day exploit is announced in a
component included in all or most of those images. Something that
isn't developed in the community, such as OpenSSL or glibc. The
exploit allows a complete breach of any app running with it. All
existing published containers include the bad bits and need to be
rebuilt.

We now have an unknown number of clouds running containers built
by the community with major security holes. The team responsible
for maintaining those images is a shambles, but even if they weren't
the automation isn't working, so no new images can be published.
The consumers of the existing containers haven't bothered to set
up build pipelines of their own, because why bother? Even though
we've clearly said the images "we" publish are for our own testing,
they have found it irresistibly convenient to use them and move on
with their lives.
When the exploit is announced, they start clamoring for new container
images, and become understandably irate when we say we didn't think
they would be using them in production and they *shouldn't have*
and their problems are not our problems because we told them not
to do that. Some of them point to this mailing list thread, and the
promises made. When we tell them those images were really being
built by the kolla team and that they're gone and none of the rest
of us know how to build new images or fix the problem with the build
system, panic ensues. The community gets a bad reputation for
overreaching and not supporting what "we" produce.
Contrast that with a scenario in which consumers either take
responsibility for their systems by building their own images, by
collaborating directly with other consumers to share the resources
needed to build those images, or by paying a third party a sustainable
amount of money to build images for them. In any of those cases,
there is an incentive for the responsible party to be ready and
able to produce new images in a timely manner. Consumers of the
images know exactly where to go for support when they have problems.
Issues in those images don't reflect on the community in any way,
because we were not involved in producing them.
As I said at the start of this thread, we've long avoided building
and supporting simple operating system style packages of the
components we produce. I am still struggling to understand how
building more complex artifacts, including bits over which we have
little or no control, is somehow more sustainable than those simple
packages.