[TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes

Bogdan Dobrelya bdobreli at redhat.com
Tue Nov 27 15:24:44 UTC 2018


Changing the topic to follow the subject.

[tl;dr] it's time to rearchitect container images to stop including 
config-time-only bits (puppet et al), which are not needed at runtime 
and create a daily maintenance burden for security issues like CVEs.

Background:
1) For the Distributed Compute Node (DCN) edge case, there are 
potentially tens of thousands of single-compute-node remote edge sites 
connected over WAN to a single control plane, with high latency (around 
100 ms) and limited bandwidth. Reducing the base layer size becomes a 
worthwhile goal there. See the security background below.
2) For the generic security (Day 2, maintenance) case, when 
puppet/ruby/systemd/name-it gets a CVE fix, the base layer has to be 
updated, all layers on top of it rebuilt, all of those layers re-fetched 
by cloud hosts, and all containers restarted... all because of fixes 
that have nothing to do with OpenStack. The remote edge sites have to 
re-fetch them as well -- remember the tens of thousands of sites, the 
high latency and the limited bandwidth.
3) TripleO CI updates packages (including puppet*) inside each 
container, not in a common base layer of those. So every CI job has to 
update puppet* and its dependencies (ruby, systemd) per container. 
Reducing the number of packages to update in each container makes sense 
for CI as well.

Implementation related:

WIP patches [0],[1] are up for early review. They use a config "pod" 
approach and do not require maintaining two separate sets of config vs 
runtime images. Future work: a) cronie requires systemd, so we'd want 
to move that off the base layer as well; b) rework docker-puppet.py to 
use podman pods instead of --volumes-from a side-car container (that 
part can't be backported to Queens, though supporting the Edge/DCN case 
there would still be nice, perhaps downstream only).
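
For illustration, here is a minimal sketch of the --volumes-from 
side-car variant. The image name, manifest path and modulepath are 
hypothetical placeholders, not the actual WIP code:

  # A hypothetical "puppet-config" image carries puppet, ruby and the
  # puppet modules, and declares e.g. /usr/share/openstack-puppet/modules
  # as a VOLUME so --volumes-from can expose it.
  docker create --name puppet-config puppet-config:latest true
  # docker-puppet tooling then mounts those bits read-only into each
  # service container at config time, so service images stay puppet-free:
  docker run --rm --volumes-from puppet-config:ro \
      keystone:latest puppet apply \
      --modulepath /usr/share/openstack-puppet/modules /etc/config.pp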

Some questions raised on IRC:

Q: does having a service be able to configure itself really need to 
involve a separate pod?
A: Highly likely yes; removing non-runtime things is a good idea, and 
pods are an established PaaS paradigm already. That will require some 
changes in the architecture though (see the topic with the WIP patches).
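
As a rough example of what the pod variant could look like (names are 
made up for illustration; this is not the proposed implementation):

  # One pod per service groups the short-lived config container with
  # the service container, so config bits never enter the service image.
  podman pod create --name keystone
  podman run -d --pod keystone --name keystone-puppet-config \
      puppet-config:latest sleep infinity
  podman run --rm --pod keystone --volumes-from keystone-puppet-config \
      keystone:latest puppet apply /etc/config.pp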

Q: that (fetching a config container) is actually more data than would 
be downloaded otherwise
A: It's not, if you think of Day 2, when the base layer and all the 
layers on top of it have to be re-fetched because some CVEs unrelated 
to OpenStack got fixed there for ruby/puppet/systemd. Avoiding the need 
to restart service containers because of those minor updates being 
pushed is also a nice thing.
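
A back-of-the-envelope illustration of that Day 2 cost (the numbers are 
assumptions picked for the example, not measurements):

  # Assume a ~300 MB base layer invalidated by a single ruby/systemd CVE
  # fix, re-fetched by 10,000 single-node edge sites over constrained WAN.
  base_layer_mb=300
  edge_sites=10000
  echo "$(( base_layer_mb * edge_sites / 1000 )) GB of WAN transfer for one base-layer fix"
  # -> roughly 3000 GB, before counting the rebuilt layers stacked on top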

Q: the best solution here would be using packages on the host and 
generating the config files on the host, and then having an all-in-one 
container for all the services which lets them run in an isolated manner.
A: I think for Edge cases that's a no-go, as we might want to consider 
tiny, low-footprint OS distros like the formerly known Container Linux 
or Atomic. Also, an all-in-one container looks like an anti-pattern 
from the world of VMs.

[0] https://review.openstack.org/#/q/topic:base-container-reduction
[1] https://review.rdoproject.org/r/#/q/topic:base-container-reduction

> Here is a related bug [0] and implementation [1] for that. PTAL folks!
> 
> [0] https://bugs.launchpad.net/tripleo/+bug/1804822
> [1] https://review.openstack.org/#/q/topic:base-container-reduction
> 
>> Let's also think of removing puppet-tripleo from the base container.
>> It really pulls the world in (and yum updates in CI!) for each job and 
>> each container!
>> So if we did that, we should then either install puppet-tripleo and co 
>> on the host and bind-mount it for the docker-puppet deployment task 
>> steps (bad idea IMO), OR use the magical --volumes-from 
>> <a-side-car-container> option to mount volumes from some 
>> "puppet-config" sidecar container inside each of the containers being 
>> launched by the docker-puppet tooling.
> 
> On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås <hjensas at redhat.com> 
> wrote:
>> We add this to all images:
>> 
>> https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35
>> 
>> /bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python
>> socat sudo which openstack-tripleo-common-container-base rsync cronie
>> crudini openstack-selinux ansible python-shade puppet-tripleo python2-
>> kubernetes && yum clean all && rm -rf /var/cache/yum   => 276 MB 
>> 
>> Is the additional 276 MB reasonable here?
>> openstack-selinux <- this package runs relabeling; does that kind of
>> filesystem touching impact the size due to docker layers?
>> 
>> Also: python2-kubernetes is a fairly large package (18007990 bytes);
>> do we use that in every image? I don't see any tripleo-related repos
>> importing from it when searching on Hound. The original commit
>> message [1] adding it states it is for future convenience.
>> 
>> On my undercloud we have 101 images; if we are downloading that 18 MB
>> per image, that's almost 1.8 GB for a package we don't use? (I hope
>> it's not like this? With docker layers, do we only download that
>> 276 MB transaction once? Or?)
>> 
>> 
>> [1] https://review.openstack.org/527927
> 
> 
> 
> -- 
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando


