On 11/28/18 2:58 PM, Dan Prince wrote:
On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:
To follow up and explain the patches for code review:
The "header" patch https://review.openstack.org/620310 -> (requires) https://review.rdoproject.org/r/#/c/17534/, and also https://review.openstack.org/620061 -> (which in turn requires) https://review.openstack.org/619744 -> (Kolla change, the 1st to go) https://review.openstack.org/619736
This email was cross-posted to multiple lists and I think we may have lost some of the context in the process as the subject was changed.
Most of the suggestions and patches are about making our base container(s) smaller in size. The means by which the patches do that is to share binaries/applications across containers with custom mounts/volumes. I've -2'd most of them. What concerns me, however, is that some of the TripleO cores seemed open to this idea yesterday on IRC. Perhaps I've misread things, but what you appear to be doing here is quite drastic, and I think we need to consider all of this carefully before proceeding with any of it.
Please also read the commit messages; I tried to explain all the "whys" very carefully. Just to sum it up here as well:
The current self-contained (config and runtime bits) architecture of containers badly affects:
* the size of the base layer and all container images, by an additional 300MB (an extra 30% of size).
You are accomplishing this by removing Puppet from the base container, but you are also creating another container in the process. This would still be required on all nodes, as Puppet is our config tool, so you would still be downloading some of this data anyway. I understand that your reason for doing this is that it avoids rebuilding all containers when there is a change to any of these packages in the base container. What you are missing, however, is this: how often is Puppet updated when something else in the base container isn't?
For CI jobs updating all containers, there are quite often changes in the openstack/tripleo puppet modules to pull in. IIUC, that automatically picks up any updates for all of their dependencies and for the dependencies of those dependencies, multiplied by about a hundred containers in total to update. That is a *pain* we have quite often these days, with CI jobs timing out... Ofc, the main cause is delayed promotions though. For real deployments, I have no data on the cadence of minor updates to puppet and the tripleo & openstack modules for it; let's ask operators (as we happen to be on the merged openstack-discuss list)? For its dependencies though, like systemd and ruby, I'm pretty sure CVEs get fixed there quite often. So I expect that delivering security fixes for those "in the field" might bring some unwanted hassle for the long-term maintenance of LTS releases. As Tengu noted on IRC: "well, between systemd, puppet and ruby, there are many security concerns, almost every month... and also, what's the point keeping them in runtime containers when they are useless?"
I would wager that it is rarer than you'd think. Perhaps looking at the history of an OpenStack distribution would be a valid way to assess this more critically. Without data to back up the numbers, I'm afraid what you are doing here falls into "pre-optimization" territory for me, and I don't think the benefits you mention here warrant the means used in the patches.
* Edge cases, where container images have to be distributed, at least once to reach local registries, over high-latency, limited-bandwidth, highly unreliable WAN connections.
* the number of packages to update in CI for all containers for all services (CI jobs do not rebuild containers, so each container gets updated for those 300MB of extra size).
It would seem to me there are other ways to solve the CI container update problems. Rebuilding the base layer more often would solve this, right? If we always build our service containers off a recent base layer, there should be no updates to the system/puppet packages in our CI pipelines.
* security and the attack surface, by introducing systemd et al. as additional subjects for CVE fixes to maintain in all containers.
We aren't actually using systemd within our containers. I think those packages are getting pulled in by an RPM dependency elsewhere. So rather than using 'rpm -ev --nodeps' to remove it, we could create a sub-package for containers in those cases and install that instead. In short, rather than hacking this to remove them, why not pursue a proper packaging fix?
In general I am a fan of getting things we don't need out of the base container... so yeah, let's do this. But let's do it properly.
* services uptime, by additional restarts of services for security maintenance of components irrelevant to OpenStack that sit as dead weight in container images forever.
Like I said above, how often is it that these packages actually change when something else in the base container doesn't? Perhaps we should get more data here before blindly implementing a solution we aren't sure really helps in the real world.
On 11/27/18 4:08 PM, Bogdan Dobrelya wrote:
Changing the topic to follow the subject.
[tl;dr] It's time to rearchitect container images to stop including config-time-only bits (puppet et al.), which are not needed at runtime and pose security issues, like CVEs, to maintain daily.
Background: 1) For the Distributed Compute Node edge case, there are potentially tens of thousands of single-compute-node remote edge sites connected over WAN to a single control plane, with high latency (around 100ms) and limited bandwidth. 2) For a generic security case, 3) TripleO CI updates all
Challenge:
Here is a related bug [0] and implementation [1] for that. PTAL folks!
[0] https://bugs.launchpad.net/tripleo/+bug/1804822
[1] https://review.openstack.org/#/q/topic:base-container-reduction
Let's also think of removing puppet-tripleo from the base container. It really brings the world in (and yum updates in CI!) for each job and each container! If we did so, we should then either install puppet-tripleo and co. on the host and bind-mount it for the docker-puppet deployment task steps (bad idea IMO), OR use the magical --volumes-from <a-side-car-container> option to mount volumes from some "puppet-config" sidecar container inside each of the containers being launched by the docker-puppet tooling.
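To make the sidecar option a bit more concrete, here is a minimal sketch of how a launcher could assemble a `docker run` command line that uses `--volumes-from`. It is not the docker-puppet implementation: the function name, the image name and the command being run are all hypothetical, and only the "puppet-config" container name comes from the proposal above. The sketch only builds the argument list, it does not start anything.

```python
from typing import List


def docker_run_argv(image: str, sidecar: str, command: List[str]) -> List[str]:
    """Build a `docker run` argv that mounts every volume of `sidecar`.

    `--volumes-from` is a standard `docker run` option; the sidecar
    container must already exist and expose the puppet bits as volumes.
    """
    return ["docker", "run", "--rm", "--volumes-from", sidecar, image, *command]


argv = docker_run_argv(
    image="example/nova-compute",   # hypothetical service image
    sidecar="puppet-config",        # sidecar name taken from the proposal
    command=["puppet", "apply", "/etc/config.pp"],  # hypothetical command
)
print(" ".join(argv))
```

With this layout the service image itself would not need to ship puppet-tripleo; the bits would come from the sidecar's volumes at config time only.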
On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås <hjensas at redhat.com> wrote:
We add this to all images:
https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e...
/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python socat sudo which openstack-tripleo-common-container-base rsync cronie crudini openstack-selinux ansible python-shade puppet-tripleo python2-kubernetes && yum clean all && rm -rf /var/cache/yum 276 MB

Is the additional 276 MB reasonable here? openstack-selinux <- This package runs relabeling; does that kind of touching of the filesystem impact the size due to docker layers?
Also: python2-kubernetes is a fairly large package (18007990); do we use that in every image? I don't see any TripleO-related repos importing from it when searching on Hound. The original commit message[1] adding it states it is for future convenience.
On my undercloud we have 101 images; if we are downloading that 18 MB for every image, that's almost 1.8 GB for a package we don't use? (I hope it's not like this? With docker layers, we only download that 276 MB transaction once? Or?)
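The two cases asked about above can be put into numbers with a short sketch. It assumes the quoted 18007990 is a size in bytes and that "276 MB" means 276 * 10^6 bytes; neither is stated in the thread. It relies on the general Docker behaviour that a layer already present locally is not pulled again. Whether the layer really is identical across all 101 images depends on how they are built, which the sketch does not check.

```python
PACKAGE_BYTES = 18_007_990        # python2-kubernetes size quoted above (assumed bytes)
TRANSACTION_BYTES = 276 * 10**6   # the 276 MB yum transaction (assumed decimal MB)
IMAGE_COUNT = 101                 # images on the undercloud, as quoted above


def bytes_pulled(layer_bytes: int, images: int, shared_layer: bool) -> int:
    """Bytes downloaded for one layer across all images.

    A shared layer (same digest in every image) is fetched once;
    otherwise every image pulls its own copy.
    """
    copies = 1 if shared_layer else images
    return layer_bytes * copies


pkg_unshared = bytes_pulled(PACKAGE_BYTES, IMAGE_COUNT, shared_layer=False)
pkg_shared = bytes_pulled(PACKAGE_BYTES, IMAGE_COUNT, shared_layer=True)
layer_unshared = bytes_pulled(TRANSACTION_BYTES, IMAGE_COUNT, shared_layer=False)
layer_shared = bytes_pulled(TRANSACTION_BYTES, IMAGE_COUNT, shared_layer=True)

print(f"python2-kubernetes, unshared: {pkg_unshared / 1e9:.2f} GB")
print(f"python2-kubernetes, shared:   {pkg_shared / 1e6:.2f} MB")
print(f"yum transaction, unshared:    {layer_unshared / 1e9:.2f} GB")
print(f"yum transaction, shared:      {layer_shared / 1e6:.0f} MB")
```

Under those assumptions the unshared case for the package comes to about 1.82 GB, matching the "almost 1.8 GB" estimate, while the shared case stays at a single 18 MB download.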
-- Best regards, Bogdan Dobrelya, Irc #bogdando