Re: [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes
To follow up and explain the patches for code review:

The "header" patch https://review.openstack.org/620310 -> (requires) https://review.rdoproject.org/r/#/c/17534/, and also https://review.openstack.org/620061 -> (which in turn requires) https://review.openstack.org/619744 -> (Kolla change, the 1st to go) https://review.openstack.org/619736

Please also read the commit messages; I tried to explain all the "whys" very carefully. Just to sum it up here as well:

The current self-contained (config plus runtime bits) architecture of containers badly affects:

* the size of the base layer and of all container images, as an additional 300MB (an extra 30% of size);
* Edge cases, where container images have to be distributed, at least once to hit local registries, over high-latency, limited-bandwidth, highly unreliable WAN connections;
* the number of packages to update in CI for all containers for all services (CI jobs do not rebuild containers, so each container gets updated for those 300MB of extra size);
* security and the attack surface, by introducing systemd et al as additional subjects of CVE fixes to maintain for all containers;
* services' uptime, via additional restarts of services caused by security maintenance of components irrelevant to OpenStack sitting as dead weight in container images forever.

On 11/27/18 4:08 PM, Bogdan Dobrelya wrote:
Changing the topic to follow the subject.
[tl;dr] it's time to rearchitect container images to stop including config-time-only (puppet et al) bits, which are not needed at runtime and pose security issues, like CVEs, to maintain daily.
Background: 1) For the Distributed Compute Node edge case, there are potentially tens of thousands of single-compute-node remote edge sites connected over WAN to a single control plane, with high latency (100ms or so) and limited bandwidth. 2) For a generic security case, 3) TripleO CI updates all
Challenge:
Here is a related bug [0] and implementation [1] for that. PTAL folks!
[0] https://bugs.launchpad.net/tripleo/+bug/1804822 [1] https://review.openstack.org/#/q/topic:base-container-reduction
Let's also think of removing puppet-tripleo from the base container. It really brings the world in (and yum updates in CI!) for each job and each container! If we did so, we should then either install puppet-tripleo and co. on the host and bind-mount it for the docker-puppet deployment task steps (a bad idea IMO), OR use the magical --volumes-from <a-side-car-container> option to mount volumes from some "puppet-config" sidecar container into each of the containers being launched by the docker-puppet tooling.
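As a sketch (the image names, paths and the sidecar itself are all illustrative here, not existing tooling), the sidecar variant would amount to something like:

  # A data-only "puppet-config" sidecar that carries the config-time bits
  docker create --name puppet-config \
      -v /etc/puppet \
      -v /usr/share/openstack-puppet/modules \
      example/puppet-sidecar:latest /bin/true

  # docker-puppet would then borrow those volumes when generating configs
  docker run --rm --volumes-from puppet-config \
      example/nova-api:latest \
      puppet apply /etc/puppet/manifests/nova.pp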
On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås <hjensas at redhat.com> wrote:
We add this to all images:
https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e...
/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python socat sudo which openstack-tripleo-common-container-base rsync cronie crudini openstack-selinux ansible python-shade puppet-tripleo python2-kubernetes && yum clean all && rm -rf /var/cache/yum

276 MB. Is the additional 276 MB reasonable here? openstack-selinux <- this package runs relabeling; does that kind of touching of the filesystem impact the size due to docker layers?
Also: python2-kubernetes is a fairly large package (18,007,990 bytes); do we use that in every image? I don't see any tripleo-related repos importing from it when searching on Hound. The original commit message[1] adding it states it is for future convenience.
On my undercloud we have 101 images; if we are downloading 18 MB per image, that's almost 1.8 GB for a package we don't use? (I hope it's not like this? With docker layers, we only download that 276 MB transaction once? Or?)
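One way to verify how layers are shared on a host (a sketch; the image names are examples only):

  # Layers are content-addressed, so a shared base layer is pulled once
  # per host; 'docker history' shows the per-layer sizes of an image
  docker history tripleoupstream/centos-binary-base:latest

  # Comparing layer digests between two images shows what they share
  docker inspect --format '{{.RootFS.Layers}}' tripleoupstream/centos-binary-nova-api:latest
  docker inspect --format '{{.RootFS.Layers}}' tripleoupstream/centos-binary-keystone:latest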
-- Best regards, Bogdan Dobrelya, Irc #bogdando
Added the Kolla tag, as all together we might want to do something about that systemd included in containers via *multiple* package dependencies, like [0]. Ideally, that might mean properly packaging all/some of the places having it as a dependency (like those named in [1]), to stop doing that, as it's Containers Time now. As a temporary security band-aid I was thinking of removing systemd via footers [1], as an extra layer added on top, but I'm not sure that buys anything good long-term.

[0] https://pastebin.com/RSaRsYgZ
[1] https://review.openstack.org/#/c/620310/2/container-images/tripleo_kolla_tem...
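For illustration, such a footer would add one extra layer per image, along these lines (a hypothetical sketch; the image and package names are only examples):

  # Appended on top of each kolla image; forcibly drops systemd without
  # honoring RPM dependencies, which is exactly the hack a proper
  # packaging fix would make unnecessary
  FROM example/centos-binary-nova-api:latest
  RUN rpm -ev --nodeps systemd systemd-libs && \
      yum clean all && rm -rf /var/cache/yum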
On 11/28/18 12:45 PM, Bogdan Dobrelya wrote:
<snip>
On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:
To follow up and explain the patches for code review:
The "header" patch https://review.openstack.org/620310 -> (requires) https://review.rdoproject.org/r/#/c/17534/, and also https://review.openstack.org/620061 -> (which in turn requires) https://review.openstack.org/619744 -> (Kolla change, the 1st to go) https://review.openstack.org/619736
This email was cross-posted to multiple lists, and I think we may have lost some of the context in the process, as the subject was changed. Most of the suggestions and patches are about making our base container(s) smaller in size, and the means by which the patches do that is to share binaries/applications across containers with custom mounts/volumes. I've -2'd most of them. What concerns me, however, is that some of the TripleO cores seemed open to this idea yesterday on IRC. Perhaps I've misread things, but what you appear to be doing here is quite drastic, and I think we need to consider any of this carefully before proceeding with any of it.
Please also read the commit messages; I tried to explain all the "whys" very carefully. Just to sum it up here as well:
The current self-contained (config plus runtime bits) architecture of containers badly affects:
* the size of the base layer and of all container images, as an additional 300MB (an extra 30% of size).
You are accomplishing this by removing Puppet from the base container, but you are also creating another container in the process. This would still be required on all nodes, as Puppet is our config tool, so you would still be downloading some of this data anyway. I understand your reasons for doing this are that it avoids rebuilding all containers when there is a change to any of these packages in the base container. What you are missing, however, is: how often is Puppet updated when something else in the base container isn't? I would wager that it is more rare than you'd think. Perhaps looking at the history of an OpenStack distribution would be a valid way to assess this more critically. Without data to back up the numbers, I'm afraid what you are doing here falls into "pre-optimization" territory for me, and I don't think the means used in the patches warrant the benefits you mention here.
* Edge cases, where container images have to be distributed, at least once to hit local registries, over high-latency, limited-bandwidth, highly unreliable WAN connections.
* the number of packages to update in CI for all containers for all services (CI jobs do not rebuild containers, so each container gets updated for those 300MB of extra size).
It would seem to me there are other ways to solve the CI container update problem. Rebuilding the base layer more often would solve this, right? If we always build our service containers off of a recent base layer, there should be no updates to the system/puppet packages there in our CI pipelines.
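Schematically, the pipeline would rebuild the base right before the service images, so the latter never carry pending updates (a sketch; the kolla-build arguments are illustrative, not an exact CI job definition):

  # Rebuild the base image first so it picks up current packages...
  kolla-build --base centos --type binary ^base$
  # ...then rebuild the service images on top of the fresh base
  kolla-build --base centos --type binary ^nova ^keystone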
* security and the attack surface, by introducing systemd et al as additional subjects of CVE fixes to maintain for all containers.
We aren't actually using systemd within our containers. I think those packages are getting pulled in by an RPM dependency elsewhere. So rather than using 'rpm -ev --nodeps' to remove it, we could create a sub-package for containers in those cases and install that instead. In short, rather than hack things out, why not pursue a proper packaging fix? In general I am a fan of getting things out of the base container we don't need... so yeah, let's do this. But let's do it properly.
* services' uptime, via additional restarts of services caused by security maintenance of components irrelevant to OpenStack sitting as dead weight in container images forever.
Like I said above, how often is it that these packages actually change when something else in the base container doesn't? Perhaps we should get more data here before blindly implementing a solution we aren't sure really helps in the real world.
On 11/27/18 4:08 PM, Bogdan Dobrelya wrote:
<snip>
On 11/28/18 2:58 PM, Dan Prince wrote:
On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:
<snip>
You are accomplishing this by removing Puppet from the base container, but you are also creating another container in the process. This would still be required on all nodes, as Puppet is our config tool, so you would still be downloading some of this data anyway. I understand your reasons for doing this are that it avoids rebuilding all containers when there is a change to any of these packages in the base container. What you are missing, however, is: how often is Puppet updated when something else in the base container isn't?
For CI jobs updating all containers, it's quite common to have changes in the openstack/tripleo puppet modules to pull in. IIUC, that automatically picks up any updates for all of their dependencies, and for the dependencies of dependencies, all of it multiplied by the hundred or so total containers to be updated. That is a *pain* we've become used to these days, with CI jobs quite often timing out... Of course, the main cause is delayed promotions, though. For real deployments, I have no data on the cadence of minor updates for puppet and the tripleo & openstack modules for it; let's ask operators (as we happen to be on the merged openstack-discuss list)? For its dependencies, though, like systemd and ruby, I'm pretty sure CVEs get fixed there quite often. So I expect that delivering "in the field" security fixes for those might bring some unwanted hassle for the long-term maintenance of LTS releases. As Tengu noted on IRC: "well, between systemd, puppet and ruby, there are many security concerns, almost every month... and also, what's the point of keeping them in runtime containers when they are useless?"
<snip>
-- Best regards, Bogdan Dobrelya, Irc #bogdando
On Wed, 2018-11-28 at 15:12 +0100, Bogdan Dobrelya wrote:
On 11/28/18 2:58 PM, Dan Prince wrote:
On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:
<snip>
You are accomplishing this by removing Puppet from the base container, but you are also creating another container in the process. This would still be required on all nodes, as Puppet is our config tool, so you would still be downloading some of this data anyway. I understand your reasons for doing this are that it avoids rebuilding all containers when there is a change to any of these packages in the base container. What you are missing, however, is: how often is Puppet updated when something else in the base container isn't?
For CI jobs updating all containers, it's quite common to have changes in the openstack/tripleo puppet modules to pull in. IIUC, that automatically picks up any updates for all of their dependencies, and for the dependencies of dependencies, all of it multiplied by the hundred or so total containers to be updated. That is a *pain* we've become used to these days, with CI jobs quite often timing out... Of course, the main cause is delayed promotions, though.
Regarding CI, I made a separate suggestion on that below: rebuilding the base layer more often could be a good solution here. I don't think the puppet-tripleo package is that large, however, so we could just live with it.
For real deployments, I have no data on the cadence of minor updates for puppet and the tripleo & openstack modules for it; let's ask operators (as we happen to be on the merged openstack-discuss list)? For its dependencies, though, like systemd and ruby, I'm pretty sure CVEs get fixed there quite often. So I expect that delivering "in the field" security fixes for those might bring some unwanted hassle for the long-term maintenance of LTS releases. As Tengu noted on IRC: "well, between systemd, puppet and ruby, there are many security concerns, almost every month... and also, what's the point of keeping them in runtime containers when they are useless?"
Reiterating again on previous points:

- I'd be fine removing systemd. But let's do it properly, and not via 'rpm -ev --nodeps'.
- Puppet and Ruby *are* required for configuration. We can certainly put them in a separate container outside of the runtime service containers, but doing so would actually cost you much more space/bandwidth for each service container. As both of these have to get downloaded to each node anyway in order to generate config files with our current mechanisms, I'm not sure this buys you anything.

We are going in circles here, I think....

Dan
<snip>
<snip>
Reiterating again on previous points:
- I'd be fine removing systemd. But let's do it properly, and not via 'rpm -ev --nodeps'.
- Puppet and Ruby *are* required for configuration. We can certainly put them in a separate container outside of the runtime service containers, but doing so would actually cost you much more space/bandwidth for each service container. As both of these have to get downloaded to each node anyway in order to generate config files with our current mechanisms, I'm not sure this buys you anything.
+1. I was actually under the impression that we concluded yesterday on IRC that this is the only thing that makes sense to seriously consider. But even then it's not a win-win: we'd gain some security by leaner production images, but pay for it with space+bandwidth by duplicating image content (IOW we can help achieve one of the goals we had in mind by worsening the situation w/r/t the other goal we had in mind). Personally I'm not sold yet, but it's something that I'd consider if we got measurements of how much more space/bandwidth usage this would consume, and if we got some further details/examples about how serious the security concerns are if we leave config mgmt tools in runtime images. IIRC the other options (that were brought forward so far) were already dismissed in yesterday's IRC discussion and on the reviews: bin/lib bind-mounting being too hacky and fragile, and nsenter not really solving the problem (because it allows us to switch to having different bins/libs available, but it does not allow merging the availability of bins/libs from two containers into a single context).
We are going in circles here, I think....
+1. I think too much of the discussion focuses on "why it's bad to have config tools in runtime images", but IMO we all sorta agree that it would be better not to have them there, if it came at no cost.

I think to move forward, it would be interesting to know: if we do this (I'll borrow Dan's drawing):

|base container| --> |service container| --> |service container w/ Puppet installed|

How much more space and bandwidth would this consume per node (e.g. separately per controller, per compute)? This could help with decision making.
Dan
Thanks, Jirka
On 11/28/18 6:02 PM, Jiří Stránský wrote:
<snip>
I think to move forward, it would be interesting to know: if we do this (I'll borrow Dan's drawing):
|base container| --> |service container| --> |service container w/ Puppet installed|
How much more space and bandwidth would this consume per node (e.g. separately per controller, per compute)? This could help with decision making.
As I've already evaluated in the related bug, that is:

* puppet-* modules and manifests ~ 16MB
* puppet with dependencies ~ 61MB
* systemd, pulled in as the seemingly largest dependency of a dependency, ~ 190MB

That would be the extra layer size for each of the container images to be downloaded/fetched into registries. Given that we should decouple systemd from all/some of those dependencies (an example topic for RDO [0]), that could save 190MB. But it seems we cannot break the love of puppet and systemd, as the former heavily relies on the latter, and changing the packaging like that would highly likely affect baremetal deployments, where puppet and systemd co-operate.

Long story short: we cannot shoot both rabbits with a single shot, not with puppet :) Maybe we could with ansible replacing puppet fully... So splitting config and runtime images is the only choice yet to address the raised security concerns. And let's forget about edge cases for now; tossing around a pair of extra bytes over 40,000 WAN-distributed computes ain't gonna be our biggest problem for sure.

[0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
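For reference, those numbers can be re-checked with yum's dry-run transaction summary, e.g. (a sketch, to be run inside the base image):

  # Prints 'Total download size' / 'Installed size' and then aborts
  yum install --assumeno puppet puppet-tripleo

  # Or resolve the dependency chain explicitly
  repoquery --requires --resolve puppet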
Thanks
Jirka
-- Best regards, Bogdan Dobrelya, Irc #bogdando
On Wed, Nov 28, 2018 at 12:31 PM Bogdan Dobrelya <bdobreli@redhat.com> wrote:
Long story short: we cannot shoot both rabbits with a single shot, not with puppet :) Maybe we could with ansible replacing puppet fully... So splitting config and runtime images is the only choice yet to address the raised security concerns. And let's forget about edge cases for now; tossing around a pair of extra bytes over 40,000 WAN-distributed computes ain't gonna be our biggest problem for sure.
I think it's this last point that is the crux of this discussion. We can agree to disagree about the merits of this proposal and whether it's a pre-optimization or a micro-optimization, which I admit are somewhat subjective terms. Ultimately, it seems to be about the "why" we need to do this, which is the reason the conversation seems to be going in circles a bit.

I'm all for reducing container image size, but the reality is that this proposal doesn't necessarily help us with the Edge use cases we are talking about trying to solve. Why would we even run the exact same puppet binary + manifest individually 40,000 times, just to produce the exact same set of configuration files that differ only by things such as IP addresses, hostnames, and passwords? Maybe we should instead be thinking about how we can do that *1* time centrally and produce a configuration that can be reused across 40,000 nodes with little effort.

The opportunity for a significant impact on how we can scale TripleO is much larger if we approach these problems with a wider net of what we could do. There's opportunity for a lot of better reuse in TripleO; configuration is just one area. The plan and the Heat stack (within the ResourceGroup) are some other areas.

At the same time, if some folks want to work on smaller optimizations (such as container image size), with an approach that can be agreed upon, then they should do so. We just ought to be careful about how we justify those changes, so that we can carefully weigh the effort vs the payoff. In this specific case, I don't personally see this proposal helping us with Edge use cases in a meaningful way, given the scope of the changes. That's not to say there aren't other use cases that could justify it, though (such as the security points brought up earlier).

-- James Slagle
On Wed, 28 Nov 2018, James Slagle wrote:
Why would we even run the exact same puppet binary + manifest individually 40,000 times, just to produce the exact same set of configuration files that differ only by things such as IP addresses, hostnames, and passwords?
This has been my confusion and question throughout this entire thread. It sounds like containers are being built (and configured) at something akin to runtime, instead of built once and then configured (only) at runtime. Isn't it more the "norm" to, when there's a security fix, build again, once, and cause the stuff at the edge (keeping its config) to re-instantiate, fetching the newly built stuff? Throughout the discussion I've been assuming I must be missing some critical detail, because isn't the whole point to have immutable stuff? Maybe it is immutable and you all are talking about it in ways that make it seem otherwise. I dunno. I suspect I am missing some bit of operational experience.

In any case, the "differ only by things..." situation is exactly why I added the get-config-from-environment support to oslo.config, so that the different bits can be in the orchestrator, not the containers themselves. More on that at: http://lists.openstack.org/pipermail/openstack-discuss/2018-November/000173....

-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
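For illustration, with the environment source the orchestrator would inject only the per-node differences at start time, roughly like this (a sketch; the OS_<GROUP>__<OPTION> variable naming follows the oslo.config environment source, and the image/option names are examples):

  # Everything common is baked into one immutable config; only the
  # per-node bits arrive as environment overrides
  docker run -d \
      -e OS_DEFAULT__HOST=edge-compute-0017 \
      -e OS_DEFAULT__MY_IP=192.0.2.17 \
      example/nova-compute:latest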
On Wed, Nov 28, 2018 at 11:44 AM Chris Dent <cdent+os@anticdent.org> wrote:
On Wed, 28 Nov 2018, James Slagle wrote:
Why would we even run the exact same puppet binary + manifest individually 40,000 times, just to produce the exact same set of configuration files that differ only by things such as IP addresses, hostnames, and passwords?
This has been my confusion and question throughout this entire thread. It sounds like containers are being built (and configured) at something akin to runtime, instead of built once and then configured (only) at runtime. Isn't it more the "norm" to, when there's a security fix, build again, once, and cause the stuff at the edge (keeping its config) to re-instantiate, fetching the newly built stuff?
No, we aren't building container images at runtime; we're building configurations. The way it works in TripleO is that we use the same containers to generate the configurations as we do to run the services themselves. These configurations are mounted off the host so as not to end up in the container. This is primarily because things like the puppet modules assume certain chunks of software/configuration files exist. So we're generating the configuration files to be mounted into the runtime container. The puppet providers are extremely mature and allow for in-place editing without templates, which is how we can get away with this in containers. The containers themselves are not built or modified on the fly in this case.

IMHO this is a side effect of the configurations (files) for openstack services and their service dependencies, where we need to somehow inject the running config into the container rather than being able to load it from an external source (remember the etcd oslo stuff from a few cycles ago?). Our problem is our reliance on puppet, due to existing established configuration patterns and the sheer amount of code required to configure openstack & company. So we end up having to carry these package dependencies in the service containers, because that's where we generate the configs. There are additional dependencies on being able to know hardware specifics (facts) that come into play with the configurations, such that we may not be able to generate the configs off the deployment host and just ship them with the containers.
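Schematically, the generation step looks roughly like this (a simplified sketch, not the exact docker-puppet invocation; names and paths are examples):

  # Puppet runs inside a throwaway container built from the service
  # image; only the generated files survive, on the host
  docker run --rm \
      -v /var/lib/config-data/nova:/etc/nova \
      example/nova-base:latest \
      puppet apply --modulepath=/usr/share/openstack-puppet/modules \
          -e 'include ::tripleo::profile::base::nova'

  # The runtime container then mounts the generated config read-only
  docker run -d \
      -v /var/lib/config-data/nova:/etc/nova:ro \
      example/nova-api:latest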
Throughout the discussion I've been assuming I must be missing some critical detail, because isn't the whole point to have immutable stuff? Maybe it is immutable and you all are talking about it in ways that make it seem otherwise. I dunno. I suspect I am missing some bit of operational experience.
The application is immutable, but the configs need to be generated depending on where they end up or on the end user's desired configuration. For some services that includes pulling in some information about the host and including it (SRIOV, PCI, etc.).
In any case, the "differ only by things..." situation is exactly why I added the get-config-from-environment support to oslo.config, so that the different bits can be in the orchestrator, not the containers themselves. More on that at:
http://lists.openstack.org/pipermail/openstack-discuss/2018-November/000173....
Given the vast number of configuration options exposed by each service, I'm not sure environment variables help here. Additionally, that doesn't solve it for non-oslo services (mysql/rabbitmq/etc), so then you'd end up with two ways of having to configure the containers/services.
-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
On Nov 28, 2018, at 2:27 PM, Alex Schultz <aschultz@redhat.com> wrote:
<snip>
IMHO this is a side effect of the configurations (files) for openstack services and their service dependencies, where we need to somehow inject the running config into the container rather than being able to load it from an external source (remember the etcd oslo stuff from a few cycles ago?).
I thought the preferred solution for more complex settings was config maps. Did that approach not work out? Regardless, now that the driver work is done, if someone wants to take another stab at etcd integration it'll be more straightforward today.

Doug
On Wed, 28 Nov 2018, Alex Schultz wrote: [stuff where I'm clearly in over my head, am missing critical context, and don't know what I'm talking about, so just gonna stay out, deleted]
Throughout the discussion I've been assuming I must be missing some critical detail, because isn't the whole point to have immutable stuff? Maybe it is immutable and you all are talking about it in ways that make it seem otherwise. I dunno. I suspect I am missing some bit of operational experience.
The application is immutable, but the configs need to be generated depending on where they end up or on the end user's desired configuration. For some services that includes pulling in some information about the host and including it (SRIOV, PCI, etc.).
Presumably most of the config is immutable as well, and there are only a (relatively) small number of per-instance-of-thing differences?
Given the vast number of configuration options exposed by each service, I'm not sure environment variables help here. Additionally, that doesn't solve it for non-oslo services (mysql/rabbitmq/etc), so then you'd end up with two ways of having to configure the containers/services.
The idea is for the environment variables to be used only for the small number of differences, not everything; they amount to overrides. What I'm trying to understand is why this trope of container management doesn't apply here:

A: How do I manage configuration _in_ my containers?
B: Don't.
A: ?
B: Manage it from the outside; tell the container its config when it starts. If the config needs to change, start a new container.

I'm pretty sure this isn't really germane to the original point of this thread, so apologies for adding to the noise, but it was hard to resist. I'll try harder.

-- Chris Dent ٩◔̯◔۶ https://anticdent.org/ freenode: cdent tw: @anticdent
On 11/29/2018 05:29 AM, Chris Dent wrote:
<snip>
Given the vast number of configuration options exposed by each service, I'm not sure environment variables help here. Additionally, that doesn't solve it for non-oslo services (mysql/rabbitmq/etc), so then you'd end up with two ways of having to configure the containers/services.
Not sure about RabbitMQ, but MySQL/MariaDB certainly takes command-line argument overrides if the container running the MySQL server actually has the mysql server as its entrypoint. I'm not actually sure how the TripleO container for MySQL/MariaDB is constructed, though. I tried finding where the MySQL/MariaDB container is constructed in the dozens of tripleo-related repositories on GitHub, but gave up. Maybe someone with knowledge of TripleO's internals can point me to that Dockerfile?
The idea is for the environment variables to be used only for the small number of differences, not everything; they amount to overrides.
What I'm trying to understand is why this trope of container management doesn't apply here:
A: How do I manage configuration _in_ my containers?
B: Don't.
A: ?
B: Manage it from the outside; tell the container its config when it starts. If the config needs to change, start a new container.
Precisely my thoughts as well.

However, if the containers you are using aren't really application containers (having single-process entrypoints) and are really just lightweight VMs in disguise, then you pretty much throw the above trope out the window, and you are back to square one, using legacy [1] configuration management techniques to configure the "containers" as you would a baremetal host or VM.

In any case, it sounds like the TripleO team is attempting to find any way it can to put its containers on a diet, and I fully support that effort, as I'm sure you do as well.

-jay

[1] legacy now equals between 3 and 5 years old. :(
On Thu, 2018-11-29 at 07:38 -0500, Jay Pipes wrote:
<snip>
Not sure about RabbitMQ, but MySQL/MariaDB certainly takes command-line argument overrides if the container running the MySQL server actually has the mysql server as its entrypoint.
The containers come from Kolla: https://github.com/openstack/kolla/blob/master/docker/mariadb/Dockerfile.j2. The entrypoint is the kolla_start script, which does a few things to normalise the kolla container "ABI" and then runs the command specified via a json file: https://github.com/openstack/kolla-ansible/blob/master/ansible/roles/mariadb... So basically it makes sure the config files exist and the permissions are correct, then calls /usr/bin/mysqld_safe.
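For reference, the json file kolla_start consumes looks roughly like this (abridged from memory; see the kolla-ansible mariadb role for the real thing):

  $ cat /var/lib/kolla/config_files/config.json
  {
      "command": "/usr/bin/mysqld_safe",
      "config_files": [{
          "source": "/var/lib/kolla/config_files/galera.cnf",
          "dest": "/etc/mysql/my.cnf",
          "owner": "mysql",
          "perm": "0600"
      }]
  }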
I'm not actually sure how the TripleO container for MySQL/MariaDB is constructed, though. I tried finding where the MySQL/MariaDB container is constructed in the dozens of tripleo-related repositories on GitHub, but gave up. Maybe someone with knowledge of TripleO's internals can point me to that Dockerfile?
Yeah, the fact that TripleO uses Kolla containers makes them hard to find unless you know that. Also, TripleO uses a template override file to modify the Kolla containers a little for its needs, but in general they should be the same as the vanilla Kolla ones.
The idea is for the environment variables to be used only for the small number of differences, not everything; they amount to overrides.
I don't know if docker has fixed this, but you used to be unable to change env vars after a container was created, which meant that to switch to debug-mode logging you had to destroy and recreate the container. That's annoying compared to a config change plus SIGHUP for services that support mutable config.
<snip>
On Thu, Nov 29, 2018 at 5:31 AM Chris Dent <cdent+os@anticdent.org> wrote:
<snip>
Presumably most of the config is immutable as well, and there are only a (relatively) small number of per-instance-of-thing differences?
Yes, exactly. I don't think we actually do that much based on individual system facts, other than IPs and hostnames. That's why generating the config individually on each node seems like a good area for optimization, especially when we are talking about large groups of nodes that will be identical (same number of cores, memory, etc.).
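Something along these lines, say (a purely hypothetical sketch, no such tool exists today; get_node_ip, the %MY_IP% placeholder and the file names are all made up):

  # Render a role's config once, centrally, on the undercloud...
  docker run --rm -v /var/lib/config-data/compute-role:/etc/nova \
      example/nova-base:latest \
      puppet apply -e 'include ::tripleo::profile::base::nova::compute'

  # ...then fan it out, substituting only the per-node bits
  for node in $(cat compute-nodes.txt); do
      rsync -a /var/lib/config-data/compute-role/ "$node":/etc/nova/
      ssh "$node" sed -i "s/%MY_IP%/$(get_node_ip "$node")/" /etc/nova/nova.conf
  done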
<snip>
What I'm trying to understand is why this trope of container management doesn't apply here:
A: How do I manage configuration _in_ my containers?
B: Don't.
A: ?
B: Manage it from the outside; tell the container its config when it starts. If the config needs to change, start a new container.
I'm pretty sure this isn't really germane to the original point of this thread, so apologies for adding to the noise, but it was hard to resist. I'll try harder.
Is it explained earlier in the thread? http://lists.openstack.org/pipermail/openstack-dev/2018-November/136582.html

With additional context as to the "why" here: http://lists.openstack.org/pipermail/openstack-dev/2018-November/136597.html

I'm fairly sure this addresses your question, but happy to offer more details if not.

--
James Slagle
On Wed, 2018-11-28 at 13:28 -0500, James Slagle wrote:
On Wed, Nov 28, 2018 at 12:31 PM Bogdan Dobrelya <bdobreli@redhat.com> wrote:

Long story short, we cannot shoot both rabbits with a single shot, not with puppet :) Maybe we could with ansible replacing puppet fully... So splitting config and runtime images is the only choice yet to address the raised security concerns. And let's forget about edge cases for now. Tossing around a pair of extra bytes over 40,000 WAN-distributed computes ain't gonna be our biggest problem for sure.
I think it's this last point that is the crux of this discussion. We can agree to disagree about the merits of this proposal and whether it's a pre-optimization or a micro-optimization, which I admit are somewhat subjective terms. Ultimately, it's the "why do we need to do this" that seems to keep the conversation going in circles a bit.
I'm all for reducing container image size, but the reality is that this proposal doesn't necessarily help us with the Edge use cases we are talking about trying to solve.
Why would we even run the exact same puppet binary + manifest individually 40,000 times so that we can produce the exact same set of configuration files that differ only by things such as IP address, hostnames, and passwords? Maybe we should instead be thinking about how we can do that *1* time centrally, and produce a configuration that can be reused across 40,000 nodes with little effort. The opportunity for a significant impact in terms of how we can scale TripleO is much larger if we consider approaching these problems with a wider net of what we could do. There's opportunity for a lot of better reuse in TripleO, configuration is just one area. The plan and Heat stack (within the ResourceGroup) are some other areas.
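As a rough sketch of that idea (the template format and paths are invented; envsubst is just one way to do the substitution): render the full config once centrally, leaving only the per-node fields as placeholders, so each of the 40,000 nodes does a trivial substitution instead of a full config management run:

# nova.conf.tmpl, rendered once on the control plane, with only the
# per-node holes left in it:
#   my_ip = ${MY_IP}
#   host = ${MY_HOSTNAME}

# On each node, substitution is all that remains to be done:
MY_IP=192.0.2.10 MY_HOSTNAME=edge-0001 \
  envsubst < nova.conf.tmpl > /etc/nova/nova.conf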
We run Puppet for configuration because that is what we did on baremetal, and we didn't break backwards compatibility for our configuration options across upgrades. Our Puppet model relies on being executed on each local host in order to splice in the correct IP address and hostname. It executes in a distributed fashion, and works fairly well considering the history of the project. It is robust, guarantees no duplicate configs are being set, and is backwards compatible with all the options TripleO supported on baremetal. Puppet is arguably better for configuration than Ansible (which is what I hear people most often suggest we replace it with). It suits our needs fine, but it is perhaps a bit overkill considering we are only generating config files.

I think the answer here is moving to something like Etcd. Perhaps skipping over Ansible entirely as a config management tool (it is arguably less capable than Puppet in this category anyway). Or we could use Ansible for "legacy" services only, switch to Etcd for a majority of the OpenStack services, and drop Puppet entirely (my favorite option). Consolidating our technology stack would be wise.

We've already put some work and analysis into the Etcd effort. Just need to push on it some more. Looking at the previous Kubernetes prototypes for TripleO would be the place to start.

Config management migration is going to be tedious. It's technical debt that needs to be handled at some point anyway. I think it is a general TripleO improvement that could benefit all clouds, not just Edge.

Dan
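If the Etcd route were taken, the rough shape might be something like this (the key layout is invented for illustration): configuration is written once centrally, and services read their keys instead of having Puppet render files on every node:

# Write config once, centrally (etcd v3 API):
ETCDCTL_API=3 etcdctl put /config/nova/DEFAULT/debug false
ETCDCTL_API=3 etcdctl put /config/nova/DEFAULT/transport_url rabbit://ctl.example:5672

# Any service/node reads its subtree back:
ETCDCTL_API=3 etcdctl get --prefix /config/nova/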
At the same time, if some folks want to work on smaller optimizations (such as container image size), with an approach that can be agreed upon, then they should do so. We just ought to be careful about how we justify those changes so that we can carefully weigh the effort vs the payoff. In this specific case, I don't personally see this proposal helping us with Edge use cases in a meaningful way given the scope of the changes. That's not to say there aren't other use cases that could justify it though (such as the security points brought up earlier).
On 11/28/18 8:55 PM, Doug Hellmann wrote:
I thought the preferred solution for more complex settings was config maps. Did that approach not work out?
Regardless, now that the driver work is done if someone wants to take another stab at etcd integration it’ll be more straightforward today.
Doug
While sharing configs is a feasible option to consider for large-scale configuration management, Etcd only provides strong consistency, which is also known as "Unavailable" [0]. For edge scenarios, to configure 40,000 remote computes over WAN connections, we'd rather want weaker consistency models, like "Sticky Available" [0]. That would allow services to fetch their configuration either from a central "uplink" or locally, when the former is not accessible from remote edge sites. Etcd cannot provide 40,000 local endpoints to fit that case I'm afraid, even if those were read-only replicas. That is also something I'm highlighting in the paper [1] drafted for ICFC-2019.

But had we such a sticky-available key-value storage solution, we would indeed have solved the problem of running configuration management for thousands of nodes, as James describes it.

[0] https://jepsen.io/consistency
[1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position...
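To make the desired behavior concrete, a sticky-available config client would look roughly like this (endpoint and paths are invented for the sketch): prefer the central uplink, and keep serving the last-known-good local copy when the WAN link is down:

# Refresh from the central uplink when reachable; otherwise fall back
# to the locally cached (possibly stale, but available) copy:
if curl -fsS --max-time 5 https://uplink.example/config/nova.conf \
     -o /var/cache/config/nova.conf.new; then
  mv /var/cache/config/nova.conf.new /var/cache/config/nova.conf
fi
cp /var/cache/config/nova.conf /etc/nova/nova.conf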
On 11/28/18 11:22 PM, Dan Prince wrote:
I think the answer here is moving to something like Etcd. Perhaps
Not Etcd I think, see my comment above. But you're absolutely right Dan.
-- Best regards, Bogdan Dobrelya, Irc #bogdando
On 11/29/2018 04:28 AM, Bogdan Dobrelya wrote:
While sharing configs is a feasible option to consider for large-scale configuration management, Etcd only provides strong consistency, which is also known as "Unavailable" [0]. For edge scenarios, to configure 40,000 remote computes over WAN connections, we'd rather want weaker consistency models, like "Sticky Available" [0]. That would allow services to fetch their configuration either from a central "uplink" or locally, when the former is not accessible from remote edge sites. Etcd cannot provide 40,000 local endpoints to fit that case I'm afraid, even if those were read-only replicas. That is also something I'm highlighting in the paper [1] drafted for ICFC-2019.
But had we such a sticky-available key-value storage solution, we would indeed have solved the problem of running configuration management for thousands of nodes, as James describes it.
[0] https://jepsen.io/consistency
[1] https://github.com/bogdando/papers-ieee/blob/master/ICFC-2019/LaTeX/position...

It's not that etcd is incapable of providing something like this. It's that a *single* etcd KVS used by 40K compute nodes across a disaggregated control plane would not be available to all of those nodes simultaneously. But you could certainly use etcd as the data store to build a sticky-available configuration data store. If, for example, you had many local [1] etcd KVS that stored local data and synchronized the local data set with other etcd KVS endpoints when a network partition was restored, you could get a system that was essentially "sticky available" for all intents and purposes. Come to think of it, you could do the same with a SQLite DB, a la Swift's replication of SQLite DBs via rsync.

But, at the risk of sounding like a broken record, at the end of the day, many of OpenStack's core services (notably Nova) were not designed for disaggregated control planes. They were designed for the datacenter, with tightly-packed compute resources and low-latency links for the control plane. The entire communication bus and state management system would need to be redesigned from the nova-compute to the nova-conductor for (far) edge case clouds to be a true reality.

Instead of sending all data updates synchronously from each nova-compute to nova-conductor, the communication bus needs to be radically redesigned so that the nova-compute uses a local data store *as its primary data storage* and then asynchronously sends batched updates to known control plane endpoints when those regular network partitions correct themselves. The nova-compute manager will need to be substantially hardened to keep itself up and running (and writing to that local state storage) for long periods of time, and to contain all the logic to resync itself when network uplinks become available again.

Finally, if those local nova-computes need to actually *do* anything other than keep existing VMs/baremetal machines up and running, then a local Compute API service needs to be made available in the far edge sites themselves, offering some subset of Compute API functionality to control the VMs in that local site. Otherwise, the whole "multiple department stores running an edge OpenStack site that can tolerate the Mother Ship being down" isn't a thing that will work.

Like I said, pretty much a complete redesign of the nova control plane...

Best,
-jay

[1] or local-ish, think POPs or even local to the compute node itself...
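In sketch form, the write path described above might look like this (endpoints, paths, and the journal format are all made up): the local store is the primary, and upstream sync is opportunistic and batched:

# The primary write path is always local, partition or not:
echo "$(date -u +%FT%TZ) instance=abc123 state=ACTIVE" \
  >> /var/lib/edge/journal

# Opportunistic batched sync to the control plane; truncate the
# journal only after a confirmed flush:
if curl -fsS --max-time 5 -X POST \
     --data-binary @/var/lib/edge/journal \
     https://conductor.example/batch-updates; then
  : > /var/lib/edge/journal
fi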
On 28. 11. 18 18:29, Bogdan Dobrelya wrote:
On 11/28/18 6:02 PM, Jiří Stránský wrote:
<snip>
Reiterating again on previous points:
- I'd be fine removing systemd. But let's do it properly and not via 'rpm -ev --nodeps'.
- Puppet and Ruby *are* required for configuration. We can certainly put them in a separate container outside of the runtime service containers, but doing so would actually cost you much more space/bandwidth for each service container. As both of these have to get downloaded to each node anyway in order to generate config files with our current mechanisms, I'm not sure this buys you anything.
+1. I was actually under the impression that we concluded yesterday on IRC that this is the only thing that makes sense to seriously consider. But even then it's not a win-win -- we'd gain some security by leaner production images, but pay for it with space+bandwidth by duplicating image content (IOW we can help achieve one of the goals we had in mind by worsening the situation w/r/t the other goal we had in mind.)
Personally I'm not sold yet, but it's something that I'd consider if we got measurements of how much more space/bandwidth usage this would consume, and if we got some further details/examples about how serious the security concerns are if we leave config mgmt tools in runtime images.
IIRC the other options (that were brought forward so far) were already dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind mounting being too hacky and fragile, and nsenter not really solving the problem (because it allows us to switch to having different bins/libs available, but it does not allow merging the availability of bins/libs from two containers into a single context).
We are going in circles here I think....
+1. I think too much of the discussion focuses on "why it's bad to have config tools in runtime images", but IMO we all sorta agree that it would be better not to have them there, if it came at no cost.
I think to move forward, it would be interesting to know: if we do this (I'll borrow Dan's drawing):
|base container| --> |service container| --> |service container w/ Puppet installed|
How much more space and bandwidth would this consume per node (e.g. separately per controller, per compute). This could help with decision making.
As I've already evaluated in the related bug, that is:
puppet-* modules and manifests ~16 MB
puppet with dependencies ~61 MB
dependencies of the seemingly largest dependency, systemd ~190 MB
that would be an extra layer size for each of the container images to be downloaded/fetched into registries.
Thanks, I tried to do the math of the reduction vs. inflation in sizes as follows. I think the crucial point here is the layering. If we do this image layering:

|base| --> |+ service| --> |+ Puppet|

we'd drop ~267 MB from the base image, but we'd be installing that at the topmost level, per-component, right?

In my basic deployment, undercloud seems to have 17 "components" (49 containers), overcloud controller 15 components (48 containers), and overcloud compute 4 components (7 containers). Accounting for overlaps, the total number of "components" used seems to be 19. (By "components" here I mean whatever uses a different ConfigImage than other services. I just eyeballed it, but I think I'm not too far off the correct number.)

So we'd subtract 267 MB from the base image and add that to the 19 leaf images used in this deployment. That means a difference of +4.8 GB to the current image sizes. My /var/lib/registry dir on the undercloud with all the images currently has 5.1 GB. We'd almost double that to 9.9 GB.

Going from 5.1 to 9.9 GB seems like a lot of extra traffic for the CDNs (both external and e.g. internal within OpenStack Infra CI clouds).

And for internal traffic between the local registry and overcloud nodes, it gives +3.7 GB per controller and +800 MB per compute. That may not be so critical, but it still feels like a considerable downside.

Another gut feeling is that this way of image layering would take a longer time to build and to run the modify-image Ansible role which we use in CI, so that could endanger how our CI jobs fit into the time limit. We could probably measure this too, but I'm not sure it's worth spending the time.

All in all, I'd argue we should still be looking at different options.
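For reference, those estimates can be sanity-checked with quick arithmetic (the shared base layer loses the ~267 MB once, while every leaf image gains it):

echo $(( 19 * 267 - 267 ))   # 4806 MB: ~+4.8 GB total in the registry
echo $(( 15 * 267 - 267 ))   # 3738 MB: ~+3.7 GB pulled per controller
echo $((  4 * 267 - 267 ))   # 801 MB:  ~+800 MB pulled per compute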
Given that we should decouple systemd from all/some of the dependencies (an example topic for RDO [0]), that could save ~190 MB. But it seems we cannot break the love of puppet and systemd, as the former heavily relies on the latter, and changing packaging like that would highly likely affect baremetal deployments where puppet and systemd co-operate.
Ack :/
[0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
Thanks
Jirka
If the base layers are shared, you won't pay extra for the separate puppet container unless you have another container also installing Ruby in an upper layer. With OpenStack, that's unlikely. The apparent size of a container is not equal to its actual size.

Thanks,
Kevin
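This is easy to verify locally; Docker can show the shared-vs-unique split directly (image names below will differ per deployment):

# SHARED SIZE vs UNIQUE SIZE per image; the "apparent" size
# double-counts layers that are actually stored only once:
docker system df -v

# Layer-by-layer sizes for a single image:
docker history tripleomaster/centos-binary-nova-compute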
Oh, rereading the conversation again, the concern is having shared deps move up layers? So more systemd-related than Ruby? The conversation about --nodeps makes it sound like it's not actually used, just an artifact of how the RPMs are built... What about creating a dummy package that provides(systemd)? That avoids using --nodeps.

Thanks,
Kevin
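A minimal sketch of what such a dummy package could look like (untested; the name is invented), so that packages with 'Requires: systemd' still install while the real systemd stays out of the image:

# fake-systemd.spec (build with: rpmbuild -bb fake-systemd.spec)
Name:      fake-systemd
Version:   1.0
Release:   1
Summary:   Satisfies the systemd dependency inside container images
License:   ASL 2.0
BuildArch: noarch
Provides:  systemd

%description
Empty package that provides systemd for container images where no init
system is wanted.

%files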
On 29. 11. 18 20:20, Fox, Kevin M wrote:
If the base layers are shared, you won't pay extra for the separate puppet container
Yes, and that's the state we're in right now.
unless you have another container also installing ruby in an upper layer.
Not just Ruby but also Puppet and systemd. I think that's what the proposal we're discussing here suggests: removing this content from the base layer (so that we can get service runtime images without this content present) and putting this content *on top* of individual service images. Unless I'm missing some trick to start sharing *top* layers rather than *base* layers, I think that effectively disables the space sharing for the Ruby+Puppet+systemd content.
With OpenStack, thats unlikely.
the apparent size of a container is not equal to its actual size.
Yes. :) Thanks Jirka
On 11/29/18 6:42 PM, Jiří Stránský wrote:
Thanks, I tried to do the math of the reduction vs. inflation in sizes as follows. I think the crucial point here is the layering. If we do this image layering:

|base| --> |+ service| --> |+ Puppet|

we'd drop ~267 MB from the base image, but we'd be installing that at the topmost level, per-component, right?
Given we detached systemd from puppet, cronie et al., that would be 267 - 190 = 77 MB per leaf image, so Jiří's math above would look much better
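Redoing the earlier arithmetic under that assumption, with the moved layer shrinking from ~267 MB to ~77 MB:

# 19 leaf images each gain ~77 MB instead of ~267 MB:
echo $(( 19 * 77 - 77 ))    # 1386 MB: ~+1.4 GB in the registry,
                            # instead of ~+4.8 GB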
-- Best regards, Bogdan Dobrelya, Irc #bogdando
On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote:
Given we detached systemd from puppet, cronie et al., that would be 267 - 190 = 77 MB per leaf image, so Jiří's math above would look much better
Would it be worth writing a spec that summarizes what action items are being taken to optimize our base image with regard to systemd? It seems like the general consensus is that cleaning up some of the RPM dependencies so that we don't install systemd is the biggest win. What confuses me is why there are still patches posted to move Puppet out of the base layer when we agree moving it out of the base layer would actually cause our resulting container image set to be larger in size. Dan
On 11/30/18 1:52 PM, Dan Prince wrote:
On Fri, 2018-11-30 at 10:31 +0100, Bogdan Dobrelya wrote:
On 11/29/18 6:42 PM, Jiří Stránský wrote:
On 28. 11. 18 18:29, Bogdan Dobrelya wrote:
On 11/28/18 6:02 PM, Jiří Stránský wrote:
<snip>
Reiterating again on previous points:
- I'd be fine removing systemd. But let's do it properly and not via 'rpm -ev --nodeps'.
- Puppet and Ruby *are* required for configuration. We can certainly put them in a separate container outside of the runtime service containers, but doing so would actually cost you much more space/bandwidth for each service container. As both of these have to get downloaded to each node anyway in order to generate config files with our current mechanisms, I'm not sure this buys you anything.
+1. I was actually under the impression that we concluded yesterday on IRC that this is the only thing that makes sense to seriously consider. But even then it's not a win-win -- we'd gain some security by leaner production images, but pay for it with space+bandwidth by duplicating image content (IOW we can help achieve one of the goals we had in mind by worsening the situation w/r/t the other goal we had in mind.)
Personally I'm not sold yet, but it's something I'd consider if we got measurements of how much more space/bandwidth this would consume, and some further details/examples about how serious the security concerns are if we leave config mgmt tools in runtime images.
IIRC the other options (that were brought forward so far) were already dismissed in yesterday's IRC discussion and on the reviews. Bin/lib bind mounting being too hacky and fragile, and nsenter not really solving the problem (because it allows us to switch to having different bins/libs available, but it does not allow merging the availability of bins/libs from two containers into a single context).
We are going in circles here I think....
+1. I think too much of the discussion focuses on "why it's bad to have config tools in runtime images", but IMO we all sorta agree that it would be better not to have them there, if it came at no cost.
I think to move forward, it would be interesting to know: if we do this (I'll borrow Dan's drawing):
base container| --> |service container| --> |service container w/ Puppet installed|
How much more space and bandwidth would this consume per node (e.g. separately per controller, per compute)? This could help with decision making.
As I've already evaluated in the related bug, that is:
puppet-* modules and manifests: ~16 MB
puppet with dependencies: ~61 MB
dependencies of the seemingly largest single dependency, systemd: ~190 MB
that would be an extra layer size for each of the container images to be downloaded/fetched into registries.
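(Such estimates can presumably be reproduced on any RPM-based image with something like the following; the package names here are just examples:)

rpm -q --queryformat '%{SIZE}\n' systemd   # installed size of one package, in bytes
repoquery --requires --resolve puppet      # puppet's resolved direct dependencies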
Thanks, I tried to do the math of the reduction vs. inflation in sizes as follows. I think the crucial point here is the layering. If we do this image layering:
base| --> |+ service| --> |+ Puppet|
<snip>
Would it be worth writing a spec that summarizes what action items are being taken to optimize our base image with regards to systemd?
Perhaps it would be. But honestly, I see nothing biggie here to require a full-blown spec: just changing RPM deps and layers for container images. I'm tracking the systemd changes here [0],[1],[2], btw (if accepted, it should be working as of Fedora 28 (or 29), I hope)
[0] https://review.rdoproject.org/r/#/q/topic:base-container-reduction
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1654659
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1654672
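(A quick way to see what drags systemd into an image in the first place, run inside the container; note this only lists packages that directly require systemd by name, so transitive and weak deps need a deeper look:)

rpm -q --whatrequires systemd   # installed packages that directly require systemd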
<snip>
-- Best regards, Bogdan Dobrelya, Irc #bogdando
Still confused by: [base] -> [service] -> [+ puppet] not: [base] -> [puppet] and [base] -> [service] ?
Thanks,
Kevin
________________________________________
From: Bogdan Dobrelya [bdobreli@redhat.com]
Sent: Friday, November 30, 2018 5:31 AM
To: Dan Prince; openstack-dev@lists.openstack.org; openstack-discuss@lists.openstack.org
Subject: Re: [openstack-dev] [TripleO][Edge] Reduce base layer of containers for security and size of images (maintenance) sakes
On 11/30/18 1:52 PM, Dan Prince wrote:
<snip>
Hi Kevin. Puppet not only creates config files but also executes service-dependent steps, like a db sync, so neither '[base] -> [puppet]' nor '[base] -> [service]' would be enough on its own. That requires some service-specific code to be included into the *config* images as well.
PS. There is a related spec [0] created by Dan, please take a look and provide your feedback.
[0] https://review.openstack.org/620062
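For illustration, reusing the hypothetical config image sketched earlier in the thread: a config-time step such as a db sync needs the service's own code, which is why a generic '[base] -> [puppet]' image alone wouldn't do.

docker run --rm tripleo/keystone-config keystone-manage db_sync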
On 11/30/18 6:48 PM, Fox, Kevin M wrote:
Still confused by: [base] -> [service] -> [+ puppet] not: [base] -> [puppet] and [base] -> [service] ?
<snip>
-- Best regards, Bogdan Dobrelya, Irc #bogdando
On 12/3/18 10:34 AM, Bogdan Dobrelya wrote:
<snip>
PS. There is a related spec [0] created by Dan, please take a look and provide your feedback
I'm terribly sorry, but here is the corrected link [0] to that spec. [0] https://review.openstack.org/620909
<snip>
-- Best regards, Bogdan Dobrelya, Irc #bogdando
participants (10):
- Alex Schultz
- Bogdan Dobrelya
- Chris Dent
- Dan Prince
- Doug Hellmann
- Fox, Kevin M
- James Slagle
- Jay Pipes
- Jiří Stránský
- Sean Mooney