[tripleo] Stop using host's /run|/var/run inside containers

Cédric Jeanneret cjeanner at redhat.com
Tue Jun 23 09:50:16 UTC 2020



On 6/19/20 2:48 PM, Sofer Athlan-Guyot wrote:
> Hi,
> 
> not really a reply, but the title piqued my curiosity, so here is a
> quick command that might give more context.
> 
> Cédric Jeanneret <cjeanner at redhat.com> writes:
> 
>> On 6/18/20 9:42 AM, Cédric Jeanneret wrote:
>>> Hello all!
>>>
>>> While working on podman integration, especially the SELinux part of it,
>>> I was wondering why we kept using the host's /run (or its replicated
>>> /var/run) location inside containers. And I'm still wondering, 2 years
>>> later ;).
>>>
>>> Reasons:
>>> - from time to time, there are patches adding a ":z" flag to the run
>>> bind-mount. This breaks the system, since the host systemd can't
>>> write to or access files carrying the container_file_t SELinux context.
>>> Relabeling might therefore prevent a service restart.
>>>
>>> - in order to keep things in a clean, understandable tree, getting a
>>> dedicated shared directory for the container's sockets makes sense, as
>>> it might make things easier to check (for instance, "is this or that
>>> service running in a container?")
>>>
>>> - if an operator runs a restorecon during runtime, it will break
>>> container services
>>>
>>> - mounting /run directly in the containers might expose unwanted
>>> sockets, such as DBus (this creates SELinux denials, and we're
>>> monkey-patching things and doing really ugly changes to prevent it).
>>> It's more than probable that other unwanted shared sockets end up in the
>>> containers, and it might expose the host at some point. Here again, from
>>> time to time we see new SELinux policies being added in order to solve
>>> the denials, and it creates big holes in the host security
>>>
>>> AFAIK, no *host* service is accessed by any container services, right?
>>> If so, could we imagine moving the shared /run to some other location on
>>> the host, such as /run/containers, or /container-run, or any other
>>> *dedicated* location whose SELinux context we can manage as we want?
>>
>> Small addendum/erratum:
>>
>> some containers DO need to access some specific sockets/directories in
>> /run, such as /run/netns and, probably, /run/openvswitch (iirc this one
>> isn't running in a container).
>> For those specific cases, we can of course mount the specific locations
>> inside the container's /run.
>>
>> This addendum doesn't change the main question though :)
>>
> 
> So I ran that command on a controller and a compute node (Train ... sorry,
> old version, but the command stands) out of curiosity.
> 
> Get all the containers that mount /run:
> 
> for i in $(podman ps --format '{{.Names}}') ; do echo $i; podman inspect $i | jq '.[]|.Mounts[]|.Source + " -> " + .Destination'; done | awk '/^[a-z]/{container=$1}/run/{print container " : " $0}'
> 
> # controller:
> 
> swift_proxy : "/run -> /run"
> ceph-mgr-controller-0 : "/var/run/ceph -> /var/run/ceph"
> ceph-mon-controller-0 : "/var/run/ceph -> /var/run/ceph"
> openstack-cinder-backup-podman-0 : "/run -> /run"
> ovn_controller : "/run -> /run"
> ovn_controller : "/var/lib/openvswitch/ovn -> /run/ovn"
> nova_scheduler : "/run -> /run"
> iscsid : "/run -> /run"
> ovn-dbs-bundle-podman-0 : "/var/lib/openvswitch/ovn -> /run/openvswitch"
> ovn-dbs-bundle-podman-0 : "/var/lib/openvswitch/ovn -> /run/ovn"
> redis-bundle-podman-0 : "/var/run/redis -> /var/run/redis"
> 
> # compute
> nova_compute : "/run -> /run"
> ovn_metadata_agent : "/run/netns -> /run/netns"
> ovn_metadata_agent : "/run/openvswitch -> /run/openvswitch"
> ovn_controller : "/run -> /run"
> ovn_controller : "/var/lib/openvswitch/ovn -> /run/ovn"
> nova_migration_target : "/run/libvirt -> /run/libvirt"
> iscsid : "/run -> /run"
> nova_libvirt : "/run -> /run"
> nova_libvirt : "/var/run/libvirt -> /var/run/libvirt"
> nova_virtlogd : "/run -> /run"
> nova_virtlogd : "/var/run/libvirt -> /var/run/libvirt"
> neutron-haproxy-ovnmeta-a80e1d01-9c65-4fd3-8393-0bf5b66d175e : "/run/netns -> /run/netns"
> 
> So the usual suspects in this particular example seem to be
> cinder-backup, iscsid, ceph, swift, redis.
> 
> Openvswitch seems to do the right thing here.
> 
> I guess that the nova one must be required somehow.

probably not anymore, since even libvirt is running in a container (and
this created its own issues).

I'm pretty sure most of the services can run without the host
/run location. Would be worth some testing imho :).
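
To narrow down what to test first, here is a minimal sketch that filters
the listing quoted above down to the containers that bind-mount the whole
host /run or /var/run (as opposed to a specific subdirectory such as
/run/netns). It assumes the exact 'name : "source -> destination"' line
format printed by the one-liner above; the function name whole_run_mounts
is made up for this example.

```shell
# Hypothetical helper: reads lines of the form
#   name : "source -> destination"
# on stdin and prints, once each, the names of the containers whose mount
# source is the host's whole /run or /var/run.
whole_run_mounts() {
    awk -F' : ' '
        {
            mount = $2
            gsub(/"/, "", mount)           # drop the quotes added by jq
            split(mount, parts, " -> ")    # parts[1] = source, parts[2] = destination
            if (parts[1] == "/run" || parts[1] == "/var/run")
                print $1
        }
    ' | sort -u
}
```

Piping the output of the one-liner into whole_run_mounts should leave out
entries such as ovn_metadata_agent, which only mounts /run/netns and
/run/openvswitch.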

Cheers,

C.

> 
>>
>>>
>>> I would therefore like to get some feedback about this proposed change.
>>>
>>> For the containers, nothing should change:
>>> - they will get their /run populated with other containers' sockets
>>> - they will NOT be able to access the host services at all.
>>>
>>> Thank you for your feedback, ideas and thoughts!
>>>
>>> Cheers,
>>>
>>> C.
>>>
>>
>> -- 
>> Cédric Jeanneret (He/Him/His)
>> Sr. Software Engineer - OpenStack Platform
>> Deployment Framework TC
>> Red Hat EMEA
>> https://www.redhat.com/
>>

-- 
Cédric Jeanneret (He/Him/His)
Sr. Software Engineer - OpenStack Platform
Deployment Framework TC
Red Hat EMEA
https://www.redhat.com/
