[all][infra][kayobe][kolla] ping not permitted on latest centos-8-stream images
Hello,

Late yesterday, I noticed many Kayobe CI jobs started failing with "ping: socket: Operation not permitted". I investigated the issue with clarkb on #openstack-infra, with help from #centos-devel as well (on Libera).

This happens on the latest CentOS Stream 8 images and is caused by iputils 20180629-8.el8 removing capabilities on the ping binary [1]. This should have been shipped with a sysctl configuration allowing any group to access unprivileged ICMP echo sockets [2], but this is not in the systemd package yet. As a result, using ping without root privileges fails.

TripleO is also impacted. They have fixed it in their CI jobs [3]. It is possible other projects are affected.

There are multiple places within Kayobe and Kolla where we would need to set this sysctl to fix our CI, including backports to all supported branches. I was wondering if infra could instead customise their stream image or apply the sysctl in one of the common roles from zuul/zuul-jobs that are run at the beginning of each job?

Many thanks.

Best wishes,
Pierre Riteau (priteau)

[1] https://git.centos.org/rpms/iputils/c/efa64b5e05ccb2c1332304ad493acc874b61e1...
[2] https://github.com/redhat-plumbers/systemd-rhel8/pull/246
[3] https://review.opendev.org/c/openstack/tripleo-ci/+/824635
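For readers who want to check a host for the condition described in the first message, here is a minimal Python sketch. It assumes the sysctl in question is the Linux kernel's net.ipv4.ping_group_range, which is not named in the thread, and that a gid may open an unprivileged ICMP echo socket when it lies within that inclusive range. The function names are hypothetical. The check is simplified to a single gid, whereas the kernel considers all of a process's groups.

```python
from typing import Tuple

# Kernel default: an empty range (low > high), so no group may open
# unprivileged ICMP echo sockets.
DEFAULT_RANGE = "1\t0"

# Assumption: the value a distribution-provided sysctl configuration
# would set to allow every group.
ALLOW_ALL_RANGE = "0 2147483647"

# Assumption: standard procfs location of the sysctl on Linux.
PROC_PATH = "/proc/sys/net/ipv4/ping_group_range"


def parse_ping_group_range(text: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated integers of the sysctl value."""
    low, high = text.split()
    return int(low), int(high)


def gid_may_ping(gid: int, text: str) -> bool:
    """Return True if gid falls inside the inclusive range given in text."""
    low, high = parse_ping_group_range(text)
    return low <= gid <= high


def read_ping_group_range(path: str = PROC_PATH) -> str:
    """Read the current sysctl value from procfs (Linux only)."""
    with open(path) as handle:
        return handle.read()
```

On an affected host, something like `gid_may_ping(os.getgid(), read_ping_group_range())` would be expected to return False until the sysctl is set, either by the application's own deployment code or by an updated systemd package.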
On 2022-01-14 09:50:33 +0100 (+0100), Pierre Riteau wrote: [...]
> There are multiple places within Kayobe and Kolla where we would need to set this sysctl to fix our CI, including backports to all supported branches. I was wondering if infra could instead customise their stream image or apply the sysctl in one of the common roles from zuul/zuul-jobs that are run at the beginning of each job? Many thanks.
[...]
How close are the CentOS Stream maintainers to uploading a regression fix for the package? If it's going to be a while, then the safest solution is probably to add a platform-specific DIB element and rebuild our centos-8-stream images with that. Making modifications to our "base" job (or any of the roles it uses) is far more time-consuming, because of the precautions we take in order to avoid accidentally breaking every job in the system (changes to the "base" job are not directly testable).

-- Jeremy Stanley
On Fri, Jan 14, 2022 at 7:53 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
> On 2022-01-14 09:50:33 +0100 (+0100), Pierre Riteau wrote: [...]
> > There are multiple places within Kayobe and Kolla where we would need to set this sysctl to fix our CI, including backports to all supported branches. I was wondering if infra could instead customise their stream image or apply the sysctl in one of the common roles from zuul/zuul-jobs that are run at the beginning of each job? Many thanks.
> [...]
> How close are the CentOS Stream maintainers to uploading a regression fix for the package? If it's going to be a while, then the safest solution is probably to add a platform-specific DIB element and rebuild our centos-8-stream images with that. Making modifications to our "base" job (or any of the roles it uses) is far more time-consuming, because of the precautions we take in order to avoid accidentally breaking every job in the system (changes to the "base" job are not directly testable).
The fix is going into systemd [0]. I'm not sure how long it will take to hit the mirrors, but there is a package already. In the meantime, a workaround would be to apply the sysctl values as tripleo-ci is doing.

[0] https://bugzilla.redhat.com/show_bug.cgi?id=2037807
On 2022-01-14 08:13:25 -0700 (-0700), Alex Schultz wrote: [...]
> The fix is going into systemd [0]. I'm not sure how long it will take to hit the mirrors, but there is a package already. In the meantime, a workaround would be to apply the sysctl values as tripleo-ci is doing.
How do I determine from that what systemd package version number we're looking for? I can force mirror updates and image rebuilds far more quickly than any workarounds which require changing our image building recipes or central job configs, and with no need to spend time cleaning up the workarounds afterwards.

-- Jeremy Stanley
On Fri, Jan 14, 2022, at 7:25 AM, Jeremy Stanley wrote:
> On 2022-01-14 08:13:25 -0700 (-0700), Alex Schultz wrote: [...]
> > The fix is going into systemd [0]. I'm not sure how long it will take to hit the mirrors, but there is a package already. In the meantime, a workaround would be to apply the sysctl values as tripleo-ci is doing.
> How do I determine from that what systemd package version number we're looking for? I can force mirror updates and image rebuilds far more quickly than any workarounds which require changing our image building recipes or central job configs, and with no need to spend time cleaning up the workarounds afterwards.
I don't think any package update has been proposed to CentOS 8 Stream yet: https://git.centos.org/rpms/systemd

We want systemd-239-55.el8, and no PR exists for that.
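As a rough illustration of the version check discussed here, the following Python sketch tests whether an installed systemd package string is at or beyond the systemd-239-55.el8 build named above. It assumes that release 55 of version 239 is the first build carrying the fix and that comparing the leading integer of the release field is sufficient. This is not a full RPM version comparison, and the function names are hypothetical.

```python
import re

# Assumption: release 55 of systemd 239 is the first build with the fix,
# taken from the "systemd-239-55.el8" string in the message above.
FIXED_RELEASE = 55

# Matches name-version-release strings for systemd 239 only, e.g.
# "systemd-239-55.el8", capturing the leading integer of the release.
_NVR_PATTERN = re.compile(r"systemd-239-(\d+)(?:\..*)?")


def systemd_release(nvr: str) -> int:
    """Extract the leading integer of the release field from an NVR string."""
    match = _NVR_PATTERN.fullmatch(nvr)
    if match is None:
        raise ValueError(f"unrecognised systemd package string: {nvr!r}")
    return int(match.group(1))


def has_ping_fix(nvr: str) -> bool:
    """Return True if the package is at or beyond the assumed fixed release."""
    return systemd_release(nvr) >= FIXED_RELEASE
```

The string to test could come from `rpm -q systemd` on the image, with the architecture suffix stripped or left in place, since the pattern ignores everything after the first dot of the release.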
On Fri, Jan 14, 2022, at 6:50 AM, Jeremy Stanley wrote:
> On 2022-01-14 09:50:33 +0100 (+0100), Pierre Riteau wrote: [...]
> > There are multiple places within Kayobe and Kolla where we would need to set this sysctl to fix our CI, including backports to all supported branches. I was wondering if infra could instead customise their stream image or apply the sysctl in one of the common roles from zuul/zuul-jobs that are run at the beginning of each job? Many thanks.
> [...]
> How close are the CentOS Stream maintainers to uploading a regression fix for the package? If it's going to be a while, then the safest solution is probably to add a platform-specific DIB element and rebuild our centos-8-stream images with that. Making modifications to our "base" job (or any of the roles it uses) is far more time-consuming, because of the precautions we take in order to avoid accidentally breaking every job in the system (changes to the "base" job are not directly testable).
> -- Jeremy Stanley
I don't think we should update DIB or our images to fix this. The distro is broken and our images accurately represent that state. If the software in CI fails as a result that is because our CI system is properly catching this problem. The software needs to work around this to ensure that it is deployable in the real world and not just on our systems. This approach of fixing it in the software itself appears to be the one TripleO took and is the correct approach.
On 2022-01-14 07:35:05 -0800 (-0800), Clark Boylan wrote: [...]
> I don't think we should update DIB or our images to fix this. The distro is broken and our images accurately represent that state. If the software in CI fails as a result that is because our CI system is properly catching this problem. The software needs to work around this to ensure that it is deployable in the real world and not just on our systems.
> This approach of fixing it in the software itself appears to be the one TripleO took and is the correct approach.
Thanks, on reflection I agree. It's good to keep reminding ourselves that what we're testing is that the software works on the target platform. Unfortunate and temporary as it may be, the current state of CentOS Stream 8 is that you need root privileges in order to use the ping utility. If we work around this in our testing, then users who are trying to deploy that software onto the current state of CentOS Stream 8 will not get the benefit of the workaround.

It's good to be reminded that the goal is not to make tests pass no matter the cost; it's to make sure the software will work for its users.

-- Jeremy Stanley
On Fri, 14 Jan 2022 at 16:57, Jeremy Stanley <fungi@yuggoth.org> wrote:
> On 2022-01-14 07:35:05 -0800 (-0800), Clark Boylan wrote: [...]
> > I don't think we should update DIB or our images to fix this. The distro is broken and our images accurately represent that state. If the software in CI fails as a result that is because our CI system is properly catching this problem. The software needs to work around this to ensure that it is deployable in the real world and not just on our systems.
> > This approach of fixing it in the software itself appears to be the one TripleO took and is the correct approach.
> Thanks, on reflection I agree. It's good to keep reminding ourselves that what we're testing is that the software works on the target platform. Unfortunate and temporary as it may be, the current state of CentOS Stream 8 is that you need root privileges in order to use the ping utility. If we work around this in our testing, then users who are trying to deploy that software onto the current state of CentOS Stream 8 will not get the benefit of the workaround.
> It's good to be reminded that the goal is not to make tests pass no matter the cost; it's to make sure the software will work for its users.
> -- Jeremy Stanley
We have applied the workaround in Kolla Ansible and backported it to stable branches. A fixed systemd package is hopefully coming to CentOS Stream 8 soon, as it was imported in Git yesterday: https://git.centos.org/rpms/systemd/c/3d3dc89fb25868e8038ecac8d5aef0603bdfaa...
On Thu, Jan 20, 2022, at 3:24 AM, Pierre Riteau wrote:
> On Fri, 14 Jan 2022 at 16:57, Jeremy Stanley <fungi@yuggoth.org> wrote:
> > On 2022-01-14 07:35:05 -0800 (-0800), Clark Boylan wrote: [...]
> > > I don't think we should update DIB or our images to fix this. The distro is broken and our images accurately represent that state. If the software in CI fails as a result that is because our CI system is properly catching this problem. The software needs to work around this to ensure that it is deployable in the real world and not just on our systems.
> > > This approach of fixing it in the software itself appears to be the one TripleO took and is the correct approach.
> > Thanks, on reflection I agree. It's good to keep reminding ourselves that what we're testing is that the software works on the target platform. Unfortunate and temporary as it may be, the current state of CentOS Stream 8 is that you need root privileges in order to use the ping utility. If we work around this in our testing, then users who are trying to deploy that software onto the current state of CentOS Stream 8 will not get the benefit of the workaround.
> > It's good to be reminded that the goal is not to make tests pass no matter the cost; it's to make sure the software will work for its users.
> > -- Jeremy Stanley
> We have applied the workaround in Kolla Ansible and backported it to stable branches.
> A fixed systemd package is hopefully coming to CentOS Stream 8 soon, as it was imported in Git yesterday: https://git.centos.org/rpms/systemd/c/3d3dc89fb25868e8038ecac8d5aef0603bdfaa...
Looks like http://mirror.centos.org/centos/8-stream/BaseOS/x86_64/os/Packages/systemd-2... exists upstream of us now. Our mirrors haven't updated to pull that in yet but should soon. Then we will also need new CentOS Stream 8 images built, as systemd is included in them and I'm not sure that systemd will get updated later. Once that happens, you should be able to revert the various workarounds that have been made.
participants (4)
- Alex Schultz
- Clark Boylan
- Jeremy Stanley
- Pierre Riteau