[openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

Jeremy Stanley fungi at yuggoth.org
Mon May 14 14:35:24 UTC 2018

On 2018-05-14 07:07:03 -0600 (-0600), Wesley Hayutin wrote:
> I think you may be conflating the notion that ubuntu or rhel/cent
> can be updated w/o any issues to applications that run atop of the
> distributions with what it means to introduce a minor update into
> the upstream openstack ci workflow.
> If jobs could execute w/o a timeout the tripleo jobs would not
> have gone red.  Since we do have constraints in the upstream, like
> timeouts and others, we have to prepare containers, images etc to
> work efficiently in the upstream.  For example, if our jobs had
> the time to yum update the roughly 120 containers in play in each
> job the tripleo jobs would have just worked.  I am not advocating
> for not having timeouts or constraints on jobs, however I am
> saying this is an infra issue, not a distribution or distribution
> support issue.
> I think this is an important point to consider and I view it as
> mostly unrelated to the support claims by the distribution.  Does
> that make sense?

Thanks. The thread jumped straight to suggesting costly fixes
(separate images for each CentOS point release, adding an evaluation
period or acceptance testing for new point releases, et cetera)
without coming anywhere close to exploring the problem space. Is
your only concern that when your jobs started using CentOS 7.5
instead of 7.4 they took longer to run? What was the root cause? Are
you saying your jobs consume externally-produced artifacts which lag
behind CentOS package updates? Couldn't a significant burst of new
packages cause the same symptoms even without it being tied to a
minor version increase?

This _doesn't_ sound to me like a problem with how we've designed
our infrastructure, unless there are additional details you're
omitting. It sounds like a problem with how the jobs are designed
and expectations around distros slowly trickling package updates
into the series without occasional larger bursts of package deltas.
I'd like to understand more about why you upgrade packages inside
your externally-produced container images at job runtime at all,
rather than relying on the package versions baked into them. It
seems like you're arguing that the existence of lots of new package
versions which aren't already in your container images is the
problem, in which case I have trouble accepting the argument that it
is "an infra issue" in the sense of something that requires changes
to the services provided by the OpenStack Infra team.

Just to be clear, we didn't "introduce a minor update into the
upstream openstack ci workflow." We continuously pull CentOS 7
packages into our package mirrors, and continuously rebuild our
centos-7 images from whatever packages the distro says are current.
Our automation doesn't know that there's a difference between
packages which were part of CentOS 7.4 and 7.5 any more than it
knows that there's a difference between Ubuntu 16.04.2 and 16.04.3.
Even if we somehow managed to pause our CentOS image updates
immediately prior to 7.5, jobs would still try to upgrade those
7.4-based images to the 7.5 packages in our mirror, right?
Jeremy Stanley