[openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

Wesley Hayutin whayutin at redhat.com
Sat May 12 16:10:42 UTC 2018


On Wed, May 9, 2018 at 10:43 PM Wesley Hayutin <whayutin at redhat.com> wrote:

> FYI.. https://bugs.launchpad.net/tripleo/+bug/1770298
>
> I'm on #openstack-infra chatting w/ Ian atm.
> Thanks
>
>
Greetings,

I wanted to update everyone on the status of the upstream tripleo check
and gate jobs.
A series of infra-related issues caused the upstream tripleo gates to go
red.

1. The first issue we hit was
https://bugs.launchpad.net/tripleo/+bug/1770298 which
caused package install errors.
2. Shortly after #1 was resolved, CentOS released 7.5, which lands directly
in the upstream repos untested and ungated.  Additionally, the associated
qcow2 image and container-base images were not updated at the same time as
the yum repos.  https://bugs.launchpad.net/tripleo/+bug/1770355
3. Related to #2, the container and bm image rpms were not in sync, causing
https://bugs.launchpad.net/tripleo/+bug/1770692
4. Building the bm images was failing due to an open issue with the CentOS
kernel.  Thanks to Yatin and Alfredo for
https://review.rdoproject.org/r/#/c/13737/
5. To ensure the containers are updated to the latest rpms at build time,
we have the following patch from Alex
https://review.openstack.org/#/c/567636/.
6. I also noticed that we are building the centos-base container in our
container build jobs; however, it is not pushed out to the container
registries because it is not included in the tripleo-common repo
<https://github.com/openstack/tripleo-common/blob/master/container-images/overcloud_containers.yaml.j2>
I would like to discuss this with some of the folks working on containers.
If we had an updated centos-base container, I think some of these issues
would have been prevented.

The above issues were resolved, and the master promotion jobs all
passed.  Thanks to all who were involved!

Once the promotion jobs passed and reported status to the dlrn_api, a
promotion was triggered automatically to upload the promoted images,
containers, and updated dlrn hash.  This failed due to network latency in
the tenant where the tripleo-ci infra is hosted.  The issue is tracked here:
https://bugs.launchpad.net/tripleo/+bug/1770860
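
For anyone unfamiliar with how a promotion is gated, here is a rough
sketch of the idea only.  It is not the actual tripleo-ci promoter code:
the function name, job names, and result format are all hypothetical, and
it assumes a hash is promoted only when every required job has reported
success for it.

```python
from typing import Dict, Iterable


def ready_to_promote(job_results: Dict[str, bool],
                     required_jobs: Iterable[str]) -> bool:
    """Return True only if every required job reported success.

    A job that has not reported at all is treated as a failure, so a
    partial set of results never triggers a promotion.
    """
    return all(job_results.get(job, False) for job in required_jobs)


# Hypothetical job names, for illustration only.
required = ["promotion-job-a", "promotion-job-b"]

# All required jobs reported success: the hash may be promoted.
all_passed = {"promotion-job-a": True, "promotion-job-b": True}

# One required job has not reported yet: no promotion.
one_missing = {"promotion-job-a": True}

print(ready_to_promote(all_passed, required))
print(ready_to_promote(one_missing, required))
```

Note that in this incident the gating itself worked; it was the upload
step after the promotion decision that failed.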

Matt Young and I worked well into the evening on Friday to diagnose
the issue and ended up having to execute the image, container and dlrn_hash
promotion outside of our tripleo-infra tenant.  Thanks to Matt for his
effort.

I have updated the ci status in #tripleo.  At the moment the master check
and gate jobs are green upstream, which should unblock merging most
patches.  The status of stable branches and third party ci is still being
investigated.

Automatic promotions are blocked until the network issues in the
tripleo-infra tenant are resolved.  The bug is marked with alert in
#tripleo.  Please see #tripleo for future status updates.

Thanks all