[openstack-dev] [tripleo] tripleo upstream gate outage, was: -> gate jobs impacted RAX yum mirror

Matt Young myoung at redhat.com
Sun May 13 20:09:46 UTC 2018


Re: resolving the network latency issue on the promotion server in the
tripleo-infra tenant, that's great news!

Re: retrospective on this class of issue, I'll reach out directly early
this week to get something on the calendar for our two teams.  We clearly
need to brainstorm/hash out together how we can reduce the turbulence
moving forward.

In addition, while working these issues over the past few days we've
identified a few pieces of low-hanging (tooling) fruit that are ripe for
improvement and will speed up diagnosis and debugging in the future.
We'll capture these as RFEs and get them into our backlog.

Matt

On Sun, May 13, 2018 at 10:25 AM, Wesley Hayutin <whayutin at redhat.com>
wrote:

>
>
> On Sat, May 12, 2018 at 11:45 PM Emilien Macchi <emilien at redhat.com>
> wrote:
>
>> On Sat, May 12, 2018 at 9:10 AM, Wesley Hayutin <whayutin at redhat.com>
>> wrote:
>>>
>>> 2. Shortly after #1 was resolved, CentOS released 7.5, which comes
>>> directly into the upstream repos untested and ungated.  Additionally, the
>>> associated qcow2 image and container-base images were not updated at the
>>> same time as the yum repos.
>>> https://bugs.launchpad.net/tripleo/+bug/1770355
>>>
>>
>> Why do we have this situation every time the OS is upgraded to a major
>> version? Can't we test the image before actually using it? We could have
>> experimental jobs testing the latest image and pin gate images to a
>> specific one.
>> For example, we could configure infra to deploy centos 7.4 in our gate
>> and 7.5 in experimental, so we can take our time to fix any problems that
>> come up and make the switch when we're ready, instead of dealing with
>> fires (that usually come all together).
>>
>> It would be great to hold a retrospective on this between tripleo
>> ci & infra folks, and see how we can improve things.
>>
>
> I agree.
> In coordination with the infra team, we need to be able to pin / lock
> content for production check and gate jobs while also being able to
> stage new content, e.g. centos 7.5, with experimental or periodic jobs.
> In this particular case the ci team did check the tripleo deployment w/
> centos 7.5 updates; however, we did not stage or test what impact the
> centos minor update would have on the upstream job workflow.
> The key issue is that the base centos image used upstream cannot be
> pinned by the ci team. If we could pin that image, the ci team could also
> pin the centos repos used in ci and run staging jobs on the latest centos
> content.
>
> I'm glad that you also see the need for some amount of coordination here;
> I've been in contact with a few folks to initiate the conversation.
>
> On an unrelated note, Sagi and I just fixed the network latency issue on
> our promotion server; it was related to DNS.  Automatic promotions should
> be back online.
> Thanks all.
>
>
>> --
>> Emilien Macchi
>