[tripleo][ci] gate jobs killed / reset

Wesley Hayutin whayutin at redhat.com
Thu Sep 19 21:59:27 UTC 2019


On Wed, Sep 18, 2019 at 8:48 PM Emilien Macchi <emilien at redhat.com> wrote:

> Status:
>
> We have identified that the 2 major issues are:
>
> - Inflight validations taking too much time. They were enabled by default;
> we changed that:
> https://review.opendev.org/#/c/683001/
> https://review.opendev.org/#/c/682905/
> https://review.opendev.org/#/c/682943
> They are now disabled by default and also disabled in
> tripleo-ci-centos-7-containers-multinode
>
> - tripleo-container-image-prepare now takes 20 min instead of the previous
> 10 min, because of the re-authentication logic that was introduced a few
> weeks ago. A revert is now proposed:
> https://review.opendev.org/#/c/682945/ as we haven't found another
> solution for now.
>
> We have restored the patches. You can now recheck and approve patches for
> the gate, but please stay aware of the situation by checking the IRC topic
> on #tripleo and monitoring the zuul queue: http://zuul.openstack.org/
>
> Thanks to infra for force-merging the patches we urgently needed;
> hopefully this stays exceptional and we don't face this situation again
> soon.
>
> We need to reduce the container image prepare time to stay safely under
> the 3 hours for tripleo-ci-centos-7-containers-multinode.
>
>
We're not out of the woods yet; the gate is still not back to where it
should be. tripleo-ci-centos-7-containers-multinode is still running well
over 3 hours [1].

We're going to see if another container registry provides better
performance.

Thanks


[1]
http://dashboard-ci.tripleo.org/d/si1tipHZk/jobs-exploration?orgId=1&from=now-12h&to=now&var-influxdb_filter=job_name%7C%3D%7Ctripleo-ci-centos-7-containers-multinode&var-influxdb_filter=branch%7C%3D%7Cmaster



> On Wed, Sep 18, 2019 at 5:19 PM Wesley Hayutin <whayutin at redhat.com>
> wrote:
>
>>
>>
>> On Tue, Sep 17, 2019 at 4:40 PM Emilien Macchi <emilien at redhat.com>
>> wrote:
>>
>>> Note that I also cleared the check for tripleo projects to accelerate
>>> the testing of our potential fixes.
>>> Hopefully we can resolve the situation really soon.
>>>
>>> On Tue, Sep 17, 2019 at 4:29 PM Wesley Hayutin <whayutin at redhat.com>
>>> wrote:
>>>
>>>> Greetings,
>>>>
>>>> The zuul jobs in the TripleO gate queue were put out of their misery
>>>> at approximately 20:14 UTC on Sept 17, 2019.  The TripleO jobs were
>>>> timing out [1] and causing the gate queue to be delayed by about 24
>>>> hours [2].
>>>>
>>>> We are hoping a revert [3] will restore TripleO jobs to their usual
>>>> run times.  Please hold off on any rechecks or workflowing patches
>>>> until [3] is merged and the status on #tripleo is no longer "RED".
>>>>
>>>> We appreciate your patience while we work through this issue. The jobs
>>>> that were in the gate will be restored once we have confirmed and
>>>> verified the solution.
>>>>
>>>> Thank you
>>>>
>>>>
>>>> [1] https://bugs.launchpad.net/tripleo/+bug/1844446
>>>> [2]
>>>> http://dashboard-ci.tripleo.org/d/YRJtmtNWk/cockpit?orgId=1&fullscreen&panelId=398
>>>> [3] https://review.opendev.org/#/c/682729/
>>>>
>>>
>>>
>>> --
>>> Emilien Macchi
>>>
>>
>> Thanks for your continued patience re: the tripleo gate.
>>
>> We're currently waiting on a couple of patches to land:
>> https://review.opendev.org/#/c/682905/
>> https://review.opendev.org/#/c/682731 or
>> https://review.opendev.org/#/c/682945/
>>
>> Also, FYI, one can clearly see the performance regression here [1].
>>
>> [1]
>> http://dashboard-ci.tripleo.org/d/si1tipHZk/jobs-exploration?orgId=1&from=now-90d&to=now&fullscreen&panelId=16
>>
>
>
> --
> Emilien Macchi
>