[tripleo][ci] gate jobs killed / reset

Emilien Macchi emilien at redhat.com
Thu Sep 19 02:48:09 UTC 2019


Status:

We have identified that the 2 major issues are:

- In-flight validations were taking too much time. They were enabled by
default; we changed that:
https://review.opendev.org/#/c/683001/
https://review.opendev.org/#/c/682905/
https://review.opendev.org/#/c/682943
They are now disabled by default and also disabled in
tripleo-ci-centos-7-containers-multinode.

- tripleo-container-image-prepare now takes 20 min, up from 10 min,
because of the re-authentication logic that was introduced a few weeks
ago. A revert is now proposed:
https://review.opendev.org/#/c/682945/ as we haven't found another solution
yet.

We have restored the patches. You can now recheck and approve patches to
the gate, but please stay aware of the situation by checking the IRC topic
on #tripleo and monitoring the Zuul queue: http://zuul.openstack.org/

Thanks to infra for force-merging the patches we urgently needed; hopefully
this stays exceptional and we don't face this situation again soon.

We need to reduce the container image prepare time so that
tripleo-ci-centos-7-containers-multinode stays safely under 3 hours.

On Wed, Sep 18, 2019 at 5:19 PM Wesley Hayutin <whayutin at redhat.com> wrote:

>
>
> On Tue, Sep 17, 2019 at 4:40 PM Emilien Macchi <emilien at redhat.com> wrote:
>
>> Note that I also cleared the check for tripleo projects to accelerate the
>> testing of our potential fixes.
>> Hopefully we can resolve the situation really soon.
>>
>> On Tue, Sep 17, 2019 at 4:29 PM Wesley Hayutin <whayutin at redhat.com>
>> wrote:
>>
>>> Greetings,
>>>
>>> The zuul jobs in the TripleO gate queue were put out of their misery
>>> approximately at 20:14 UTC Sept 17 2019.  The TripleO jobs were timing out
>>> [1] and causing the gate queue to be delayed about 24 hours [2].
>>>
>>> We are hoping a revert [3] will restore TripleO jobs to their usual
>>> run times.  Please hold off on any rechecks or workflowing patches until
>>> [3] is merged and the status on #tripleo is no longer "RED".
>>>
>>> We appreciate your patience while we work through this issue. The jobs
>>> that were in the gate will be restored once we have confirmed and verified
>>> the solution.
>>>
>>> Thank you
>>>
>>>
>>> [1] https://bugs.launchpad.net/tripleo/+bug/1844446
>>> [2]
>>> http://dashboard-ci.tripleo.org/d/YRJtmtNWk/cockpit?orgId=1&fullscreen&panelId=398
>>> [3] https://review.opendev.org/#/c/682729/
>>>
>>
>>
>> --
>> Emilien Macchi
>>
>
> Thanks for your continued patience re: the tripleo gate.
>
> We're currently waiting on a couple of patches to land:
> https://review.opendev.org/#/c/682905/
> https://review.opendev.org/#/c/682731 or
> https://review.opendev.org/#/c/682945/
>
> Also, FYI, one can clearly see the performance regression here [1].
>
> [1]
> http://dashboard-ci.tripleo.org/d/si1tipHZk/jobs-exploration?orgId=1&from=now-90d&to=now&fullscreen&panelId=16
>


-- 
Emilien Macchi


More information about the openstack-discuss mailing list