On Wed, Sep 18, 2019 at 8:48 PM Emilien Macchi <emilien@redhat.com> wrote:
Status:

We have identified that the 2 major issues are:

- Inflight validations were taking too much time. They were enabled by default; we changed that:
https://review.opendev.org/#/c/683001/
https://review.opendev.org/#/c/682905/
https://review.opendev.org/#/c/682943
They are now disabled by default and also disabled in tripleo-ci-centos-7-containers-multinode.

- tripleo-container-image-prepare now takes 20 min instead of the previous 10 min, because of the re-authentication logic that was introduced a few weeks ago. A revert is now proposed: https://review.opendev.org/#/c/682945/ since we haven't found another solution yet.

We have restored the patches. You can now recheck and approve patches to the gate, but please stay aware of the situation by checking the IRC topic on #tripleo and monitoring the zuul queue: http://zuul.openstack.org/

Thanks to infra for force-merging the patches we urgently needed; hopefully this stays exceptional and we don't face this situation again soon.

We need to reduce the container image prepare time to stay safely under 3 hours for tripleo-ci-centos-7-containers-multinode.


We're still not out of the woods yet; the gate is still not back to where it should be.
tripleo-ci-centos-7-containers-multinode is still running well over 3 hours [1].

We're going to see if another container registry provides better performance.

Thanks


[1] http://dashboard-ci.tripleo.org/d/si1tipHZk/jobs-exploration?orgId=1&from=now-12h&to=now&var-influxdb_filter=job_name%7C%3D%7Ctripleo-ci-centos-7-containers-multinode&var-influxdb_filter=branch%7C%3D%7Cmaster

On Wed, Sep 18, 2019 at 5:19 PM Wesley Hayutin <whayutin@redhat.com> wrote:


On Tue, Sep 17, 2019 at 4:40 PM Emilien Macchi <emilien@redhat.com> wrote:
Note that I also cleared the check queue for tripleo projects to accelerate the testing of our potential fixes.
Hopefully we can resolve the situation really soon.

On Tue, Sep 17, 2019 at 4:29 PM Wesley Hayutin <whayutin@redhat.com> wrote:
Greetings,

The zuul jobs in the TripleO gate queue were put out of their misery at approximately 20:14 UTC on Sept 17, 2019. The TripleO jobs were timing out [1] and causing the gate queue to be delayed by about 24 hours [2].

We are hoping a revert [3] will restore TripleO jobs to their usual run times. Please hold off on any rechecks or workflowing patches until [3] is merged and the status on #tripleo is no longer "RED".

We appreciate your patience while we work through this issue; the jobs that were in the gate will be restored once we have confirmed and verified the solution.

Thank you




--
Emilien Macchi

Thanks for your continued patience re: the tripleo gate.

We're currently waiting on a couple of patches to land.

Also, FYI, one can clearly see the performance regression here [1].

--
Emilien Macchi