[openstack-dev] [tripleo] tripleo gate is blocked - please read

Emilien Macchi emilien at redhat.com
Thu Jun 14 00:50:13 UTC 2018


TL;DR: the gate queue was 25h+, so we put all patches in the gate on
standby. Do not restore/recheck until further announcement.

We recently enabled the containerized undercloud for multinode jobs, and we
believe this was a bit premature: the container download process wasn't
optimized yet, so the same containers are still pulled from the mirrors
multiple times.
This increased the job runtime and probably added load on the docker.io
mirrors hosted by OpenStack Infra, making them slower when serving the same
containers multiple times. The time taken to prepare containers on the
undercloud and then for the overcloud caused jobs to time out randomly,
and therefore the gate to fail frequently. We decided to remove all jobs
from the gate by temporarily abandoning the patches (I have them in my
browser and will restore them when things are stable again; please do not
touch anything).

Steve Baker has been working on a series of patches that optimize the way
we prepare the containers. Basically, the workflow will be:
- pull containers needed for the undercloud into a local registry, using
infra mirror if available
- deploy the containerized undercloud
- pull containers needed for the overcloud minus the ones already pulled
for the undercloud, using infra mirror if available
- update containers on the overcloud
- deploy the overcloud
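
To illustrate the pull steps above, here is a minimal sketch of the
deduplication idea: pull what the undercloud needs first, then pull only what
the overcloud needs and the undercloud has not already fetched. This is not
the code from Steve's patches; the function names, image names and mirror
address are all hypothetical and only show the set logic.

```python
def plan_pulls(undercloud_images, overcloud_images):
    """Split images into two pull lists so that no image is pulled twice.

    Hypothetical helper: the first list is pulled before deploying the
    undercloud, the second one before deploying the overcloud.
    """
    seen = set()
    undercloud_pulls = []
    for image in undercloud_images:
        if image not in seen:
            seen.add(image)
            undercloud_pulls.append(image)

    # Overcloud pulls are "minus the ones already pulled for the undercloud".
    overcloud_pulls = []
    for image in overcloud_images:
        if image not in seen:
            seen.add(image)
            overcloud_pulls.append(image)
    return undercloud_pulls, overcloud_pulls


def registry_source(image, mirror=None, default="docker.io"):
    """Use the infra mirror if one is available, else the default registry."""
    return f"{mirror or default}/{image}"


if __name__ == "__main__":
    # Made-up image names, for illustration only.
    under, over = plan_pulls(
        ["example/keystone", "example/heat-api"],
        ["example/keystone", "example/nova-compute"],
    )
    for image in under + over:
        print(registry_source(image, mirror="mirror.example.org:8082"))
```

With the made-up lists above, example/keystone lands only in the undercloud
list, so it is fetched from the mirror once instead of twice.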

With that process, we hope to reduce the runtime of the deployment and
therefore reduce the timeouts in the gate.
To enable it, we need to land the following patches in this order:
https://review.openstack.org/#/c/571613/,
https://review.openstack.org/#/c/574485/,
https://review.openstack.org/#/c/571631/ and
https://review.openstack.org/#/c/568403.

In the meantime, as a mitigation, we are disabling the recently enabled
containerized undercloud on all scenarios:
https://review.openstack.org/#/c/575264/. We hope this stabilizes things
until Steve's patches land.
Hopefully, we can merge Steve's work tonight/tomorrow and re-enable the
containerized undercloud on scenarios after checking that we no longer have
timeouts and that deployment runtimes are reasonable.

That's the plan we came up with. If you have any questions or feedback,
please share them.
-- 
Emilien, Steve and Wes