[openstack-dev] [tripleo] tripleo gate is blocked - please read

Emilien Macchi emilien at redhat.com
Thu Jun 14 13:40:16 UTC 2018


It sounds like we merged a bunch of patches last night thanks to the revert, so I went
ahead and restored/rechecked everything that had been taken out of the gate. I've
checked and nothing was left over, but let me know in case I missed
something.
I'll keep updating this thread with the progress made to improve the
situation.
As of now, the situation is back to "normal": recheck/+W is OK.

Thanks again for your patience,

On Wed, Jun 13, 2018 at 10:39 PM, Emilien Macchi <emilien at redhat.com> wrote:

> https://review.openstack.org/575264 just landed (and didn't time out in
> check or gate, without any recheck, so it's a good sign that it helped to mitigate).
>
> I've restored and rechecked some of the patches that I evacuated from the gate.
> Please do not restore others, recheck, or approve anything for now; let's
> see how it goes with a few patches.
> We're still working with Steve on his patches to optimize the way we
> deploy containers on the registry and are investigating how we could make
> it faster with a proxy.
>
> Stay tuned and thanks for your patience.
>
> On Wed, Jun 13, 2018 at 5:50 PM, Emilien Macchi <emilien at redhat.com>
> wrote:
>
>> TL;DR: gate queue was 25h+, we put all patches from gate on standby, do
>> not restore/recheck until further announcement.
>>
>> We recently enabled the containerized undercloud for multinode jobs and
>> we believe this was a bit premature, as the container download process
>> wasn't optimized yet: it still pulls the same containers from the mirrors
>> multiple times.
>> This increased the job runtime and probably slowed down the docker.io
>> mirrors hosted by OpenStack Infra, which had to serve the same
>> containers multiple times. The time taken to prepare containers on the
>> undercloud and then for the overcloud caused the jobs to time out randomly
>> and therefore the gate to fail frequently, so we decided to
>> remove all jobs from the gate by abandoning the patches temporarily (I have
>> them in my browser and will restore them when things are stable again; please do
>> not touch anything).
>>
>> Steve Baker has been working on a series of patches that optimize the way
>> we prepare the containers but basically the workflow will be:
>> - pull containers needed for the undercloud into a local registry, using
>> infra mirror if available
>> - deploy the containerized undercloud
>> - pull containers needed for the overcloud minus the ones already pulled
>> for the undercloud, using infra mirror if available
>> - update containers on the overcloud
>> - deploy the overcloud
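
The pull steps in the list above could be sketched roughly as follows. This is
only an illustration of the "pull each container once, prefer the infra mirror"
idea, not the actual TripleO code from Steve's patches: the helper names
(plan_pulls, registry_path), the image names, and the mirror host are all
hypothetical.

```python
def plan_pulls(undercloud_images, overcloud_images):
    """Split images into two pull lists so that no image is pulled twice.

    Hypothetical helper: returns (undercloud_pulls, overcloud_pulls), where
    overcloud_pulls excludes anything already pulled for the undercloud.
    """
    # dict.fromkeys() removes duplicates while keeping the original order.
    undercloud_pulls = list(dict.fromkeys(undercloud_images))
    already_pulled = set(undercloud_pulls)
    overcloud_pulls = [
        image
        for image in dict.fromkeys(overcloud_images)
        if image not in already_pulled
    ]
    return undercloud_pulls, overcloud_pulls


def registry_path(image, mirror=None, default="docker.io"):
    """Build the pull source, using the infra mirror if one is available."""
    return f"{mirror or default}/{image}"


if __name__ == "__main__":
    # Made-up image names and mirror host, for illustration only.
    undercloud, overcloud = plan_pulls(
        ["keystone", "mariadb", "keystone"],
        ["mariadb", "nova-compute", "keystone", "nova-compute"],
    )
    for image in undercloud + overcloud:
        print(registry_path(image, mirror="mirror.example.org:8082"))
```

The point of the split is that images shared by the undercloud and the
overcloud are fetched into the local registry once, during the undercloud
step, and are skipped during the overcloud step.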
>>
>> With that process, we hope to reduce the runtime of the deployment and
>> therefore reduce the timeouts in the gate.
>> To enable it, we need to land, in that order:
>> https://review.openstack.org/#/c/571613/,
>> https://review.openstack.org/#/c/574485/,
>> https://review.openstack.org/#/c/571631/ and
>> https://review.openstack.org/#/c/568403.
>>
>> In the meantime, we are disabling the recently enabled containerized
>> undercloud on all scenarios (https://review.openstack.org/#/c/575264/) as a
>> mitigation, in the hope of stabilizing things until Steve's patches land.
>> Hopefully, we can merge Steve's work tonight/tomorrow and re-enable the
>> containerized undercloud on scenarios after checking that we no longer have
>> timeouts and that deployment runtimes are reasonable.
>>
>> That's the plan we came up with; if you have any questions or feedback, please
>> share them.
>> --
>> Emilien, Steve and Wes
>>
>
>
>
> --
> Emilien Macchi
>



-- 
Emilien Macchi