[openstack-dev] [tripleo] critical situation with CI / upgrade jobs

Jiří Stránský jistr at redhat.com
Thu Aug 17 16:36:25 UTC 2017


On 17.8.2017 00:47, Emilien Macchi wrote:
> Here's an update on the situation.
> 
> On Tue, Aug 15, 2017 at 6:33 PM, Emilien Macchi <emilien at redhat.com> wrote:
>> Problem #1: Upgrade jobs timeout from Newton to Ocata
>> https://bugs.launchpad.net/tripleo/+bug/1702955
> [...]
> 
> - revert distgit patch in RDO: https://review.rdoproject.org/r/8575
> - push https://review.openstack.org/#/c/494334/ as a temporary solution
> - we need https://review.openstack.org/#/c/489874/ landed ASAP.
> - once https://review.openstack.org/#/c/489874/ is landed, we need to
> revert https://review.openstack.org/#/c/494334 ASAP.
> 
> We still need some help to find out why upgrade jobs time out so often
> in stable/ocata.
> 
>> Problem #2: from Ocata to Pike (containerized) missing container upload step
>> https://bugs.launchpad.net/tripleo/+bug/1710938
>> Wes has a patch (thanks!) that is currently in the gate:
>> https://review.openstack.org/#/c/493972
> [...]
> 
> The patch worked and helped! We've got a successful job running today:
> http://logs.openstack.org/00/461000/32/check/gate-tripleo-ci-centos-7-containers-multinode-upgrades-nv/2f13627/console.html#_2017-08-16_01_31_32_009061
> 
> We're now pushing to the next step: testing the upgrade with pingtest.
> See https://review.openstack.org/#/c/494268/ and its Depends-On:
> https://review.openstack.org/#/c/461000/.
> 
> If pingtest works, it would be good news and would prove that we
> have a basic workflow in place on which we can iterate.
> 
> The next iterations would be to work on the four scenarios (001 to 004),
> which are also going to be upgraded from Ocata to Pike.
> For that, we'll need Problems #1 and #2 resolved before making
> any progress here, so that we don't hit the same issues as before.
> 
>> Problem #3: from Ocata to Pike: all container images are
>> uploaded/specified, even for services not deployed
>> https://bugs.launchpad.net/tripleo/+bug/1710992
>> The CI jobs are timing out during the upgrade process because
>> downloading and uploading _all_ containers into the local cache takes
>> more than 20 minutes.
>> This is where we are now: upgrade jobs time out on that step. Steve
>> Baker is currently looking at it, but we'll probably offer some help.
> 
> Steve is still working on it: https://review.openstack.org/#/c/448328/
> Steve, if you need any help (reviewing or coding), please let us
> know. We consider this important, and it would probably be good to
> have it in Pike.

An independent but related issue is that the job doesn't make use of
CI-local registry mirrors. I seem to recall we already had mirror usage
implemented at some point, but we must have lost it somehow. The fix is here:

https://review.openstack.org/#/c/494525/

Jirka

> 
> Thanks,
> 
