[openstack-dev] [tripleo] critical situation with CI / upgrade jobs

Emilien Macchi emilien at redhat.com
Wed Aug 16 22:47:34 UTC 2017


Here's an update on the situation.

On Tue, Aug 15, 2017 at 6:33 PM, Emilien Macchi <emilien at redhat.com> wrote:
> Problem #1: Upgrade jobs timeout from Newton to Ocata
> https://bugs.launchpad.net/tripleo/+bug/1702955
[...]

- revert distgit patch in RDO: https://review.rdoproject.org/r/8575
- push https://review.openstack.org/#/c/494334/ as a temporary solution
- we need https://review.openstack.org/#/c/489874/ landed ASAP.
- once https://review.openstack.org/#/c/489874/ is landed, we need to
revert https://review.openstack.org/#/c/494334 ASAP.

We still need help finding out why upgrade jobs time out so often
in stable/ocata.

> Problem #2: from Ocata to Pike (containerized) missing container upload step
> https://bugs.launchpad.net/tripleo/+bug/1710938
> Wes has a patch (thanks!) that is currently in the gate:
> https://review.openstack.org/#/c/493972
[...]

The patch worked and helped! We got a successful job run today:
http://logs.openstack.org/00/461000/32/check/gate-tripleo-ci-centos-7-containers-multinode-upgrades-nv/2f13627/console.html#_2017-08-16_01_31_32_009061

We're now pushing to the next step: testing the upgrade with pingtest.
See https://review.openstack.org/#/c/494268/ and the Depends-On: in
https://review.openstack.org/#/c/461000/.

If pingtest works, it will be good news and will prove that we
have a basic workflow in place on which we can iterate.

The next iteration will be to work on the four scenarios (001 to 004),
which will also be upgrades from Ocata to Pike.
For that, we need Problems #1 and #2 resolved before making any
progress here, so that we don't hit the same issues as before.

> Problem #3: from Ocata to Pike: all container images are
> uploaded/specified, even for services not deployed
> https://bugs.launchpad.net/tripleo/+bug/1710992
> The CI jobs are timing out during the upgrade process because
> downloading + uploading _all_ containers in local cache takes more
> than 20 minutes.
> So this is where we are now, upgrade jobs timeout on that. Steve Baker
> is currently looking at it but we'll probably offer some help.

Steve is still working on it: https://review.openstack.org/#/c/448328/
Steve, if you need any help (reviewing or coding), please let us
know: we consider this important, and it would probably be good to
have it in Pike.
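To illustrate the idea implied by the title of bug 1710992 (only handle images for services that are actually deployed), here is a minimal sketch. It assumes a hypothetical SERVICE_IMAGES mapping and a hypothetical images_to_upload() helper; neither is taken from tripleo-heat-templates or from the review above, and the real fix may work differently.

```python
# Hypothetical mapping from service names to the container images they need.
# Illustrative only: this is NOT the real TripleO service/image mapping.
SERVICE_IMAGES = {
    "nova-compute": ["nova-compute"],
    "keystone": ["keystone"],
    "glance-api": ["glance-api"],
    "swift-proxy": ["swift-proxy-server"],
}


def images_to_upload(deployed_services, service_images=SERVICE_IMAGES):
    """Return a sorted, de-duplicated list of images for deployed services only.

    Services missing from the mapping contribute no images, so unknown or
    undeployed services never trigger a download/upload.
    """
    images = set()
    for service in deployed_services:
        images.update(service_images.get(service, []))
    return sorted(images)


if __name__ == "__main__":
    # Only two services deployed, so only two images would be pulled and
    # pushed to the local cache instead of the whole catalogue.
    print(images_to_upload(["keystone", "nova-compute"]))
```

The point of the sketch is that the amount of download/upload work scales with the deployed services rather than with every image known to the release, which is what keeps the step within the CI time budget.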

Thanks,
-- 
Emilien Macchi
