[openstack-dev] [tripleo] critical situation with CI / upgrade jobs

Bogdan Dobrelya bdobreli at redhat.com
Wed Aug 16 10:17:22 UTC 2017


On 16.08.2017 5:06, Wesley Hayutin wrote:
> 
> 
> On Tue, Aug 15, 2017 at 9:33 PM, Emilien Macchi <emilien at redhat.com> wrote:
> 
>     So far we have 3 critical issues that we all need to address as
>     soon as we can.
> 
>     Problem #1: Upgrade jobs timeout from Newton to Ocata
>     https://bugs.launchpad.net/tripleo/+bug/1702955
>     Today I spent an hour looking at it and here's what I've found so
>     far: depending on which public cloud the TripleO CI jobs run on,
>     they either time out or they don't.
>     Here's an example of Heat resources that run in our CI:
>     https://www.diffchecker.com/VTXkNFuk
>     On the left are the resources from a job that failed (running on
>     internap); on the right, from a job that passed (running on
>     citycloud).
>     I've been through all the upgrade steps and haven't seen any
>     specific task that takes much longer on one cloud than the other;
>     it's many small differences that add up to a big one at the end
>     (which makes it hard to debug).
>     Note: both jobs use AFS mirrors.
>     Help on that front would be very welcome.
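One way to narrow that down might be to compute per-resource durations
from the Heat event list on both clouds and diff those, rather than the
raw resource lists. A rough sketch, assuming stackrc is sourced on the
undercloud, the stack is named "overcloud", and the JSON key names
below match what the client actually emits (worth double-checking):

    # durations.py: feed it the output of
    #   openstack stack event list overcloud --nested-depth 5 -f json
    # and it prints per-resource durations, longest last.
    import json
    import sys
    from datetime import datetime

    events = json.load(sys.stdin)
    starts, ends = {}, {}
    for ev in events:
        name = ev.get("resource_name")
        status = ev.get("resource_status", "")
        # Heat timestamps may or may not carry a trailing "Z"
        ts = datetime.strptime(ev["event_time"].rstrip("Z"),
                               "%Y-%m-%dT%H:%M:%S")
        if status.endswith("IN_PROGRESS"):
            # keep the earliest start per resource
            starts[name] = min(ts, starts.get(name, ts))
        elif status.endswith("COMPLETE"):
            # keep the latest completion per resource
            ends[name] = max(ts, ends.get(name, ts))

    for name in sorted(starts,
                       key=lambda n: ends.get(n, starts[n]) - starts[n]):
        if name in ends:
            print(ends[name] - starts[name], name)

If the slow cloud shows no outliers but everything is uniformly a bit
slower, that would confirm the many-small-differences theory.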
> 
> 
>     Problem #2: from Ocata to Pike (containerized): missing container
>     upload step
>     https://bugs.launchpad.net/tripleo/+bug/1710938
>     Wes has a patch (thanks!) that is currently in the gate:
>     https://review.openstack.org/#/c/493972
>     Thanks to that work, we managed to find the problem #3.
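For context, the step Wes's patch restores is populating the local
registry on the undercloud before the upgrade starts; from memory of
the Pike-era tripleoclient that is roughly (flag names worth verifying
against the actual client):

    # build the image list, then push everything to the local registry
    openstack overcloud container image prepare \
        --output-images-file overcloud_containers.yaml
    openstack overcloud container image upload \
        --config-file overcloud_containers.yaml

and it is precisely that download/upload pass that problem #3 below is
about.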
> 
> 
>     Problem #3: from Ocata to Pike: all container images are
>     uploaded/specified, even for services not deployed
>     https://bugs.launchpad.net/tripleo/+bug/1710992
>     The CI jobs are timing out during the upgrade process because
>     downloading + uploading _all_ of the containers into the local
>     cache takes more than 20 minutes.
>     So this is where we are now: upgrade jobs time out on that. Steve
>     Baker is currently looking at it, but we'll probably offer some
>     help.
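Until Steve's proper fix lands, a possible CI band-aid could be to
filter the prepared image list down to the services the job actually
deploys. A hypothetical sketch, not the real fix; the
"container_images" / "imagename" layout of overcloud_containers.yaml
and the service list below are assumptions:

    # filter_images.py: keep only the container images whose name
    # matches a service the job deploys.
    import sys
    import yaml

    WANTED = ("keystone", "nova", "neutron", "glance", "mariadb",
              "rabbitmq", "haproxy")

    with open(sys.argv[1]) as f:
        data = yaml.safe_load(f)

    data["container_images"] = [
        img for img in data["container_images"]
        if any(s in img["imagename"] for s in WANTED)
    ]
    yaml.safe_dump(data, sys.stdout, default_flow_style=False)

Even a crude filter like this should cut the 20+ minutes substantially,
since most of the registry is images for services the job never starts.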
> 
> 
>     Solutions:
>     - for stable/ocata: make upgrade jobs non-voting
>     - for pike: keep upgrade jobs non-voting and release without upgrade
>     testing
> 
>     Risks:
>     - for stable/ocata: regressions can easily be injected if the jobs
>     aren't voting anymore.
>     - for pike: the quality of the release won't be good enough in
>     terms of CI coverage compared to Ocata.
> 
>     Mitigations:
>     - for stable/ocata: make the jobs non-voting and ask our core
>     reviewers to pay double attention to what lands. This should be
>     temporary, until we manage to fix the CI jobs.
>     - for master: release RC1 without upgrade jobs and keep making
>     progress on fixing them.
>     - run the TripleO upgrade scenarios as third-party CI in RDO Cloud
>     or somewhere else with enough resources and no timeout constraints.
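Regarding the non-voting switch above: mechanically it should be a
small project-config change; in zuul/layout.yaml something like the
following, with the job name being illustrative here:

    jobs:
      - name: gate-tripleo-ci-centos-7-multinode-upgrades
        voting: false

so the real cost is only the review-discipline risk already mentioned.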
> 
>     I would like some feedback on this proposal so we can move forward
>     this week.
>     Thanks.
>     --
>     Emilien Macchi
> 
> 
> I think that, due to some of the limitations on run times upstream, we
> may need to rethink the workflow for upgrade tests upstream. It's not
> very clear to me what can be done with the multinode nodepool jobs
> beyond what is already being done. I think we do have some choices
> with OVB jobs.

We could limit the scope of the upstream multinode jobs to upgrade
testing only a couple of the deployed services, like keystone, nova and
neutron; see the sketch below.
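A minimal sketch of such an environment file (the resource_registry
entries below are real service names, but which ones we would actually
disable is up for discussion):

    # disable-extra-services.yaml: map services we don't want to
    # upgrade-test to OS::Heat::None so they are never deployed.
    resource_registry:
      OS::TripleO::Services::CeilometerApi: OS::Heat::None
      OS::TripleO::Services::CeilometerCollector: OS::Heat::None
      OS::TripleO::Services::GnocchiApi: OS::Heat::None
      OS::TripleO::Services::AodhApi: OS::Heat::None
      OS::TripleO::Services::SwiftProxy: OS::Heat::None

Less to deploy twice means less wall time, at the cost of upgrade
coverage for the disabled services.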

> I'm not going to try to solve this in this email, but rethinking how
> we CI upgrades in the upstream infrastructure should be a focus for
> the Queens PTG. We will need to focus on bringing run times down
> significantly, as it's incredibly difficult to run two installs in 175
> minutes across all the upstream cloud providers.
> 
> Thanks Emilien for all the work you have done around upgrades!
> 


-- 
Best regards,
Bogdan Dobrelya,
IRC: #bogdando


