[openstack-dev] [tripleo] critical situation with CI / upgrade jobs

Bogdan Dobrelya bdobreli at redhat.com
Wed Aug 16 10:22:07 UTC 2017


On 16.08.2017 3:33, Emilien Macchi wrote:
> So far, we have 3 critical issues that we all need to address as
> soon as we can.
> 
> Problem #1: Upgrade jobs timeout from Newton to Ocata
> https://bugs.launchpad.net/tripleo/+bug/1702955
> Today I spent an hour looking at it and here's what I've found so far:
> depending on which public cloud the TripleO CI jobs run on, they
> time out or not.
> Here's an example of Heat resources that run in our CI:
> https://www.diffchecker.com/VTXkNFuk
> On the left are the resources from a job that failed (running on internap)
> and on the right those from a job that worked (running on citycloud).
> I've been through all upgrade steps and I haven't seen specific tasks
> that take more time here or there, just many small differences that add
> up to a big one at the end (so hard to debug).
> Note: both jobs use AFS mirrors.
> Help on that front would be very welcome.
> 
> 
> Problem #2: from Ocata to Pike (containerized) missing container upload step
> https://bugs.launchpad.net/tripleo/+bug/1710938
> Wes has a patch (thanks!) that is currently in the gate:
> https://review.openstack.org/#/c/493972
> Thanks to that work, we managed to find problem #3.
> 
> 
> Problem #3: from Ocata to Pike: all container images are
> uploaded/specified, even for services not deployed
> https://bugs.launchpad.net/tripleo/+bug/1710992
> The CI jobs are timing out during the upgrade process because
> downloading + uploading _all_ containers into the local cache takes more
> than 20 minutes.
> So this is where we are now: upgrade jobs time out on that. Steve Baker
> is currently looking at it but we'll probably offer some help.
> 
> 
> Solutions:
> - for stable/ocata: make upgrade jobs non-voting
> - for pike: keep upgrade jobs non-voting and release without upgrade testing

This doesn't look like a viable option to me. I'd prefer to reduce the
scope of the upgrade testing (the set of deployed services under test),
but release only once the upgrade jobs pass for that reduced scope.

> 
> Risks:
> - for stable/ocata: it's highly possible to introduce regressions if jobs
> aren't voting anymore.
> - for pike: the quality of the release won't be good enough in terms of
> CI coverage compared to Ocata.
> 
> Mitigations:
> - for stable/ocata: make jobs non-voting and require our
> core reviewers to pay extra attention to what lands. It should be
> temporary until we manage to fix the CI jobs.
> - for master: release RC1 without upgrade jobs and make progress
> - Run TripleO upgrade scenarios as third party CI in RDO Cloud or
> somewhere with resources and without timeout constraints.
> 
> I would like some feedback on the proposal so we can move forward this week,
> Thanks.
> 


-- 
Best regards,
Bogdan Dobrelya,
Irc #bogdando


