[openstack-dev] [tripleo] critical situation with CI / upgrade jobs
Bogdan Dobrelya
bdobreli at redhat.com
Wed Aug 16 10:17:22 UTC 2017
On 16.08.2017 5:06, Wesley Hayutin wrote:
>
>
> On Tue, Aug 15, 2017 at 9:33 PM, Emilien Macchi <emilien at redhat.com> wrote:
>
> So far, we have 3 critical issues that we all need to address as
> soon as we can.
>
> Problem #1: Upgrade jobs timeout from Newton to Ocata
> https://bugs.launchpad.net/tripleo/+bug/1702955
> Today I spent an hour looking at it, and here's what I've found so
> far: depending on which public cloud the TripleO CI jobs run on, they
> time out or not.
> Here's an example of Heat resources that run in our CI:
> https://www.diffchecker.com/VTXkNFuk
> On the left are the resources from a job that failed (running on
> internap), and on the right from one that worked (running on citycloud).
> I've been through all upgrade steps and I haven't seen specific tasks
> that take more time here or there, just small differences that add up
> to a big difference at the end (which makes this hard to debug).
> Note: both jobs use AFS mirrors.
> Help on that front would be very welcome.
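To help narrow that down, a small script along the lines of the sketch
below could diff the per-resource timings between a failing and a
passing job. It assumes two JSON dumps of the stack events per job (the
field names and timestamp format are assumptions, to be adjusted to
whatever the job logs actually contain); just an illustration, not an
existing CI script:

#!/usr/bin/env python
# Hypothetical helper: diff per-resource Heat timings between two CI runs.
# Assumes each input file is a JSON list of events with "resource_name",
# "event_time" (ISO 8601) and "resource_status" keys.
import json
import sys
from datetime import datetime


def durations(path):
    """Seconds between first *_IN_PROGRESS and last *_COMPLETE, per resource."""
    starts, ends = {}, {}
    with open(path) as f:
        for ev in json.load(f):
            name = ev["resource_name"]
            ts = datetime.strptime(ev["event_time"], "%Y-%m-%dT%H:%M:%SZ")
            if ev["resource_status"].endswith("IN_PROGRESS"):
                starts.setdefault(name, ts)
            elif ev["resource_status"].endswith("COMPLETE"):
                ends[name] = ts
    return {name: (ends[name] - start).total_seconds()
            for name, start in starts.items() if name in ends}


def main(failed_path, passed_path):
    failed, passed = durations(failed_path), durations(passed_path)
    # Sort resources by how much slower the failing cloud was.
    for name in sorted(failed, key=lambda n: failed[n] - passed.get(n, 0),
                       reverse=True):
        print("%-60s %+8.0fs" % (name, failed[name] - passed.get(name, 0)))


if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])

That would at least tell us whether it is one slow resource or, as you
say, many small deltas adding up.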
>
>
> Problem #2: from Ocata to Pike (containerized): missing container
> upload step
> https://bugs.launchpad.net/tripleo/+bug/1710938
> Wes has a patch (thanks!) that is currently in the gate:
> https://review.openstack.org/#/c/493972
> Thanks to that work, we managed to find the problem #3.
>
>
> Problem #3: from Ocata to Pike: all container images are
> uploaded/specified, even for services not deployed
> https://bugs.launchpad.net/tripleo/+bug/1710992
> The CI jobs are timing out during the upgrade process because
> downloading + uploading _all_ containers into the local cache takes
> more than 20 minutes.
> So this is where we are now: upgrade jobs time out on that. Steve Baker
> is currently looking at it, but we'll probably offer some help.
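For illustration only, the direction would be roughly the sketch below
(not the actual tripleo-common code, and the image and service names
are made up): given the mapping of container images to composable
services, only pull/push the images whose service is actually enabled
in the deployment plan.

# Simplified illustration of filtering the container image list down to
# the services that are actually deployed. Names are made up.
FULL_IMAGE_LIST = {
    "tripleoupstream/centos-binary-nova-compute":
        "OS::TripleO::Services::NovaCompute",
    "tripleoupstream/centos-binary-neutron-server":
        "OS::TripleO::Services::NeutronServer",
    "tripleoupstream/centos-binary-manila-share":
        "OS::TripleO::Services::ManilaShare",
}


def images_to_upload(enabled_services):
    """Keep only images whose owning service is deployed in this plan."""
    return [image for image, service in FULL_IMAGE_LIST.items()
            if service in enabled_services]


# e.g. a minimal CI overcloud without Manila:
enabled = {"OS::TripleO::Services::NovaCompute",
           "OS::TripleO::Services::NeutronServer"}
for image in images_to_upload(enabled):
    print(image)

With something like that, the 20+ minutes of download/upload should
shrink to just the handful of images the scenario really needs.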
>
>
> Solutions:
> - for stable/ocata: make upgrade jobs non-voting
> - for pike: keep upgrade jobs non-voting and release without upgrade
> testing
>
> Risks:
> - for stable/ocata: it's highly possible that regressions will be
> introduced if jobs aren't voting anymore.
> - for pike: the quality of the release won't be good enough in terms
> of CI coverage compared to Ocata.
>
> Mitigations:
> - for stable/ocata: make jobs non-voting and ask our core reviewers
> to pay double attention to what lands. This should be temporary, until
> we manage to fix the CI jobs.
> - for master: release RC1 without upgrade jobs and make progress
> - Run TripleO upgrade scenarios as third party CI in RDO Cloud or
> somewhere with resources and without timeout constraints.
>
> I would like some feedback on the proposal so we can move forward
> this week,
> Thanks.
> --
> Emilien Macchi
>
>
> I think that, due to the limitations on run times upstream, we may
> need to rethink the workflow for upgrade tests there. It's not very
> clear to me what can be done with the multinode nodepool jobs beyond
> what is already being done. I think we do have some choices with ovb
We could limit the scope of the upstream multinode jobs to upgrade
testing of only a couple of the deployed services, like keystone, nova
and neutron (see the sketch below the quoted paragraph).
> jobs. I'm not going to try to solve this in this email, but rethinking
> how we CI upgrades in the upstream infrastructure should be a focus for
> the Queens PTG. We will need to focus on bringing run times down
> significantly, as it's incredibly difficult to run two installs in 175
> minutes across all the upstream cloud providers.
>
> Thanks Emilien for all the work you have done around upgrades!
>
>
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
--
Best regards,
Bogdan Dobrelya,
Irc #bogdando