[openstack-dev] [TripleO][infra][CI] Moving OVB jobs from RH1 cloud to RDO cloud, plan

Ben Nemec openstack at nemebean.com
Wed Oct 25 17:38:12 UTC 2017


Overall sounds good.  A couple of comments inline.

On 10/23/2017 05:46 AM, Sagi Shnaidman wrote:
> Hi,
> 
> as you know, we are preparing the transition of all OVB jobs from the 
> RH1 cloud to the RDO cloud, as well as a few of the long multinode 
> upgrade jobs. We have prepared the transition workflow below; please 
> feel free to comment.
> 
> 
> 1) We run one job (ovb-ha-oooq) on every patch in the following repos: 
> oooq, oooq-extras, tripleo-ci. We run the rest of the OVB jobs 
> (containers and fs024) as experimental in the RDO cloud for the 
> following repos: oooq, oooq-extras, tripleo-ci, tht, tripleo-common. 
> This should cover most of our testing. This step is completed.
> 
> Currently this is blocked by a Newton bug in the RDO cloud: 
> https://bugs.launchpad.net/heat/+bug/1626256 , whose fix 
> (https://review.openstack.org/#/c/501592/) is not in the cloud's 
> release. On the other hand, the upgrade to the Ocata release (which 
> would also solve this issue) is blocked by 
> https://bugs.launchpad.net/tripleo/+bug/1724328
> So the move is blocked right now.
> 
> Next steps:
> 
> 2) We solve all issues with the job that runs on every patch 
> (ovb-ha-oooq) so that it passes (or fails with exactly the same 
> results as on RH1) for 2 regular working days (not the weekend).
> 3) During this time we trigger the experimental jobs on various 
> patches in tht and tripleo-common and solve all issues with them, so 
> that all OVB jobs pass.
> 4) Throughout this time we monitor the resources in the 
> openstack-nodepool tenant (maybe with help from RHOPS) and make sure 
> it has the capacity to run the configured jobs.

I assume we will have a max jobs limit in nodepool (or whatever we're 
using for that purpose) that will ensure we stay within capacity 
regardless of what jobs are configured.  We probably want to keep that 
limit low initially so we don't have to worry about throwing a huge 
number of jobs at the cloud accidentally (say someone submits a large 
patch series that triggers our subset of jobs).
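
To illustrate what I mean by staying within capacity, here's a rough 
sketch (not something we run today) of the kind of sanity check I have 
in mind, using openstacksdk.  The cloud name, max-servers value, and 
flavor size below are all made-up placeholders:

#!/usr/bin/env python
# Rough sketch of a capacity sanity check using openstacksdk's cloud
# layer.  The 'rdo-cloud' clouds.yaml entry, MAX_SERVERS and
# FLAVOR_VCPUS values are assumptions, and the limit field names may
# differ between SDK versions, so treat this as pseudocode-with-imports
# rather than something ready to drop into a cron job.
import openstack

MAX_SERVERS = 20   # assumed nodepool provider max-servers value
FLAVOR_VCPUS = 8   # assumed vCPUs per OVB node flavor

conn = openstack.connect(cloud='rdo-cloud')
limits = conn.get_compute_limits()

if MAX_SERVERS > limits['max_total_instances']:
    print("max-servers (%d) does not fit in the instance quota (%d)"
          % (MAX_SERVERS, limits['max_total_instances']))
if MAX_SERVERS * FLAVOR_VCPUS > limits['max_total_cores']:
    print("max-servers needs up to %d vCPUs but the quota is %d"
          % (MAX_SERVERS * FLAVOR_VCPUS, limits['max_total_cores']))

The real cap would of course live in the nodepool provider config; the 
point is just that whatever number we pick should be checked against 
the tenant quota before we start piling on jobs.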

Obviously as we add jobs we'll need to bump the concurrent jobs limit, 
but I think that should be the primary variable we change and that we 
add more jobs as necessary to fill the configured limit.  Also, rather 
than set a time period of two days or whatever, ensure we run at the 
configured limit for some period of time before increasing it.  There 
are slow days in CI where we might not get much useful information, so 
we need to make sure we don't get a false positive result from a step 
just because of the quirks of CI load.
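
To make "ran at the configured limit" something we can measure rather 
than eyeball, a sketch along these lines could sample the tenant during 
the trial window.  Again, the cloud name, the cap, and the sampling 
interval are assumptions, not real settings:

#!/usr/bin/env python
# Sketch: sample the openstack-nodepool tenant and report how often the
# active server count actually hit the configured cap, so a quiet
# couple of days doesn't count as a successful test.  MAX_SERVERS, the
# cloud name, and the sampling schedule are all assumptions.
import time
import openstack

MAX_SERVERS = 20     # must match whatever cap we configure in nodepool
POLL_INTERVAL = 300  # seconds between samples
SAMPLES = 12 * 16    # about two 8-hour working days of 5-minute samples

conn = openstack.connect(cloud='rdo-cloud')
at_cap = 0
for _ in range(SAMPLES):
    active = sum(1 for s in conn.list_servers() if s.status == 'ACTIVE')
    if active >= MAX_SERVERS:
        at_cap += 1
    time.sleep(POLL_INTERVAL)

print("At the configured limit for %d of %d samples (%.0f%%)"
      % (at_cap, SAMPLES, 100.0 * at_cap / SAMPLES))

If the percentage stays near zero we know the step hasn't really been 
exercised yet, regardless of how many calendar days have passed.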

> 5) We set the ovb-ha-oooq job to run on every patch in all the places 
> where it runs on RH1 (in parallel with the existing RH1 job). We 
> monitor the RDO cloud to make sure the job doesn't fail and the cloud 
> still has resources - 1.5 working days.
> 6) We add the featureset024 OVB job to every patch where it runs on 
> RH1. We continue to monitor the RDO cloud - 1.5 working days.
> 7) We add the last OVB job (containers) to all patches where it runs 
> on RH1. We continue to monitor the RDO cloud - 2 days.
> 8) If everything is OK in all of the previous points and the RDO 
> cloud still performs well, we remove the OVB jobs from the RH1 
> configuration and make them experimental.
> 9) During the next few days we monitor the OVB jobs and run the RH1 
> OVB jobs as experimental to check that we get the same results (or 
> better :) ).
> 10) The OVB jobs on the RH1 cloud stay in the experimental pipeline 
> in TripleO for the next month or two.
> 
> -- 
> Best regards
> Sagi Shnaidman


