[openstack-dev] [TripleO][infra][CI] Moving OVB jobs from RH1 cloud to RDO cloud, plan

David Moreau Simard dms at redhat.com
Wed Oct 25 21:00:15 UTC 2017

We're currently running with a max-servers of 80 for the TripleO tenant.
This number doesn't include OVB nodes.

When OVB nodes are taken into account, we are already nearing vCPU capacity.
We could consider raising the overcommit ratio from 2.0 to 4.0 to make use
of the RAM that is still available.

See the rough maths in my comment here [1].

[1]: https://review.rdoproject.org/r/#/c/10249/1/nodepool/nodepool.yaml@133
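To illustrate the vCPU vs. RAM argument above, here is a minimal sketch of that kind of arithmetic. It assumes the "overcommit ratio" is presumably Nova's `cpu_allocation_ratio`. Every host and flavor figure, and the names `Cloud`, `Flavor` and `max_instances`, are hypothetical placeholders, not the RDO cloud's real numbers (those are in the review comment [1]).

```python
from dataclasses import dataclass


@dataclass
class Cloud:
    """Hypothetical aggregate capacity of the hosts backing a tenant."""
    physical_cores: int
    ram_gb: int
    cpu_allocation_ratio: float        # assumed to be the vCPU overcommit ratio
    ram_allocation_ratio: float = 1.0  # no RAM overcommit in this sketch


@dataclass
class Flavor:
    vcpus: int
    ram_gb: int


def max_instances(cloud: Cloud, flavor: Flavor) -> dict:
    """How many instances of `flavor` fit, bounded separately by vCPU and RAM.

    Simplified: ignores reserved host resources and per-host fragmentation."""
    by_cpu = int(cloud.physical_cores * cloud.cpu_allocation_ratio // flavor.vcpus)
    by_ram = int(cloud.ram_gb * cloud.ram_allocation_ratio // flavor.ram_gb)
    return {"by_cpu": by_cpu, "by_ram": by_ram, "limit": min(by_cpu, by_ram)}


if __name__ == "__main__":
    flavor = Flavor(vcpus=8, ram_gb=16)  # hypothetical node flavor
    for ratio in (2.0, 4.0):
        cloud = Cloud(physical_cores=256, ram_gb=2048, cpu_allocation_ratio=ratio)
        print(ratio, max_instances(cloud, flavor))
    # 2.0 {'by_cpu': 64, 'by_ram': 128, 'limit': 64}
    # 4.0 {'by_cpu': 128, 'by_ram': 128, 'limit': 128}
```

With these made-up numbers, at 2.0 the cloud runs out of vCPUs while half of the RAM sits idle. At 4.0 the two limits meet. The cost of a higher ratio is more CPU contention per job.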

David Moreau Simard
Senior Software Engineer | Openstack RDO

dmsimard = [irc, github, twitter]

On Oct 25, 2017 1:39 PM, "Ben Nemec" <openstack at nemebean.com> wrote:

> Overall sounds good.  A couple of comments inline.
> On 10/23/2017 05:46 AM, Sagi Shnaidman wrote:
>> Hi,
>> As you know, we are preparing the transition of all OVB jobs, as well as a
>> few long multinode upgrade jobs, from the RH1 cloud to the RDO cloud. We
>> have prepared the transition workflow below; please feel free to comment.
>> 1) We run one job (ovb-ha-oooq) on every patch in the following repos: oooq,
>> oooq-extras, tripleo-ci. We run the rest of the OVB jobs (containers and
>> fs024) as experimental jobs in the RDO cloud for the following repos: oooq,
>> oooq-extras, tripleo-ci, tht, tripleo-common. This should cover most of our
>> testing. This step is completed.
>> Currently, however, it is blocked by a Newton bug in the RDO cloud:
>> https://bugs.launchpad.net/heat/+bug/1626256 , because the cloud's release
>> doesn't contain its fix: https://review.openstack.org/#/c/501592/ . On the
>> other hand, the upgrade to the Ocata release (which would also solve this
>> issue) is blocked by this bug: https://bugs.launchpad.net/tripleo/+bug/1724328
>> So the move is blocked right now.
>> Next steps:
>> 2) We solve all issues with the per-patch job (ovb-ha-oooq) so that it
>> passes (or fails in exactly the same way as on rh1) for 2 regular working
>> days (not weekend days).
>> 3) During this time we should trigger the experimental jobs on various
>> patches in tht and tripleo-common, and solve all issues with them so that
>> all OVB jobs pass.
>> 4) Throughout this time we need to monitor the resources in the
>> openstack-nodepool tenant (maybe with help from rhops) and make sure it has
>> the capacity to run the configured jobs.
> I assume we will have a max jobs limit in nodepool (or whatever we're
> using for that purpose) that will ensure we stay within capacity regardless
> of what jobs are configured.  We probably want to keep that limit low
> initially so we don't have to worry about throwing a huge number of jobs at
> the cloud accidentally (say someone submits a large patch series that
> triggers our subset of jobs).
>
> Obviously as we add jobs we'll need to bump the concurrent jobs limit, but
> I think that should be the primary variable we change, and we should add more
> jobs as necessary to fill the configured limit.  Also, rather than setting a
> time period of two days or whatever, we should ensure we run at the configured
> limit for some period of time before increasing it.  There are slow days in CI
> where we might not get much useful information, so we need to make sure we
> don't get a false positive result from a step just because of the quirks of
> CI load.
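Ben's suggestion, raising the concurrent-jobs limit only after CI has actually run at that limit for a while instead of after a fixed number of days, can be sketched as a small policy function. Everything here is hypothetical: the names `Sample`, `time_at_limit` and `next_limit`, the sample format, and the 8-hour threshold are illustrative only. In practice the limit would presumably be nodepool's `max-servers` setting, adjusted by hand.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Sample:
    """One observation of CI load (hypothetical monitoring data)."""
    at: datetime
    running_jobs: int


def time_at_limit(samples: list[Sample], limit: int) -> timedelta:
    """Total time spent with running jobs at (or above) the limit.

    Each sample's value is assumed to hold until the next sample."""
    ordered = sorted(samples, key=lambda s: s.at)
    total = timedelta()
    for current, nxt in zip(ordered, ordered[1:]):
        if current.running_jobs >= limit:
            total += nxt.at - current.at
    return total


def next_limit(samples: list[Sample], limit: int, step: int, ceiling: int,
               required: timedelta = timedelta(hours=8)) -> int:
    """Raise the limit by `step` (capped at `ceiling`) only if CI actually ran
    saturated for at least `required`; otherwise keep the current limit.

    A quiet CI day never reaches the limit, so it counts for nothing, which is
    the point of measuring saturated time rather than elapsed days."""
    if time_at_limit(samples, limit) >= required:
        return min(limit + step, ceiling)
    return limit


if __name__ == "__main__":
    start = datetime(2017, 10, 26, 9, 0)
    busy = [Sample(start + timedelta(hours=h), 10) for h in range(10)]
    quiet = [Sample(start + timedelta(hours=h), 3) for h in range(10)]
    print(next_limit(busy, limit=10, step=5, ceiling=40))   # 15
    print(next_limit(quiet, limit=10, step=5, ceiling=40))  # 10
```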
>> 5) We set the ovb-ha-oooq job to run on every patch everywhere it runs in
>> rh1 (in parallel with the existing rh1 job). We monitor the RDO cloud to
>> make sure it doesn't fail and still has resources (1.5 working days).
>> 6) We add the featureset024 OVB job to run on every patch where it runs in
>> rh1. We continue to monitor the RDO cloud (1.5 working days).
>> 7) We add the last remaining OVB job (containers) to all patches where it
>> runs on rh1. We continue to monitor the RDO cloud (2 days).
>> 8) If everything is OK in all the previous steps and the RDO cloud still
>> performs well, we remove the OVB jobs from the rh1 configuration and make
>> them experimental.
>> 9) During the next few days we monitor the OVB jobs and run the rh1 OVB jobs
>> as experimental to check that we get the same results (or better :) ).
>> 10) The OVB jobs on the rh1 cloud stay in the TripleO experimental pipeline
>> for the next month or two.
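Steps 2 and 9 hinge on the RDO cloud producing the same results as rh1 (passing, or failing in the same way). A minimal, hypothetical way to check that for one job is to compare per-change outcomes from both clouds. The function name `parity_report`, the result format and the change IDs are made up for illustration; real data would have to be collected from the CI results.

```python
from typing import Dict, List, Tuple


def parity_report(rh1: Dict[str, str], rdo: Dict[str, str]) -> Tuple[float, List[str]]:
    """Compare one job's per-change results on rh1 and on the RDO cloud.

    Keys are change IDs, values are results such as "SUCCESS" or "FAILURE".
    Only changes tested on both clouds are compared. Returns the fraction that
    agree and the sorted list of changes where the clouds disagree."""
    common = sorted(rh1.keys() & rdo.keys())
    if not common:
        return 0.0, []
    mismatches = [change for change in common if rh1[change] != rdo[change]]
    return 1 - len(mismatches) / len(common), mismatches


if __name__ == "__main__":
    # Made-up results for illustration only.
    rh1 = {"1001": "SUCCESS", "1002": "FAILURE", "1003": "SUCCESS", "1004": "SUCCESS"}
    rdo = {"1001": "SUCCESS", "1002": "FAILURE", "1003": "FAILURE", "1004": "SUCCESS",
           "1005": "SUCCESS"}
    print(parity_report(rh1, rdo))  # (0.75, ['1003'])
```

Matching SUCCESS/FAILURE alone is only a rough proxy for "failing in the same way". A stricter check would also compare which tests or steps failed on each cloud.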
>> --
>> Best regards
>> Sagi Shnaidman
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev