[openstack-dev] [TripleO] Moving tripleo-ci towards the gate

Derek Higgins derekh at redhat.com
Tue Mar 25 14:40:44 UTC 2014


On 24/03/14 22:58, Joe Gordon wrote:
> 
> 
> 
> On Fri, Mar 21, 2014 at 6:29 AM, Derek Higgins <derekh at redhat.com>
> wrote:
> 
>     Hi All,
>        I'm trying to get a handle on what needs to happen before getting
>     tripleo-ci (toci) into the gate. I realize this may take some time, but
>     I'm trying to map out how to get to the end goal of putting multi-node
>     tripleo-based deployments in the gate, which should cover a lot of use
>     cases that devstack-gate doesn't. Here are some of the stages I think we
>     need to achieve before being in the gate, along with some questions where
>     people may be able to fill in the blanks.
> 
>     Stage 1: check - tripleo projects
>        This is what we currently have running, 5 separate jobs running non
>     voting checks against tripleo projects
> 
>     Stage 2 (a). reliability
>        Obviously keeping the reliability of both the results and the ci
>     system is a must and we should always aim towards 0% false test results,
>     but is there a number of false negatives, for example, that
>     would be acceptable to infra? What are the numbers on the gate at the
>     moment? Should we aim to match those at the very least (maybe we already
>     have)? And for how long do we need to maintain those levels before
>     considering the system proven?
> 
> 
> I cannot come up with a specific number for this, perhaps someone else
> can. I see the results and CI system reliability as two very different
> things. The CI system should ideally never go down for very long
> (although this is less critical while tripleo is non-voting check only,
> like all other 3rd party systems). As for false negatives in the
> results, they should be on par with devstack-gate jobs, especially once
> you start running tempest.

Yup, that would seem like a reasonable/fair target.

>  
> 
> 
>     Stage 2 (b). speedup
>        How long can the longest jobs take? We have plans in place to speed
>     up our current jobs but what should the target be?
> 
> 
> Gate jobs currently take up to a little over an hour [0][1]
> 
> [0]
> https://jenkins01.openstack.org/job/check-tempest-dsvm-postgres-full/buildTimeTrend
> [1] https://jenkins02.openstack.org/job/check-tempest-dsvm-postgres-full/buildTimeTrend

Our overcloud job is currently just under 90 minutes; I'm confident we
can get below an hour (of course then we have to run tempest and
whatever else we add, which will bring us back up).

> 
>  
> 
>     3. More Capacity
> 
> 
> If you wanted to run tripleo-check everywhere a
> 'check-tempest-dsvm-full' job is run, that is over 600 jobs in a 24
> hour period.

Looks like I was a little short in my guesstimation, and presumably it
won't be 600 this time next year...

> 
> [3] graphite
> <http://graphite.openstack.org/render/?from=00%3A00_20140203&fgcolor=000000&title=Check%20Hit%20Count&_t=0.2247244759928435&height=308&bgcolor=ffffff&width=586&hideGrid=false&until=23%3A59_20140324&showTarget=color(alias(hitcount(sum(stats.zuul.pipeline.gate.job.gate-tempest-dsvm-neutron-heat-slow.%7BSUCCESS%2CFAILURE%7D)%2C%275hours%27)%2C%20%27gate-tempest-dsvm-neutron-heat-slow%27)%2C%27green%27)&_salt=1395701365.817&lineMode=staircase&target=color(alias(hitcount(sum(stats.zuul.pipeline.check.job.check-tempest-dsvm-full.%7BSUCCESS%2CFAILURE%7D)%2C%2724hours%27)%2C%20%27check-tempest-dsvm-full%20hits%20over%2024%20hours%27)%2C%27orange%27)>
>  
> 
>        I'm going to talk about RAM here as it's probably the resource where
>     we will hit our infrastructure limits first.
>        Each time a suite of toci jobs is kicked off we currently kick off 5
>     jobs (which will double once Fedora is added[1]).
>        In total these jobs spawn 15 VMs consuming 80G of RAM (it's actually
>     120G to work around a bug we should soon have fixed[2]). We also
>     have plans that will reduce this 80G further, but let's stick with it for
>     the moment.
>        Some of these jobs complete after about 30 minutes, but let's say our
>     target is an overall average of 45 minutes.
> 
>        With Fedora that means each run will tie up 160G for 45 minutes. Put
>     another way, 160G can provide us with 32 runs (each including 10 jobs)
>     per day, since 24 hours / 45 minutes = 32.
> 
>        So to kick off 500 (I made this number up) runs per day, we would
>     need
>        (500 / 32.0) * 160G = 2500G of RAM
> 
>        We then need to double this number to allow for redundancy, so that's
>     5000G of RAM
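
For anyone who wants to rerun the arithmetic quoted above with different
inputs, here is a minimal sketch in Python. The function and parameter
names are made up for illustration, and the defaults (160G per run, a 45
minute average, a redundancy factor of 2) are just the figures from this
thread. Like the original estimate, it does not round the number of
concurrent slots up to a whole number.

```python
from fractions import Fraction

MINUTES_PER_DAY = 24 * 60


def runs_per_day(avg_run_minutes):
    """Back-to-back runs that one 'slot' of RAM can complete in 24 hours."""
    return Fraction(MINUTES_PER_DAY, avg_run_minutes)


def ram_needed_gb(target_runs_per_day, ram_per_run_gb=160,
                  avg_run_minutes=45, redundancy_factor=2):
    """Estimate total RAM (in G) needed to sustain a daily run count.

    All defaults are the thread's figures, not measured values.
    """
    # Number of runs that must be in flight at once (may be fractional).
    slots = Fraction(target_runs_per_day) / runs_per_day(avg_run_minutes)
    return float(slots * ram_per_run_gb * redundancy_factor)


print(runs_per_day(45))                         # 32
print(ram_needed_gb(500, redundancy_factor=1))  # 2500.0
print(ram_needed_gb(500))                       # 5000.0
print(ram_needed_gb(600))                       # 6000.0
```

The last line plugs in the "over 600 jobs in a 24 hour period" figure
mentioned earlier in the thread in place of the made-up 500, treating each
of those jobs as triggering one toci run.
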
> 
>        We probably have about 3/4 of this available to us at the moment, but
>     it's not evenly balanced between the 2 clouds, so we're not covered from a
>     redundancy point of view.
> 
>        So we need more hardware (either by expanding the clouds we have or
>     adding new clouds). I'd like for us to start a separate effort to map out
>     exactly what our medium term goals should be, including
>        o jobs we want to run
>        o how long we expect each of them to take
>        o how much RAM each one would take
>        so that we can roughly put together an idea of what our HW
>     requirements will be.
> 
>     4. check - all openstack projects
>        Once we're happy we have the required capacity I think we can then
>     move to check on all openstack projects
> 
>     5. voting check - all projects
>        Once we're happy that everybody is happy with reliability I think we
>     can move to voting check
> 
>     6. gate on all openstack projects
>        And then finally when everything else lines up I think we can be
>     added to the gate
> 
>     A) Gating with Ironic
>       I bring this up because there was some confusion about ironic's status
>     in the gate at a recent tripleo meeting[3]. When can tripleo's ironic
>     jobs be part of the gate?
> 
>     Any thoughts? Am I way off with any of my assumptions? Is my maths
>     correct?
> 
>     thanks,
>     Derek.
> 
>     [1] https://review.openstack.org/#/q/status:open+topic:add-f20-jobs,n,z
>     [2] https://bugs.launchpad.net/diskimage-builder/+bug/1289582
>     [3]
>     http://eavesdrop.openstack.org/meetings/tripleo/2014/tripleo.2014-03-11-19.01.log.html
> 
>     _______________________________________________
>     OpenStack-dev mailing list
>     OpenStack-dev at lists.openstack.org
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> 
> 



