[openstack-dev] [TripleO] Moving tripleo-ci towards the gate
derekh at redhat.com
Fri Mar 21 13:29:26 UTC 2014
I'm trying to get a handle on what needs to happen before we can get
tripleo-ci (toci) into the gate. I realize this may take some time, but
I'm trying to map out how to reach the end goal of putting multi-node
tripleo-based deployments in the gate, which should cover a lot of use
cases that devstack-gate doesn't. Here are some of the stages I think we
need to achieve before being in the gate, along with some questions where
people may be able to fill in the blanks.
Stage 1: check - tripleo projects
This is what we currently have running: 5 separate jobs running
non-voting checks against tripleo projects.
Stage 2 (a). reliability
Obviously keeping both the results and the CI system reliable is a
must, and we should always aim for 0% false test results. But is there
a rate of false negatives, for example, that would be acceptable to
infra? What are the numbers on the gate at the moment? Should we aim to
match those at the very least (maybe we already have)? And for how long
do we need to maintain those levels before considering the system proven?
Stage 2 (b). speedup
How long can the longest jobs take? We have plans in place to speed
up our current jobs but what should the target be?
Stage 3: more capacity
I'm going to talk about RAM here as it's probably the resource where
we will hit our infrastructure limits first.
Each time a suite of toci jobs is kicked off we currently start 5
jobs (which will double once Fedora is added).
In total these jobs spawn 15 VMs consuming 80G of RAM (it's actually
120G at the moment, to work around a bug we should soon have fixed). We
also have plans that will reduce this 80G further, but let's stick with
it for now.
Some of these jobs complete after about 30 minutes, but let's say our
target is an overall average of 45 minutes.
With Fedora that means each run will tie up 160G for 45 minutes. Put
another way, 160G can provide us with 32 runs (each including 10 jobs)
per day, since 24 hours / 45 minutes = 32.
So to kick off 500 (I made this number up) runs per day, we would need
(500 / 32.0) * 160G = 2500G of RAM.
We then need to double this number to allow for redundancy, so that's
5000G of RAM.
We probably have about 3/4 of this available to us at the moment, but
it's not evenly balanced between the 2 clouds, so we're not covered from
a redundancy point of view.
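As a rough check of the arithmetic above, the calculation can be sketched in Python. The function and parameter names are made up for illustration, and the model assumes runs are spread perfectly evenly over 24 hours, which real load is unlikely to be.

```python
import math

MINUTES_PER_DAY = 24 * 60


def ram_for_runs(runs_per_day, ram_per_run_gb, run_minutes, redundancy=2):
    """Return (runs per slot per day, base RAM in G, RAM in G with redundancy)."""
    # One "slot" of ram_per_run_gb hosts runs back to back all day.
    runs_per_slot = MINUTES_PER_DAY / run_minutes
    # Fractional slots, matching (500 / 32.0) in the text above.
    slots = runs_per_day / runs_per_slot
    base_gb = slots * ram_per_run_gb
    return runs_per_slot, base_gb, base_gb * redundancy


runs_per_slot, base_gb, redundant_gb = ram_for_runs(500, 160, 45)
print(runs_per_slot, base_gb, redundant_gb)  # 32.0 2500.0 5000.0

# Rounding up to whole 160G slots is slightly more conservative.
whole_slot_gb = math.ceil(500 / runs_per_slot) * 160
print(whole_slot_gb)  # 2560
```

Because the load is assumed to be flat, these figures are closer to a lower bound than a sizing target; bursts of patches at busy times would need headroom on top.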
So we need more hardware (either by expanding the clouds we have or by
adding new clouds). I'd like for us to start a separate effort to map out
exactly what our medium-term goals should be, including
o jobs we want to run
o how long we expect each of them to take
o how much RAM each one would take
so that we can roughly put together an idea of what our HW
requirements will be.
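One possible way to turn such a list into a rough RAM figure is to sum RAM multiplied by duration per job. This is only a sketch: the `Job` type, the function name, and the even 16G-per-job split (160G / 10 jobs) are assumptions for illustration, not measured values.

```python
from dataclasses import dataclass

MINUTES_PER_DAY = 24 * 60


@dataclass
class Job:
    name: str
    minutes: float  # expected wall-clock duration
    ram_gb: float   # RAM held by the job's VMs while it runs


def average_ram_gb(jobs, runs_per_day):
    """Average RAM in use if runs are spread evenly over the day."""
    gb_minutes_per_run = sum(job.minutes * job.ram_gb for job in jobs)
    return runs_per_day * gb_minutes_per_run / MINUTES_PER_DAY


# Ten jobs at 45 minutes and 16G each reproduce the 2500G estimate.
even_jobs = [Job("even-split", 45, 16)] * 10
even_estimate = average_ram_gb(even_jobs, runs_per_day=500)
print(even_estimate)  # 2500.0

# Hypothetical mix: half of the jobs finish in 30 minutes and free their RAM.
mixed_jobs = [Job("short", 30, 16)] * 5 + [Job("long", 45, 16)] * 5
mixed_estimate = average_ram_gb(mixed_jobs, runs_per_day=500)
print(round(mixed_estimate, 1))
```

The per-job form shows how much of the estimate depends on whether RAM is released as soon as an individual job finishes rather than when the whole run does. Redundancy would still need to be applied on top.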
Stage 4: check - all openstack projects
Once we're happy we have the required capacity I think we can then
move to check on all openstack projects
Stage 5: voting check - all projects
Once everybody is happy with reliability I think we can move to
voting check
Stage 6: gate on all openstack projects
And then finally when everything else lines up I think we can be
added to the gate
A) Gating with Ironic
I bring this up because there was some confusion about Ironic's status
in the gate at a recent tripleo meeting. When can tripleo's Ironic
jobs be part of the gate?
Any thoughts? Am I way off with any of my assumptions? Is my maths correct?