[openstack-qa] tempest run length - need a gate tag - call for help
Robert Collins
robertc at robertcollins.net
Fri May 17 15:58:25 UTC 2013
On 15 May 2013 04:18, Monty Taylor <mordred at inaugust.com> wrote:
>
>
>>> We have a number of other things that we can do to reduce run-time that
>>> I think we agreed should be a higher priority:
>>>
>>> A) Parallelize the test runner (move to testr).
>>> B) Split the run into multiple jobs (XML vs JSON, etc).
>>> C) Focus on flakey tests so that gate resets are less of a factor
>>> (reducing sensitivity to runtime).
>>>
>>> Note that work on both A and B independently facilitates C.
>>>
>>> I think the general direction we'd like to head is to run _more_ tests,
>>> not less. Further, I don't think that check jobs and gate jobs should
>>> run different tests -- some people will learn to just ignore check jobs
>>> and enqueue failing jobs into the gate (as people already ignore
>>> non-voting jobs), resulting in more bad code landing. It's also
>>> optimizing the wrong pipeline -- developers are more sensitive to slow
>>> check jobs than gate jobs.
>>>
>>> I got the impression that we all agreed that testr was the highest
>>> priority for this, and I'd still like to see that land before we move on
>>> to functional job splits. Is that effort progressing? What can we do
>>> to help?
>>
>> A is currently stalled for lack of anyone working on it for H1. Chris
>> Yeoh was the driving force in Grizzly on this, but he's hard at work on
>> the Nova v3 API right now. Matt Treinish's going to pick this up on H2
>> if no one else steps forward.
>
> Robert - can you provide some help here? It seems like we're blocking on
> some engineering challenges that I believe might be easy for you.
Sorry for my tardy response, I blame Monty : all meetings all the time
makes Robert a dull boy :)
I think you translated a verbal braindump from me mid-week, but just
to be sure, here is my understanding of tempest parallel test status:
* [some] tests cannot run independently
* This forces a complex scheduler [compared to the current testr
scheduler of 'equal time']
* Jay Pipes has a patch that moves to non-optimised testresources,
which is good prep work but at least short term a performance
pessimisation.
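
To make the 'equal time' point concrete, here is a rough sketch of that
style of scheduling. It is not testr's actual code; the function name,
test ids and durations are made up for illustration. It greedily puts
the longest remaining test into whichever worker bucket is currently
lightest, which only works if any test may land in any bucket.

```python
import heapq


def partition_by_time(durations, workers):
    """Greedy split: longest tests first, each into the lightest bucket."""
    # Heap entries are (total_seconds_so_far, bucket_index).
    heap = [(0.0, i) for i in range(workers)]
    heapq.heapify(heap)
    buckets = [[] for _ in range(workers)]
    # Sort by descending duration, then by id so the result is deterministic.
    ordered = sorted(durations.items(), key=lambda kv: (-kv[1], kv[0]))
    for test_id, seconds in ordered:
        total, idx = heapq.heappop(heap)
        buckets[idx].append(test_id)
        heapq.heappush(heap, (total + seconds, idx))
    return buckets


# Hypothetical timing data (seconds per test) from a previous run.
durations = {
    "test_a": 30.0,
    "test_b": 20.0,
    "test_c": 10.0,
    "test_d": 10.0,
    "test_e": 5.0,
}
buckets = partition_by_time(durations, 2)
totals = [sum(durations[t] for t in bucket) for bucket in buckets]
```

If test_c only passes after test_a has run, this split breaks it, which
is why tests that cannot run independently force something more complex.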
Here is how I'd tackle the problem:
* Determine what performance we need, and whether we
* Run all the tests individually to identify broken tests and fix them.
* Start running tests in parallel and determine the performance *in
the gate environment*.
* Resize the gate environment to a large/huge instance and determine
the scaling factor we get + whether we hit the performance target.
* If we do, we're done. Move along.
* If we don't:
  - analyze where the time goes in a little more detail: is it the
    system under test that's slow or the repeated effort...
  - ditto for why the scaling factor we get is $whateveritis.
  - if the scaling factor is tolerable, move to multi-machine
    parallelisation, with machine count determined by the perf goal +
    scaling factor - that would be the least-investment strategy.
  - if the scaling factor is poor we need to fix that.
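
The arithmetic behind "machine count determined by the perf goal +
scaling factor" can be sketched as below. All numbers are invented, not
measurements, and the helper names are hypothetical. It also assumes the
tests split evenly across machines and that the machines do not contend
with each other, which is exactly what the measurements would need to
confirm.

```python
import math


def scaling_factor(serial_seconds, parallel_seconds):
    """Speedup of one parallel run on one machine over the serial run."""
    return serial_seconds / parallel_seconds


def machines_needed(serial_seconds, target_seconds, per_machine_speedup):
    """Machines required to hit the target, assuming an even split."""
    return math.ceil(serial_seconds / (per_machine_speedup * target_seconds))


# Made-up example: a 60 minute serial run takes 20 minutes with 8
# workers on one large instance, and the goal is a 10 minute run.
serial = 3600.0
parallel = 1200.0
workers = 8
target = 600.0

speedup = scaling_factor(serial, parallel)
efficiency = speedup / workers
machines = machines_needed(serial, target, speedup)
```

With these invented numbers the speedup is 3x on 8 workers, so two such
machines would meet the goal, but most of each machine is idle and the
question becomes why.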
To fix the scaling factor, there are two branches:
- we might hit contention on the test cloud we drive.
- or it might be repeated overhead of case setup in the parallel
environment : for that we need to either schedule differently, or
allow test runners to make better reuse of fixtures across wider
contexts than class/module.
- so at that point we could do a custom testr scheduler
- or bring in testresources optimisation + ensure that shared
state gets represented via declared resources rather than code
hierarchy.
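
As a toy illustration of why declared resources help (plain Python, not
the testresources API; the test names and resource names are
hypothetical): once each test states what it needs, a runner can
reorder tests so that expensive fixtures are built less often,
regardless of which class or module the test lives in.

```python
def count_setups(tests):
    """Count fixture builds when a resource stays live only while
    consecutive tests keep needing it."""
    live = frozenset()
    setups = 0
    for _name, needed in tests:
        setups += len(needed - live)  # build whatever is not already live
        live = needed                 # anything no longer needed is torn down
    return setups


def group_by_resources(tests):
    """Naive reorder: tests with the same declared resources run together."""
    return sorted(tests, key=lambda t: sorted(t[1]))


# Each test declares the resources it needs instead of inheriting them.
tests = [
    ("test_boot", frozenset({"image", "network"})),
    ("test_list_images", frozenset({"image"})),
    ("test_ping", frozenset({"image", "network"})),
    ("test_delete_image", frozenset({"image"})),
]

before = count_setups(tests)
after = count_setups(group_by_resources(tests))
```

Here the original order builds fixtures three times and the grouped
order twice. A real optimiser would weigh setup cost rather than just
sorting, so treat this as the shape of the idea only.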
tl;dr: data is king, and the very first axiom for test frameworks is
that tests are idempotent, independent, isolated things. Which tempest
breaks.
-Rob
--
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services