[openstack-qa] tempest run length - need a gate tag - call for help

Attila Fazekas afazekas at redhat.com
Sun May 19 17:54:10 UTC 2013


Hi Robert,

We have a costly (~9 minutes to set up), shared, non-idempotent fixture,
 which has its own behavior even when tempest is not running.
This fixture has multiple components, composed of millions of lines of code
 written in various languages, and it can behave differently regardless of
 how idempotent our framework is.
Even the elapsed time and the number of received hardware interrupts
 change its behavior.
Every operation has a side effect which persists into the future.

How visible these things are depends on the type and style of verification.

Data is king.

With this kind of 'fixture' we have questions like these:

Which test cases ran before the first failing one?
Which test cases ran at the same time?
What exact steps did the test cases perform before the failure?
 (The test cases process data coming from OpenStack, which is not necessarily 100% reproducible.)
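If the runner recorded a start time, a stop time and a worker id for every test, the first two questions reduce to simple interval checks. A minimal sketch, assuming such records exist; `TimingRecord`, `ran_concurrently` and `ran_before` are hypothetical names, not a testr or subunit API:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TimingRecord:
    """Hypothetical per-test timing record; field names are illustrative."""
    test_id: str
    worker: int
    start: float  # epoch seconds when the test started
    stop: float   # epoch seconds when the test finished


def ran_concurrently(records: List[TimingRecord], target: TimingRecord) -> List[str]:
    """Ids of tests whose [start, stop) interval overlaps the target's interval."""
    return sorted(
        r.test_id
        for r in records
        if r.test_id != target.test_id
        and r.start < target.stop
        and target.start < r.stop
    )


def ran_before(records: List[TimingRecord], target: TimingRecord) -> List[str]:
    """Ids of tests that finished before the target started, most recent first."""
    earlier = [r for r in records if r.stop <= target.start]
    earlier.sort(key=lambda r: r.stop, reverse=True)
    return [r.test_id for r in earlier]
```

The overlap test uses strict comparisons, so a test that stopped exactly when the target started counts as "before", not "at the same time".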

Currently I can see when a test case ran and the steps it took before the failure.
I can compare these with the system log (hopefully later with the output of other tracers as well).

It looks like nova has a custom solution for capturing STDOUT and STDERR, but I have not checked
 the details yet.

What tempest code change and/or testr/testtools setting is recommended
 to help us answer these questions?

Do we need to monkey-patch the sys.stdout and sys.stderr streams?
Do we need to configure a logger?
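On the logger question, one option that needs only the standard library `logging` module is a handler whose lines carry the test id and the time the event was created, rather than the time captured output was later printed. A minimal sketch; `CaseIdFilter`, `make_case_logger` and the `tempest.sketch` logger name are hypothetical, not existing tempest or testtools settings:

```python
import io
import logging


class CaseIdFilter(logging.Filter):
    """Attach the id of the running test case to every log record (hypothetical helper)."""

    def __init__(self, test_id: str) -> None:
        super().__init__()
        self.test_id = test_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.test_id = self.test_id
        return True  # never drop records, only annotate them


def make_case_logger(test_id: str, stream: io.TextIOBase) -> logging.Logger:
    """Return a logger that writes timestamped, test-tagged lines to `stream`."""
    logger = logging.getLogger(f"tempest.sketch.{test_id}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    handler = logging.StreamHandler(stream)
    handler.addFilter(CaseIdFilter(test_id))
    # %(asctime)s is formatted from record.created, i.e. when the event was
    # logged, not when the output is eventually displayed.
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(test_id)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

Because each line carries its own creation timestamp, the output can be lined up against the system log even if two workers' streams are interleaved or printed late.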

What is the recommended setting for multiple identical (or similar) OpenStack deployment environments?

Best Regards,
Attila

PS:
Based on the current performance data and expectations, I am OK with sacrificing a little single-thread
 performance in order to have a horizontally and vertically scalable solution,
 and for the "other reasons".

PPS:
When the gate log contained only the time the output was printed, instead of the time the event happened,
 I made this kind of guess about a random failure:
https://bugs.launchpad.net/tempest/+bug/1117555/comments/11
For various reasons the test case had not run for a long time, so I had no idea
which change was related.
The issue was not 100% reproducible; its reproducibility seemed to be
 connected to system performance.
I ran the test case in a loop in order to catch it.
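A loop like that can also record when each failure happened, so the failures can be matched against the system log afterwards. A minimal sketch; `run_in_loop` is a hypothetical helper, and `check` stands for whatever callable runs the test case once:

```python
import time
from typing import Callable, List, Tuple


def run_in_loop(
    check: Callable[[], None],
    iterations: int,
    clock: Callable[[], float] = time.time,
) -> List[Tuple[int, float, str]]:
    """Run `check` repeatedly; return (iteration, failure_time, error) per failure."""
    failures: List[Tuple[int, float, str]] = []
    for i in range(iterations):
        try:
            check()
        except Exception as exc:  # record any failure and keep looping
            # Timestamp taken at the moment of failure, not at report time.
            failures.append((i, clock(), repr(exc)))
    return failures
```

The `clock` parameter defaults to wall-clock time so the timestamps are comparable with syslog entries; it is injectable only to make the helper easy to test.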

We have a very complex system (fixture) and minimal tracing.
On a random failure, any component could be the broken one.
A change in one component can trigger a bug in another.

----- Original Message -----
From: "Robert Collins" <robertc at robertcollins.net>
To: "Attila Fazekas" <afazekas at redhat.com>
Cc: "All Things QA." <openstack-qa at lists.openstack.org>, "Sean Dague" <sean at dague.net>
Sent: Friday, May 17, 2013 6:04:02 PM
Subject: Re: [openstack-qa] tempest run length - need a gate tag - call for help

On 15 May 2013 18:50, Attila Fazekas <afazekas at redhat.com> wrote:
> In Jenkins you can see a single console, and you can see that your
>  change is not OK even before Jenkins says -1.
>
> Mixed output from two nose processes is not readable without an additional trick.
>
> The real questions:
> Will we have this feature with testr ?

What feature? Correct multiplexing of output from multiple test
outcomes? That's not a feature, it's a requirement. It's *why* testr
exists.

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Cloud Services

