[openstack-dev] [qa] Smarter timeouts in Tempest?

Matt Riedemann mriedem at linux.vnet.ibm.com
Mon May 19 16:33:50 UTC 2014



On 5/19/2014 10:53 AM, Matt Riedemann wrote:
> I was looking through this timeout bug [1] this morning and was able to
> correlate the image snapshot timeout with ceilometer hammering the CPU
> on the host at the time.  There are already threads on ceilometer
> performance and how it needs to be improved for Tempest runs, so I
> don't want to get into that here.
>
> What I'm wondering is whether there is a way to be smarter about how we
> do timeouts in the tests, rather than just relying on globally
> configured, hard-coded timeouts, which are bound to fail intermittently
> in dynamic environments like this.
>
> I'm thinking of something along these lines: keep track of CPU stats at
> intervals in our waiter loops; then, when we reach our configured
> timeout, calculate the average CPU load/idle, and if it falls below
> some threshold, cut the timeout in half and redo the timeout loop.
> Continue that until the timeout reaches a value that no longer makes
> sense, e.g. once it drops below a minute.  (See the sketch after the
> quoted text below.)
>
> Are there other ideas here?  My main concern is the number of random
> timeout failures we see in the tests; people try to fingerprint them
> with elastic-recheck, but the queries are so generic that they aren't
> really useful.  We now put the test class and test case in the compute
> test timeout messages, but it's also not very useful to fingerprint
> every individual permutation of test class/case in which we can hit a
> timeout.
>
> [1] https://bugs.launchpad.net/nova/+bug/1320617
>
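
As a rough sketch of the idea quoted above (the waiter shape, the
psutil-based CPU sampling, and the threshold values are all illustrative,
not what Tempest actually does today):

    import time

    import psutil  # assumption: psutil is available for CPU sampling

    from tempest import config

    CONF = config.CONF

    MIN_TIMEOUT = 60       # stop shrinking once we're under a minute
    IDLE_THRESHOLD = 20.0  # hypothetical: avg % idle below which we
                           # blame the starved host, not the test

    def wait_for_status(check_status, timeout=None, interval=1):
        """Waiter that retries with half the timeout while the host CPU
        looks saturated, per the idea quoted above."""
        timeout = timeout or CONF.compute.build_timeout
        while timeout >= MIN_TIMEOUT:
            idle_samples = []
            deadline = time.time() + timeout
            while time.time() < deadline:
                if check_status():
                    return
                # blocks for `interval` seconds and records how idle
                # the host CPU was while we waited
                idle_samples.append(psutil.cpu_times_percent(interval).idle)
            avg_idle = sum(idle_samples) / max(len(idle_samples), 1)
            if avg_idle >= IDLE_THRESHOLD:
                # the CPU was mostly idle, so this looks like a real
                # timeout rather than a CPU-starved environment
                break
            # host was CPU-starved; halve the timeout and redo the loop
            timeout = timeout / 2.0
        # Tempest would raise its own TimeoutException here
        raise Exception('timed out waiting for status')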

This change to devstack should help [1].

It would be good if we actually used the default timeouts we have 
configured in Tempest rather than hard-coding them in devstack based on 
the state of the gate at the time.

[1] https://review.openstack.org/#/c/94221/
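
For reference, the kind of override I mean is devstack writing something
like this into tempest.conf instead of letting Tempest's own default
stand (the exact value here is hypothetical):

    [compute]
    # seconds to wait for an instance to change status; Tempest ships
    # its own default, so devstack shouldn't need to hard-code one
    build_timeout = 196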

-- 

Thanks,

Matt Riedemann



