[openstack-dev] [qa] Smarter timeouts in Tempest?

Matt Riedemann mriedem at linux.vnet.ibm.com
Mon May 19 18:13:03 UTC 2014



On 5/19/2014 11:33 AM, Matt Riedemann wrote:
>
>
> On 5/19/2014 10:53 AM, Matt Riedemann wrote:
>> I was looking through this timeout bug [1] this morning and was able
>> to correlate the image snapshot timeout with ceilometer really
>> hammering the CPU on the host around the same time.  There are
>> already threads on ceilometer performance and how it needs to be
>> improved for Tempest runs, so I don't want to get into that here.
>>
>> What I'm thinking about is whether there is a way to be smarter about
>> how we do timeouts in the tests, rather than just relying on globally
>> configured, hard-coded timeouts, which are bound to fail
>> intermittently in dynamic environments like this.
>>
>> I'm thinking of something along the lines of tracking CPU stats at
>> intervals in our waiter loops; when we reach the configured timeout,
>> we calculate the average CPU idle time, and if it falls below some
>> threshold we cut the timeout in half and redo the wait loop.  We keep
>> doing that until the timeout shrinks to a point where retrying no
>> longer makes sense, e.g. once it drops below a minute.
>>
>> Are there other ideas here?  My main concern is the number of random
>> timeout failures we see in the tests; people try to fingerprint them
>> with elastic-recheck, but the queries are so generic that they aren't
>> really useful.  We now put the test class and test case in the
>> compute test timeout messages, but it's also not very useful to
>> fingerprint every individual permutation of test class/case in which
>> we can hit a timeout.
>>
>> [1] https://bugs.launchpad.net/nova/+bug/1320617
>>
>
> This change to devstack should help [1].
>
> It would be good if we actually used the default timeouts we have
> configured in Tempest rather than hard-coding them in devstack based
> on the state of the gate at the time.
>
> [1] https://review.openstack.org/#/c/94221/
>
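
For context, the timeout that devstack overrides is just a regular
Tempest config option with its own default.  From memory, its
definition in tempest/config.py looks roughly like this (the exact
default value and help text may differ):

from oslo.config import cfg

compute_group = cfg.OptGroup(name='compute',
                             title='Compute Service Options')

ComputeGroup = [
    cfg.IntOpt('build_timeout',
               default=300,
               help='Timeout in seconds to wait for an instance or '
                    'snapshot to reach its expected status.'),
]

devstack then stomps on that default when it writes out tempest.conf.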

I have a proof of concept up for Tempest with adjusted timeouts based on 
CPU idle values here:

https://review.openstack.org/#/c/94245/
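
To make the idea concrete, here's a rough sketch of what a CPU-aware
waiter loop could look like.  This is only an illustration, not what
the patch actually does; the /proc/stat sampling, the helper names,
and all of the numbers are made up:

import time


def read_proc_stat():
    # First line of /proc/stat holds aggregate jiffies per CPU state:
    # user nice system idle iowait irq softirq ...
    with open('/proc/stat') as f:
        values = [int(v) for v in f.readline().split()[1:]]
    return values[3], sum(values)  # (idle jiffies, total jiffies)


def idle_fraction(prev, cur):
    # Fraction of time the CPU was idle between two samples.
    idle = cur[0] - prev[0]
    total = cur[1] - prev[1]
    return idle / float(total) if total else 1.0


def wait_for(predicate, timeout, interval=3,
             idle_threshold=0.2, min_extension=60):
    # Poll predicate() until it returns True or we give up.  When the
    # deadline passes, check how idle the host CPU was during the
    # wait: if it was mostly busy, grant a half-sized extension and
    # loop again, until extensions shrink below min_extension seconds.
    extension = float(timeout)
    while extension >= min_extension:
        deadline = time.time() + extension
        samples = []
        prev = read_proc_stat()
        while time.time() < deadline:
            if predicate():
                return True
            time.sleep(interval)
            cur = read_proc_stat()
            samples.append(idle_fraction(prev, cur))
            prev = cur
        avg_idle = sum(samples) / len(samples) if samples else 1.0
        if avg_idle >= idle_threshold:
            break  # the host had spare CPU: a genuine timeout
        extension /= 2  # the host was hammered, give it more time
    return False

Something like wait_for(lambda: image_status() == 'active',
CONF.compute.build_timeout), with image_status() standing in for
whatever the test is actually polling, would then keep its current
behavior on a quiet host but degrade gracefully when ceilometer or
anything else is eating the CPU.  The idle threshold and the minimum
extension would obviously need tuning against real gate data.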

-- 

Thanks,

Matt Riedemann



