[openstack-dev] Unwedging the gate

Sean Dague sean at dague.net
Mon Nov 25 19:43:52 UTC 2013


On 11/25/2013 11:54 AM, Clint Byrum wrote:
> Excerpts from Robert Collins's message of 2013-11-25 01:30:11 -0800:
>> On 25 November 2013 22:23, Clint Byrum <clint at fewbar.com> wrote:
>>
>>> I do wonder if we would be able to commit enough resources to just run
>>> two copies of the gate in parallel each time and require both to pass.
>>> Doubling the odds* that we will catch an intermittent failure seems like
>>> something that might be worth doubling the compute resources used by
>>> the gate.
>>>
>>> *I suck at math. Probably isn't doubling the odds. Sounds
>>> good though. ;)
>>
>> We already run the code paths that were breaking 8 or more times.
>> Hundreds of times in fact for some :(.
>>
>> The odds of a broken path triggering after it gets through, assuming
>> each time we exercise it is equally likely to show it, are roughly
>> 3/times-exercised-in-landing. E.g. if we run a code path 300 times and
>> it doesn't show up, then it's quite possible that it has a 1%
>> incidence rate.
> 
> We don't run through 300 times of the same circumstances. We may pass
> through individual code paths that have a race condition 300 times, but
> the circumstances are probably only right for failure in 1 or 2 of them.
> 
> The 1% overall rate, then, doesn't matter so much as how often it fails
> when the conditions for failure are optimal. If we can increase the
> occurrences of the most likely failure conditions, then we do have a
> better chance of catching the failure.

Right, the statistics are against us if we try to brute force 1% failures
into being blocked by the gate.

Even if we ran the whole test suite 20 times, a 1% failure would still
pass through unseen about 82% of the time:

	0.99 ^ 20 = 0.818

Remember that we need to apply the exponent to the success rate, because
what we are actually trying to figure out is the odds that this will
succeed 20 times in a row.
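
For anyone who wants to check the arithmetic above, here is a minimal
Python sketch. It assumes every run fails independently with the same
probability, which real gate runs almost certainly do not. The function
names pass_probability and runs_needed are made up for illustration.

```python
import math


def pass_probability(failure_rate: float, runs: int) -> float:
    """Chance that a failure with the given per-run rate is never seen
    across `runs` independent runs."""
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float = 0.95) -> int:
    """Smallest number of independent runs that shows at least one
    failure with probability `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # A 1% failure slipping through 20 runs unseen.
    print(round(pass_probability(0.01, 20), 3))
    # Runs needed to see a 1% failure at least once with 95% confidence.
    print(runs_needed(0.01))
```

Under that independence assumption, catching a 1% failure with 95%
confidence takes roughly 300 runs, which lines up with the
3/times-exercised rule of thumb quoted earlier in the thread.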

So what we actually need to do is treat the 1% failures as evidence that
some kind of race exists at all, then actively try to create a scenario
where that kind of race happens *a lot*, so we can nail it down and
ensure it never comes back.
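
As a rough illustration of that idea, the sketch below runs a test in a
loop and counts how often it fails. Everything here is hypothetical:
count_failures and make_flaky are made-up names, and make_flaky is only a
deterministic stand-in for a racy test, not anything taken from the gate.

```python
from typing import Callable


def count_failures(test: Callable[[], None], runs: int) -> int:
    """Run `test` `runs` times and count the runs that raise AssertionError."""
    failures = 0
    for _ in range(runs):
        try:
            test()
        except AssertionError:
            failures += 1
    return failures


def make_flaky(every: int) -> Callable[[], None]:
    """Hypothetical stand-in for a racy test: fails on every `every`-th call."""
    calls = {"n": 0}

    def test() -> None:
        calls["n"] += 1
        assert calls["n"] % every != 0, "simulated race hit"

    return test


if __name__ == "__main__":
    # Race that rarely triggers: 1 failure per 100 runs.
    print(count_failures(make_flaky(100), 1000))
    # Same race once the scenario is arranged to provoke it: 1 in 2.
    print(count_failures(make_flaky(2), 1000))
```

The point of the comparison is that once the triggering conditions are
arranged to occur often, a handful of runs is enough to confirm both the
bug and the fix.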

	-Sean

-- 
Sean Dague
http://dague.net
