[openstack-dev] [nova] Bogus -1 scores from turbo hipster
Samuel Merritt
sam at swiftstack.com
Wed Jan 8 18:05:48 UTC 2014
On 1/7/14 2:53 PM, Michael Still wrote:
> Hi. Thanks for reaching out about this.
>
> It seems this patch has now passed turbo hipster, so I am going to
> treat this as a more theoretical question than perhaps you intended. I
> should note though that Joshua Hesketh and I have been trying to read
> / triage every turbo hipster failure, but that has been hard this week
> because we're both at a conference.
>
> The problem this patch faced is that we are having trouble defining
> what is a reasonable amount of time for a database migration to run
> for. Specifically:
>
> 2014-01-07 14:59:32,012 [output] 205 -> 206...
> 2014-01-07 14:59:32,848 [heartbeat]
> 2014-01-07 15:00:02,848 [heartbeat]
> 2014-01-07 15:00:32,849 [heartbeat]
> 2014-01-07 15:00:39,197 [output] done
>
> So applying migration 206 took slightly over a minute (67 seconds).
> Our historical data (mean + 2 standard deviations) says that this
> migration should take no more than 63 seconds. So this only just
> failed the test.
It seems to me that requiring a runtime less than (mean + 2 stddev)
leads to a false-positive rate of 1 in 40, right? If the runtimes have a
normal(-ish) distribution, then 95% of them will be within 2 standard
deviations of the mean, so that's 1 in 20 falling outside that range.
Then discard the ones that are faster than (mean - 2 stddev), and that
leaves 1 in 40. Please correct me if I'm wrong; I'm no statistician.
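
A quick check of that estimate, as a minimal sketch that assumes the runtimes are exactly normally distributed (real migration timings are probably skewed, so treat the figure as approximate). The function name upper_tail_probability is made up for illustration; math.erfc comes from the Python standard library.

```python
import math

def upper_tail_probability(k: float) -> float:
    """Return P(X > mean + k * stddev) for a normally distributed X."""
    # The standard normal upper tail is Q(k) = 0.5 * erfc(k / sqrt(2)).
    return 0.5 * math.erfc(k / math.sqrt(2))

# A run only "fails" when it is slower than (mean + 2 stddev),
# so only the upper tail counts as a false positive.
p = upper_tail_probability(2.0)
print(f"false-positive rate: {p:.4f} (about 1 in {1 / p:.0f})")
# prints: false-positive rate: 0.0228 (about 1 in 44)
```

Under that assumption the one-sided rate comes out near 2.3%, or roughly 1 in 44, so the 1-in-40 figure is a close approximation.
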
Such a high false-positive rate may make it too easy to ignore turbo hipster
as the bot that cried wolf. This problem already exists with Jenkins and
the devstack/tempest tests; when one of those fails, I don't wonder what
I broke, but rather how many times I'll have to recheck the patch until
the tests pass.
Unfortunately, I don't have a solution to offer, but perhaps someone
else will.