Open Stack

Sun Jan 12 07:47:42 UTC 2014

On Wed, Jan 8, 2014 at 10:57 PM, Sean Dague <sean at dague.net> wrote:

[snip]

> So instead of trying to fix the individual runs, because t-h runs pretty
> fast, can you just fix it with bulk. It seems like the issue in a migration
> taking a long time isn't a race in OpenStack, it's completely variability in
> the underlying system.
>
> And it seems that the failing case is going to be 100% repeatable, and
> infrequent.
>
> So it seems like you could solve the fail side by only reporting fail
> results on 3 fails in a row: RESULT && RESULT && RESULT
>
> Especially valid if Results are coming from different AZs, so any local
> issues should be masked.

Whilst this is true, I worry about codifying flakiness in tests (as
shown by the gate experience). Instead I'm working on the root causes
of the flakiness.
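
For reference, the rule quoted above can be sketched as follows. This is
only an illustration: the function name, the `required` parameter, and
the list-of-booleans input are hypothetical and do not describe how
turbo-hipster actually reports results.

```python
def should_report_failure(failed_runs, required=3):
    """Return True only if the last `required` runs all failed.

    failed_runs: list of booleans, oldest first, True meaning the run failed.
    This mirrors the quoted "RESULT && RESULT && RESULT" idea.
    """
    if len(failed_runs) < required:
        # Not enough runs yet to call it a consistent failure.
        return False
    return all(failed_runs[-required:])


# A single slow run on a noisy host is masked...
print(should_report_failure([True, False, True]))
# ...while a genuinely repeatable failure is still reported.
print(should_report_failure([True, True, True]))
```

The trade-off is the one raised above: a rule like this hides
intermittent failures rather than explaining them.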

I've done some work this week on first-order metrics for migration
expense (IO ops per migration) instead of second-order metrics (wall
time), so I am hoping this will help once deployed.
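
To illustrate why an IO-based check should be less sensitive to the
underlying host than a wall-time check, here is a minimal sketch. The
class, the function names, the budgets, and all the numbers are made up
for the example; they are not taken from turbo-hipster.

```python
from dataclasses import dataclass


@dataclass
class MigrationRun:
    wall_seconds: float  # second-order metric: depends on host load
    io_ops: int          # first-order metric: work the migration performed


def fails_wall_time(run, max_seconds):
    """Wall-time check: sensitive to how busy the test host happens to be."""
    return run.wall_seconds > max_seconds


def fails_io_budget(run, max_io_ops):
    """IO check: depends only on how much work the migration did."""
    return run.io_ops > max_io_ops


# The same migration executed on a quiet host and on a busy host
# (hypothetical figures).
quiet_host = MigrationRun(wall_seconds=30.0, io_ops=5000)
busy_host = MigrationRun(wall_seconds=120.0, io_ops=5000)

for run in (quiet_host, busy_host):
    print(fails_wall_time(run, max_seconds=60.0),
          fails_io_budget(run, max_io_ops=8000))
```

With these invented figures the wall-time check disagrees between the
two hosts, while the IO check gives the same verdict for both.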

Michael

-- 
Rackspace Australia

[openstack-dev] [nova] Bogus -1 scores from turbo hipster