[openstack-dev] Announce of Rally - benchmarking system for OpenStack
Sean Dague
sean at dague.net
Mon Oct 21 14:23:40 UTC 2013
On 10/20/2013 02:36 PM, Alex Gaynor wrote:
> There's several issues involved in doing automated regression checking
> for benchmarks:
>
> - You need a platform which is stable. Right now all our CI runs on
> virtualized instances, and I don't think there's any particular
> guarantee it'll be the same underlying hardware, further virtualized
> systems tend to be very noisy and not give you the stability you need.
> - You need your benchmarks to be very high precision, if you really want
> to rule out regressions of more than N% without a lot of false positives.
> - You need more than just checks on individual builds, you need long
> term trend checking - 100 1% regressions are worse than a single 50%
> regression.
>
> Alex
Agreed on all these points. However, I think none of them change where the
load generation scripts should be developed.
They mostly speak to ensuring that we've got a repeatable hardware
environment for running the benchmark, and that we've got the right kind
of data collection and analysis to make it statistically valid.
Point #1 is hard - as it really does require bare metal. But let's put
that aside for now, as I think there might be clouds being made
available that would let us solve that.
But the rest of this is just software. If we had performance metering
available in either the core servers or as part of Tempest, we could get
appropriate data. Then you'd need a good statistics engine to provide
statistically relevant processing of that data. Not just line graphs, but
real error bars and confidence intervals based on large numbers of runs.
I've seen way too many line graphs arguing one point or another about
config changes that turn out to have error bars far beyond the results
that are being seen. Any system that doesn't expose that isn't really
going to be useful.
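Just to make the point concrete, here's a rough sketch (nothing to do with
any existing tool, and the numbers are made up) of the kind of calculation
I mean by "real error bars": report a mean run time with a confidence
interval, and refuse to call something a regression when the intervals
overlap.

    # Illustrative only: `baseline` and `candidate` stand in for repeated
    # wall-clock timings (seconds) of the same test under two configs.
    import math
    import statistics


    def mean_with_ci(samples, z=1.96):
        """Return (mean, half-width of an approximate 95% confidence interval)."""
        n = len(samples)
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)      # sample standard deviation
        half_width = z * stdev / math.sqrt(n)  # normal approximation
        return mean, half_width


    baseline = [42.1, 40.8, 44.5, 41.9, 43.2, 45.0, 40.5, 42.7]
    candidate = [43.0, 44.8, 41.2, 45.6, 42.3, 44.1, 43.9, 42.0]

    for name, runs in (("baseline", baseline), ("candidate", candidate)):
        m, hw = mean_with_ci(runs)
        print("%s: %.1f +/- %.1f s" % (name, m, hw))

If the two intervals overlap, the line graph is telling you nothing.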
Actual performance regressions are going to be *really* hard to find in
the gate, just because of the rate of code change that we have, and the
variability we've seen on the guests.
Honestly, a statistics engine that just took in our existing
large sets of data and established baseline variability would be a great step
forward (that's a new invention; no one has that right now). I'm sure we
can figure out a good way to take the load generation into Tempest so it's
consistent with our existing validation and scenario tests. The metering
could easily be proposed as a nova extension (a la coverage). And that
seems to leave you with a setup tool to pull this together in arbitrary
environments.
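For "baseline variability" I mean something as simple as this sketch (the
test names and data layout are purely illustrative): take the historical
timings we already collect, and report how noisy each test is, so we know
what size of change is even detectable.

    # Hypothetical sketch: `history` maps a test name to past run durations.
    import statistics


    def baseline_variability(history):
        """Map test name -> coefficient of variation (stdev / mean)."""
        report = {}
        for test, durations in history.items():
            if len(durations) < 2:
                continue
            mean = statistics.mean(durations)
            if mean == 0:
                continue
            report[test] = statistics.stdev(durations) / mean
        return report


    history = {
        "tempest.scenario.test_server_basic_ops": [61.2, 58.9, 64.4, 60.1],
        "tempest.api.compute.servers.test_create_server": [18.4, 19.1, 17.8, 18.9],
    }

    for test, cv in sorted(baseline_variability(history).items()):
        print("%s: %.1f%% variability" % (test, cv * 100))

A test that varies by 10% run to run can't tell you about a 2% regression,
and we should know which of our tests are in that bucket before we claim
anything from the gate.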
And that's really what I mean about integrating better. Whenever
possible, figure out how functionality could be added to existing
projects, especially when that means they are enhanced not only for your
use case, but for other use cases those projects have wanted for a
while (seriously, I'd love to have statistically valid run time
statistics for Tempest that show us when we go off the rails, like we
did last week for a few days, and that quantify long term variability and
trends in the stack). It's harder in the short term to do that, because
it means compromises along the way, but the long term benefit to
OpenStack is much greater than another project which duplicates effort
from a bunch of existing projects.
-Sean
--
Sean Dague
http://dague.net