[openstack-dev] Announce of Rally - benchmarking system for OpenStack
Sean Dague
sean at dague.net
Mon Oct 21 14:23:40 UTC 2013
On 10/20/2013 02:36 PM, Alex Gaynor wrote:
> There's several issues involved in doing automated regression checking
> for benchmarks:
>
> - You need a platform which is stable. Right now all our CI runs on
> virtualized instances, and I don't think there's any particular
> guarantee it'll be the same underlying hardware, further virtualized
> systems tend to be very noisy and not give you the stability you need.
> - You need your benchmarks to be very high precision, if you really want
> to rule out regressions of more than N% without a lot of false positives.
> - You need more than just checks on individual builds, you need long
> term trend checking - 100 1% regressions are worse than a single 50%
> regression.
>
> Alex
Agreed on all these points. However, I think none of them change where the
load generation scripts should be developed.
They mostly speak to ensuring that we've got a repeatable hardware
environment for running the benchmark, and that we've got the right kind
of data collection and analysis to make it statistically valid.
Point #1 is hard - as it really does require bare metal. But let's put
that aside for now, as I think there might be clouds being made
available that would let us solve that.
But the rest of this is just software. If we had performance metering
available in either the core servers or as part of Tempest, we could get
appropriate data. Then you'd need a good statistics engine to provide
statistically relevant processing of that data. Not just line graphs, but
real error bars and confidence intervals based on large numbers of runs.
I've seen way too many line graphs arguing one point or another about
config changes that turn out to have error bars far beyond the results
that are being seen. Any system that doesn't expose that isn't really
going to be useful.
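Just to make the point concrete, here's a rough sketch (nothing to do with
any existing tool, and the numbers are made up) of the kind of calculation
I mean by "real error bars": report a mean run time with a confidence
interval, and refuse to call something a regression when the intervals
overlap.

    # Illustrative only: `baseline` and `candidate` stand in for repeated
    # wall-clock timings (seconds) of the same test under two configs.
    import math
    import statistics


    def mean_with_ci(samples, z=1.96):
        """Return (mean, half-width of an approximate 95% confidence interval)."""
        n = len(samples)
        mean = statistics.mean(samples)
        stdev = statistics.stdev(samples)      # sample standard deviation
        half_width = z * stdev / math.sqrt(n)  # normal approximation
        return mean, half_width


    baseline = [42.1, 40.8, 44.5, 41.9, 43.2, 45.0, 40.5, 42.7]
    candidate = [43.0, 44.8, 41.2, 45.6, 42.3, 44.1, 43.9, 42.0]

    for name, runs in (("baseline", baseline), ("candidate", candidate)):
        m, hw = mean_with_ci(runs)
        print("%s: %.1f +/- %.1f s" % (name, m, hw))

If the two intervals overlap, the line graph is telling you nothing.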
Actual performance regressions are going to be *really* hard to find in
the gate, just because of the rate of code change that we have, and the
variability we've seen on the guests.
Honestly, a statistics engine that just took in our existing
large sets of data and established baseline variability would be a great step
forward (that's a new invention; no one has that right now). I'm sure we
can figure out a good way to take the load generation into Tempest so it's
consistent with our existing validation and scenario tests. The metering
could easily be proposed as a nova extension (a la coverage). And that
seems to leave you with a setup tool to pull this together in arbitrary
environments.
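For "baseline variability" I mean something as simple as this sketch (the
test names and data layout are purely illustrative): take the historical
timings we already collect, and report how noisy each test is, so we know
what size of change is even detectable.

    # Hypothetical sketch: `history` maps a test name to past run durations.
    import statistics


    def baseline_variability(history):
        """Map test name -> coefficient of variation (stdev / mean)."""
        report = {}
        for test, durations in history.items():
            if len(durations) < 2:
                continue
            mean = statistics.mean(durations)
            if mean == 0:
                continue
            report[test] = statistics.stdev(durations) / mean
        return report


    history = {
        "tempest.scenario.test_server_basic_ops": [61.2, 58.9, 64.4, 60.1],
        "tempest.api.compute.servers.test_create_server": [18.4, 19.1, 17.8, 18.9],
    }

    for test, cv in sorted(baseline_variability(history).items()):
        print("%s: %.1f%% variability" % (test, cv * 100))

A test that varies by 10% run to run can't tell you about a 2% regression,
and we should know which of our tests are in that bucket before we claim
anything from the gate.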
And that's really what I mean about integrating better. Whenever
possible, figure out how functionality could be added to existing
projects, especially when that means they are enhanced not only for your
use case, but for other use cases those projects have wanted for a
while (seriously, I'd love to have statistically valid run time
statistics for Tempest that show us when we go off the rails, like we
did last week for a few days, and that quantify long term variability and
trends in the stack). It's harder in the short term to do that, because
it means compromises along the way, but the long term benefit to
OpenStack is much greater than another project which duplicates effort
from a bunch of existing projects.
-Sean
--
Sean Dague
http://dague.net