[openstack-dev] Announce of Rally - benchmarking system for OpenStack

Robert Collins robertc at robertcollins.net
Sun Oct 20 19:03:07 UTC 2013


On 21 October 2013 07:36, Alex Gaynor <alex.gaynor at gmail.com> wrote:
> There are several issues involved in doing automated regression checking
> for benchmarks:
>
> - You need a platform which is stable. Right now all our CI runs on
> virtualized instances, and I don't think there's any particular guarantee
> it'll be the same underlying hardware; further, virtualized systems tend
> to be very noisy and don't give you the stability you need.
> - You need your benchmarks to be very high precision if you really want to
> rule out regressions of more than N% without a lot of false positives.
> - You need more than just checks on individual builds, you need long-term
> trend checking - 100 1% regressions are worse than a single 50% regression.
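(To put a number on that last point: a hundred compounding 1% slowdowns
multiply run time by roughly 1.01^100 ≈ 2.7x, which is indeed far worse than
a single 50% hit.)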

Let me offer a couple more key things:
 - you need a platform that is representative of your deployments:
1000 physical hypervisors have rather different check-in patterns than
1 qemu hypervisor.
 - you need a workload that is representative of your deployments:
10000 VMs spread over 500 physical hypervisors routing traffic
through one neutron software switch will have rather different load
characteristics than 5 qemu VMs in an all-in-one configuration hosted
inside a single kvm VM.

Neither the platform (number of components, their configuration, etc.)
nor the workload in devstack-gate is representative of production
deployments of anything except the most modest clouds. That's fine -
devstack-gate to date has been about base functionality, not digging
down into race conditions.

I think having a dedicated tool aimed at:
 - setting up *many different* production-like environments,
 - running many production-like workloads, and
 - reporting back which ones work and which ones don't

makes a huge amount of sense (sketched roughly below).
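
To make the many-environments / many-workloads idea concrete, here's a rough
sketch of the kind of configuration/workload matrix such a tool could walk.
Every axis name and value below is made up for illustration - this is not
Rally's actual data model:

import itertools

# Each axis is one dimension of the deployment phase space.
deployment_axes = {
    "hypervisors": [1, 50, 500],      # number of compute nodes
    "virt_driver": ["kvm", "fake"],   # real hypervisor vs nova's fake driver
    "neutron_agents": [1, 2],         # one software switch vs a redundant pair
}

workload_axes = {
    "concurrent_users": [1, 16, 128],
    "vms_per_user": [1, 10],
}

def phase_space(axes):
    """Yield every combination of the given axes as a dict."""
    keys = sorted(axes)
    for values in itertools.product(*(axes[k] for k in keys)):
        yield dict(zip(keys, values))

for deployment in phase_space(deployment_axes):
    for workload in phase_space(workload_axes):
        # A real tool would deploy, drive the workload and record results here.
        print(deployment, workload)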

From the reports from that tool we can craft targeted unit tests or
isolated functional tests to capture the problem and prevent it from
worsening or regressing (once fixed). See for instance Joe Gordon's
fake hypervisor, which is great for targeted testing.

That said, I also agree with the sentiment expressed that the
workload-driving portion of Rally doesn't seem different enough from
Tempest to warrant being separate; it seems to me that Rally could be
built like this:

- a thing that does deployments spread out over a phase space of configurations
- instrumentation for deployments that permits the data visibility
needed to analyse problems
- tests for Tempest that stress a deployment

So the single-button-push Rally would:
 - take a set of hardware
 - in a loop: deploy a configuration, run Tempest, report data
(see the sketch below)
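
In pseudo-Python, that loop would be something like this; deploy(),
run_tempest() and report() are placeholders for whatever deployment tooling
and Tempest invocation you already have, not real Rally or Tempest APIs:

import subprocess

def deploy(hardware, config):
    """Stand up an OpenStack cloud on the given hardware for this configuration."""
    raise NotImplementedError("drive your existing deployment tooling here")

def run_tempest():
    """Shell out to the Tempest test runner; the exact invocation depends on the install."""
    return subprocess.run(["testr", "run", "--parallel"],
                          capture_output=True, text=True)

def report(config, result):
    """Persist the configuration and the test output somewhere queryable."""
    print(config, result.returncode)

def single_button_push(hardware, configurations):
    for config in configurations:
        deploy(hardware, config)   # deploy a configuration on the hardware set
        result = run_tempest()     # run Tempest against it
        report(config, result)     # report data for later analysis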

That would reuse Tempest and still be a single-button-push
data-gathering thing. If Tempest isn't capable of generating enough
concurrency/load [for a single test - ignore parallel execution of
different tests], then that seems like something we should fix in
Tempest, because concurrency/race conditions are things we need tests
for in devstack-gate.
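
For what "generating enough concurrency within a single test" could look
like, here's a plain-Python sketch; the client object and its create_server
call stand in for whatever client a test already holds, they are not
Tempest's real interfaces:

import concurrent.futures

def stress_create_servers(client, count=50):
    """Fire `count` server-create requests at once and surface any failures."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=count) as pool:
        futures = [pool.submit(client.create_server, name="stress-%d" % i)
                   for i in range(count)]
    # The with-block waits for all submitted requests to finish.
    errors = [f.exception() for f in futures if f.exception() is not None]
    if errors:
        raise AssertionError("%d of %d concurrent creates failed: %r"
                             % (len(errors), count, errors[:3]))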

-Rob

-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


