[openstack-dev] Chaos drivers (idea)

Timothy Daly timjr at yahoo-inc.com
Mon May 6 18:05:38 UTC 2013


I think this is a great idea.  This type of thing is often called fuzz testing or fuzzing, and deliberately injecting failures the way you describe is also known as fault injection.  Fuzzing is used a lot to test compilers, but I guess any sufficiently complicated interface is a good candidate.

In my limited experience with it, you get the most value when the behavior is nearly normal, right on the boundary between expected behavior and brokenness.  For the drivers, there are more ways to fail than just throwing.  For each API entry point, the driver could either do what was asked, not do it, do something else, or do it but really slowly.  Note that doing the wrong thing is an especially hard case to deal with.  That is known as a Byzantine failure.
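Here is a rough sketch of what I mean, as a wrapper that covers three of those modes (throw, silently not do it, do it slowly).  Everything in it is made up for illustration: ChaosProxy, rate, seed, and the mode names are not part of nova or of any real driver interface.  I left out the "do something else" case, since what counts as wrong depends on the driver.

```python
import random
import time


class ChaosProxy(object):
    """Wrap a driver and perturb a fraction of its public method calls.

    Hypothetical sketch: the class name, arguments, and mode names are
    assumptions made for illustration, not an existing nova API.
    """

    MODES = ("raise", "skip", "slow")

    def __init__(self, driver, rate=0.1, seed=None,
                 exceptions=(RuntimeError,), delay=0.5, sleep=time.sleep):
        self._driver = driver
        self._rate = rate                   # fraction of calls to disturb
        self._rng = random.Random(seed)     # private RNG, so runs are repeatable
        self._exceptions = tuple(exceptions)
        self._delay = delay
        self._sleep = sleep
        self.log = []                       # (method name, outcome) per call

    def __getattr__(self, name):
        # Only reached for attributes the proxy itself does not define.
        if name.startswith("_"):
            raise AttributeError(name)      # do not proxy private names
        target = getattr(self._driver, name)
        if not callable(target):
            return target

        def chaotic(*args, **kwargs):
            if self._rng.random() >= self._rate:
                self.log.append((name, "ok"))
                return target(*args, **kwargs)
            mode = self._rng.choice(self.MODES)
            self.log.append((name, mode))
            if mode == "raise":
                exc_type = self._rng.choice(self._exceptions)
                raise exc_type("chaos: injected failure in %s" % name)
            if mode == "skip":
                return None                 # report nothing, do nothing
            self._sleep(self._delay)        # "slow": do it, but late
            return target(*args, **kwargs)

        return chaotic
```

You would wrap a real driver with something like ChaosProxy(real_driver, rate=0.2, seed=1234) and hand the wrapper to the code under test.  Re-running with the same seed and the same sequence of calls replays the same failures, and the log shows what was injected where.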

You could get some of the benefit without even modifying the code by doing something like Netflix's Chaos Monkey -- just go kill off or pause processes at random while generating load.  You can also introduce network latency and packet loss using the Linux network emulator (netem).  For example, this adds a 300ms delay to lo (the loopback interface), for testing on a single machine:

	sudo tc qdisc add dev lo root netem delay 300ms
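The process-pausing part can be sketched in a few lines of Python using SIGSTOP and SIGCONT.  This is only an illustration: pause_random_process and its arguments are names I made up, it is POSIX-only, and it assumes you already have the list of pids you want to disturb and permission to signal them.

```python
import os
import random
import signal
import time


def pause_random_process(pids, rng, pause_seconds=2.0,
                         sleep=time.sleep, kill=os.kill):
    """Freeze one randomly chosen process, then let it continue.

    Hypothetical helper: `sleep` and `kill` are injectable only so the
    function can be tried out without touching real processes.
    """
    victim = rng.choice(list(pids))
    kill(victim, signal.SIGSTOP)        # suspend the process
    try:
        sleep(pause_seconds)
    finally:
        kill(victim, signal.SIGCONT)    # always resume it, even on error
    return victim
```

Calling it in a loop with rng = random.Random(some_seed) while a load generator is running gives a repeatable sequence of pauses.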

Cheers,
Tim

On May 3, 2013, at 7:25 PM, Joshua Harlow wrote:

> Hi everyone,
> 
> I've had an idea for a while but have never had the time to get around to it, so I thought some other people might like to see if it's possible (and/or offer comments in general).
> 
> The pitch:
> 
> It would be neat to implement a set of 'chaos' drivers, one for each pluggable 'driver-like' backend that nova provides (for example the servicegroup driver, or the vm driver). The concept is that a single 'chaos' driver would wrap a working driver and would randomly (likely via a specified seed and/or rate) raise the different types of exceptions that the driver interface/abstraction allows to be thrown. When it does not throw, it would pass the request on to the underlying driver, so it would act as a driver that sporadically fails at a much higher rate (depending on the random seed or other configuration) than the wrapped driver does. This would be a neat way to test the corner cases of the code using said driver (and to see how that code handles varying exceptions). It also provides a repeatable (due to the set seed) & unique view into what state the system is left in after said exceptions occur. The results from such 'chaos' drivers could provide unique insights into how the overall system recovers from failures and could lead to new code to ensure the system is left in a consistent state (if it is not already).
> 
> If anyone might be interested I put this up @ https://blueprints.launchpad.net/nova/+spec/chaos-drivers
> 
> Thoughts or comments welcome :-)
> 
> -Josh
> 



