[openstack-dev] [qa] moratorium on new negative tests in Tempest

pcrews gleebix at gmail.com
Tue Nov 12 22:03:19 UTC 2013


On 11/12/2013 12:20 PM, Monty Taylor wrote:
>
>
> On 11/12/2013 02:33 PM, David Kranz wrote:
>> On 11/12/2013 01:36 PM, Clint Byrum wrote:
>>> Excerpts from Sean Dague's message of 2013-11-12 10:01:06 -0800:
>>>> During the freeze phase of Havana we got a ton of new contributors
>>>> coming on board to Tempest, which was super cool. However it meant we
>>>> had this new influx of negative tests (i.e. tests which push invalid
>>>> parameters looking for error codes) which made us realize that human
>>>> creation and review of negative tests really doesn't scale. David Kranz
>>>> is working on a generative model for this now.
>>>>
>>> Are there some notes or other source material we can follow to understand
>>> this line of thinking? I don't agree or disagree with it, as I don't
>>> really understand, so it would be helpful to have the problems enumerated
>>> and the solution hypothesis stated. Thanks!
>>>
>> I am working on this with Marc Koderer but we only just started and are
>> not quite ready. But since you asked now...
>>
>> The problem is that the current implementation of negative tests is that
>> each "case" is represented as code in a method and targets a particular
>> set of api arguments and expected result. In most (but not all) of these
>> tests there is boilerplate code surrounding the real content which is
>> the actual arguments being passed and the value expected. That
>> boilerplate code has to be written correctly and reviewed. The general
>> form of the solution has to be worked out but basically would involve
>> expressing these tests declaratively, perhaps in a yaml file. In order
>> to do this we will need some kind of json schema for each api. The main
>> implementation around this is defining the yaml attributes that make it
>> easy to express the test cases, and somehow coming up with the json
>> schema for each api.
>>
>> In addition, we would like to support "fuzz testing" where arguments
>> are, at least partially, randomly generated and the return values are
>> only examined for 4xx vs something else. This would be possible if we
>> had json schemas. The main work is to write a generator and methods for
>> creating bad values including boundary conditions for types with ranges.
>> I had thought a bit about this last year and poked around for an
>> existing framework. I didn't find anything that seemed to make the job
>> much easier but if any one knows of such a thing (python, hopefully)
>> please let me know.
>>
>> The negative tests for each api would be some combination of
>> declaratively specified cases and auto-generated ones.
>>
>> With regard to the json schema, there have been various attempts at this
>> in the past, including some ideas of how wsme/pecan will help, and it
>> might be helpful to have more project coordination. I can see a few
>> options:
>>
>> 1. Tempest keeps its own json schema data
>> 2. Each project keeps its own json schema in a way that supports
>> automated extraction
>> 3. There are several use cases for json schema like this and it gets
>> stored in some openstacky place that is not in tempest
>>
>> So that is the starting point. Comments and suggestions welcome! Marc
>> and I just started working on an etherpad
>> https://etherpad.openstack.org/p/bp_negative_tests but any one is
>> welcome to contribute there.
>
> We actually did this back in the good old Drizzle days- and by we, I
> mean Patrick Crews, who I copied here. He can refer to the research
> better than I can, but AIUI, generative schema-driven testing of things
> like this is certainly the right direction. It's about 10 years behind
> the actual state of the art of the research, but it's in all ways
> superior to making human combinations of input parameters and output
> behaviors.
>
Thanks, Monty.
As Monty has stated, similar issues have been encountered in database
testing.
Databases are also complex, richly featured systems that present
interesting testing challenges.
The best research regarding stochastic / randomized / high-volume / 
machine-generated test cases that I have seen has come from Microsoft's 
SQL Server team and it is this research that informed the creation of 
the random query generator tool for MySQL systems.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.3435&rep=rep1&type=pdf 
<-- MS paper on their db testing tools

We've been doing similar work with the test suite for libra - we
organize things by api actions and define validation code (if name >
max_len, we expect return value NNN, if user=bad, we expect MMM, etc).
We have one test case per action (create_lb, update_lb,
update_lb_nodes) and we feed in various parameters (names, number of
nodes, etc) to produce several iterations of the test w/ different
inputs.

This allows us to have one chunk of code that appropriately describes 
the api action's behavior while letting us quickly make new tests for 
that action by simply creating a new yaml file or adding to an existing one.
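
A minimal sketch of that pattern in Python, under stated assumptions:
this is not libra's actual test code. MAX_NAME_LEN, the status codes,
fake_create_lb and the CASES list are all made up for illustration, and
the list of dicts stands in for what a yaml file would hold.

```python
# Hypothetical limit and status codes, chosen only for illustration;
# they are NOT libra's real values.
MAX_NAME_LEN = 128
OK, BAD_REQUEST, UNAUTHORIZED = 200, 400, 401


def expected_status(params):
    """Validation rules for one api action (an imagined create_lb)."""
    if params.get("user", "good") == "bad":
        return UNAUTHORIZED
    if len(params.get("name", "")) > MAX_NAME_LEN:
        return BAD_REQUEST
    return OK


def fake_create_lb(name, user="good"):
    """Stand-in for the real API call so the sketch runs offline."""
    if user != "good":
        return UNAUTHORIZED
    if len(name) > MAX_NAME_LEN:
        return BAD_REQUEST
    return OK


# Stand-in for the contents of a yaml file: one dict per iteration.
CASES = [
    {"name": "lb1"},
    {"name": "x" * (MAX_NAME_LEN + 1)},
    {"name": "lb1", "user": "bad"},
]


def run_cases(action, cases):
    """Return every case whose actual status differs from the rules."""
    failures = []
    for params in cases:
        actual = action(**params)
        expected = expected_status(params)
        if actual != expected:
            failures.append((params, expected, actual))
    return failures


if __name__ == "__main__":
    print(run_cases(fake_create_lb, CASES))
```

Adding a new test is then just another entry in CASES; the rules in
expected_status stay in one place.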

Some background:
Basically, testing complex systems presents a couple of main problems
(depth / interestingness of tests + maintenance).
People are not that good at writing and validating super complicated,
insane tests by hand / eyeball.  That is, people often only go as deep
as time, energy, and brainpower permit (someone probably won't create a
tempest test w/ 200 steps and someone won't create a 20 table, 200 line
SQL query for databases).

Throwing more people at testing generally only results in a ton of
shallow test cases that you must also now maintain (keep up to date,
investigate failures on, etc).  If a company like MS found it could not
feasibly scale w/ its resources, that should provide food for thought
on OpenStack's testing strategy.

random query generator:
As a solution to this, one of my former colleagues created a testing 
tool called the random query generator (randgen).  Instead of defining 
individual queries and their expected results (human validation of such 
things is also a time-sink / hell-hole), we instead move to defining 
stochastic grammars that express what components a query *may* have and 
we let the code and RNG do its thing for generating tests.
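
To make the grammar idea concrete, here is a toy sketch in Python. It
is not randgen's grammar format or code; the GRAMMAR rules and the
expand function are assumptions invented for illustration. Each rule
lists the alternatives a component *may* take, and the RNG picks one.

```python
import random

# Hypothetical toy grammar: each rule name maps to a list of
# alternatives. Words that are rule names get expanded recursively;
# anything else is emitted as-is.
GRAMMAR = {
    "query": ["SELECT column FROM table where_clause"],
    "column": ["a", "b", "COUNT(*)"],
    "table": ["t1", "t2"],
    "where_clause": ["", "WHERE a cmp value"],
    "cmp": ["<", "=", ">"],
    "value": ["0", "1", "NULL"],
}


def expand(symbol, rng):
    """Expand one grammar symbol into text using the given RNG."""
    if symbol not in GRAMMAR:
        return symbol  # terminal: keep the literal text
    production = rng.choice(GRAMMAR[symbol])
    parts = (expand(word, rng) for word in production.split())
    # Drop empty expansions (e.g. an omitted WHERE clause).
    return " ".join(p for p in parts if p)


if __name__ == "__main__":
    rng = random.Random(2013)  # fixed seed so a failing run can be replayed
    for _ in range(5):
        print(expand("query", rng))
```

Seeding the RNG matters: a generated test that finds a bug can be
regenerated from the seed instead of being stored by hand.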

I will only say that this tool helped kill MySQL 6.0 and is heavily
relied on by Percona, MariaDB, etc, once their QA guys realized it was
the only way to not be crushed under their own weight.  It was like
being handed a Zippo after relying on sticks and rocks to make fire...

Hope this information is useful and please feel free to ping me if 
anyone has further questions / wants to discuss this.

--
Thanks,
Patrick



