[openstack-dev] testr help

Robert Collins robertc at robertcollins.net
Tue Mar 11 00:29:31 UTC 2014


On 11 March 2014 12:20, Zane Bitter <zbitter at redhat.com> wrote:

>> Except nose can make them all the same file descriptor and let
>> everything multiplex together. Nose isn't demuxing arbitrary numbers
>> of file descriptors from arbitrary numbers of processes.
>
>
> Can't each subunit process do the same thing?

Roughly yes.

> As a user, here's how I want it to work:
>  - Each subunit process works like nose - multiplexing the various streams
> of output together and associating it with a particular test - except that
> nothing is written to the console but instead returned to testr in subunit
> format.

subunit certainly supports that. We don't have glue in subunit.run to
intercept stdout and stderr automatically and do the association.
There are some nuances in getting that *right* that are fairly
important (such as not breaking tunnelled debugger use), but I'm very
open to the idea of such a thing existing. testtools.run shouldn't
multiplex like that (it interferes with pdb) but should perhaps permit
it.
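
For concreteness, a test can do that association itself today by
capturing stdout and attaching it as a text/* detail - an untested,
Python 3 flavoured sketch (the class name is just illustrative):

    import io
    import sys

    import testtools
    from testtools import content

    class CapturedStdoutCase(testtools.TestCase):
        def setUp(self):
            super(CapturedStdoutCase, self).setUp()
            # Swap stdout for a buffer for the duration of the test and
            # attach whatever was written as a detail; subunit carries
            # the detail alongside the test result.
            buf = io.StringIO()
            real_stdout = sys.stdout
            sys.stdout = buf
            self.addCleanup(setattr, sys, 'stdout', real_stdout)
            self.addDetail('stdout', content.Content(
                content.UTF8_TEXT,
                lambda: [buf.getvalue().encode('utf8')]))

The open question is whether subunit.run should do that wrapping for
you, and how to turn it off when you want to drop into pdb.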

>  - testr reads the subunit data and saves it to the test repository.

Which it does.

>  - testr prints a report to the console based on the data it just
> received/saved.

Which it does.

> How it actually seems to work:
>  - A magic pixie creates a TestCase class with a magic incantation to
> capture your stdout/stderr/logging without breaking other test runners.

testr doesn't actually have anything to do with this - it's an
environment variable in .testr.conf that OpenStack uses. OpenStack has
a particularly egregious logging environment, and its inherited
nose-based test suite had no friendly capturing/management of that.
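
For reference, an OpenStack-style .testr.conf looks roughly like this
(the path and the variable defaults are illustrative) - the OS_*
variables are what the base test class consults to decide whether to
capture:

    [DEFAULT]
    test_command=OS_STDOUT_CAPTURE=${OS_STDOUT_CAPTURE:-1} \
                 OS_STDERR_CAPTURE=${OS_STDERR_CAPTURE:-1} \
                 OS_LOG_CAPTURE=${OS_LOG_CAPTURE:-1} \
                 ${PYTHON:-python} -m subunit.run discover -t ./ ./myproject/tests $LISTOPT $IDOPTION
    test_id_option=--load-list $IDFILE
    test_list_option=--list

testr just substitutes $LISTOPT/$IDOPTION/$IDFILE and runs the
command; the capturing behaviour lives entirely in the test suite.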

>  - Or they don't! You're hosed. The magic incantation is undocumented.
>  - You change all of your TestCases to inherit from the class with the magic
> pixie dust.
>  - Each subunit process associates the various streams of output (if you set
> it up to) with a particular test, but keeps them separate so that if you
> want to figure out the order of events you have to direct them all to the
> same channel - which, in practice, means you can only use logging (since
> some of the events you are interested in probably already exist in the code
> as logs).

stdout is buffered by default, stderr is unbuffered by default,
logging is on the other side of a mutex - if you have any concurrency
going on in your test process (which many do), there is absolutely no
guarantee of relative ordering between different sources unless that's
done in the generating process - something subunit.run may be able to
help with (see above).
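
If you want one ordered channel, the only reliable place to create it
is inside the generating (test) process, e.g. by pushing stdout and
stderr through logging so everything shares the logging mutex - an
untested sketch:

    import logging
    import sys

    class StreamToLogger(object):
        # File-like shim that forwards writes to a logger, so print
        # output and log messages end up on one mutex-ordered channel.
        def __init__(self, logger, level):
            self.logger = logger
            self.level = level

        def write(self, message):
            message = message.rstrip()
            if message:
                self.logger.log(self.level, message)

        def flush(self):
            pass

    log = logging.getLogger('test.output')
    sys.stdout = StreamToLogger(log, logging.INFO)
    sys.stderr = StreamToLogger(log, logging.ERROR)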

>  - when you want to debug a test, you have to do all the tedious logging
> setup if it doesn't already exist in the file. It probably won't, because
> flake8 would have made you delete it unless it's being used already.
>  - testr reads the subunit data and saves it to the test repository.
>  - testr prints a report to the console based on the data it just
> received/saved, though parts of it look like a raw data dump.

Which bits look raw? It should only show text/* attachments, non-text
should be named but not dumped.

> While there may be practical reasons why it currently works like the latter,
> I would submit that there is no technical reason it could not work like the
> former. In particular, there is nothing about the concept of running the
> tests in parallel that would prevent it, just as there is nothing about what
> nose does that would prevent two copies of nose from running at the same
> time on different sets of tests.

The key missing piece isn't implemented yet: demultiplexing stdin from
the testr console to the backends, to permit pdb usage in tests. That
works in single-worker mode today (across processes) but not in
multi-worker mode.

> It just seems bizarre to me that the _tests_ have to figure out what test
> runner they are being run by and redirect their output to the correct
> location to oblige it. Surely the point of a test runner is to do the Right
> Thing(TM) for any test, regardless of what it knows about the test runner.

Test runners should provide a homogeneous, consistent environment for
tests - for sure.

>>> BTW, since I'm on the subject, testr would be a lot more
>>> confidence-inspiring if running `testr failing` immediately after running
>>> `testr` reported the same number of failures, or indeed if running
>>> `testr`

It should. There was a double-accounting bug in testtools some months
back, but you should get the same failure count from testr last as from
testr run, since both pull from the same data.

> It makes sense for the case where the test runner has died without reporting
> data, but why should it be reported as a separate failure when it has
> reported data that testr has regarded as valid enough to use and display?

testr synthesises failed tests for a backend in two cases:
a) if a test starts but doesn't finish, that is presumed to be a
backend failure regardless of backend exit code (e.g. because of code
that calls 'sys.exit()' or 'os._exit()' in the middle of a test). This
is attached to the test id, so no new test id is seen by the user.
b) if there is no test active on the backend, but the process exits
non-zero, then the *stream generation* is presumed to have failed
(e.g. due to a segfault when loading a test module or some such), and
a process failure is created.
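
As an illustration of (a), a test like this kills the worker before any
outcome is written, so the synthesised failure is attached to the
test's own id:

    import os
    import unittest

    class BackendDeathExample(unittest.TestCase):
        def test_dies_mid_test(self):
            # The worker exits here without reporting an outcome; testr
            # attributes the resulting backend failure to this test id.
            os._exit(1)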

It looks like we have a regression from a relatively recent refactoring
of subunit.run to reuse more testtools.run code: testtools now exits
non-zero after successfully executing the tests, and subunit.run is not
squelching that, even though the subunit stream already carries the
semantic result information for the test runner. Please do file a bug.

> My machine has 4 cores with hyperthreads (a common configuration, I would
> imagine), so I have to do this mental translation every time:
>
> 1    failure  -> everything failed
> 2    failures -> 1 failure
> 3    failures -> 2 failures
> 4    failures -> 2-3 failures
> 5    failures -> 3-4 failures
> 6    failures -> 3-5 failures
> 7    failures -> 4-6 failures
> 8    failures -> 4-7 failures
> 9    failures -> 5-8 failures
> 10   failures -> 5-9 failures
> 11   failures -> 6-10 failures
> 12   failures -> 6-11 failures
> 13   failures -> 7-12 failures
> 14   failures -> 7-13 failures
> 15   failures -> 8-14 failures
> 16   failures -> 8-15 failures
> 17   failures -> 9-16 failures
> 18   failures -> 10-17 failures
> n>18 failures -> (n-8)-(n-1) failures
>
> This means that the change statistic that testr prints is completely useless
> for determining what I really needed to know, which is whether the thing I
> just modified fixed any tests or not (or, alternatively, whether I have an
> unstable test). For low-ish numbers of test failures (say <15), it's
> dominated by random noise. Call me old-fashioned, but random noise is not
> the distribution I'm looking for in reports from my test runner :p
>
>
>>> (I understand that some of the things mentioned above may already have
>>> been
>>> improved since the latest 0.0.18 release. I can't actually find the repo
>>> at
>>> the moment to check, because it's not linked from PyPI or Launchpad and
>>> 'testrepository' turns out to be an extremely common name for things on
>>> the
>>> Internet.)
>>>
>> https://launchpad.net/testrepository is linked from pypi....
>
>
> Ah, thanks. I looked at that page, but in my head I thought I had once seen
> a Git repo for it and I couldn't see a link from there... I never guessed
> that the upstream was in Launchpad's bzr.

Moving to github is on the round-tuit list - I've actually got a
fastimport stream I'm probably going to push when I get a little more
me time.

-Rob

> cheers,
> Zane.



-- 
Robert Collins <rbtcollins at hp.com>
Distinguished Technologist
HP Converged Cloud


