[glance] functional-py35 job fails due to subunit.praser error

Clark Boylan cboylan at sapwetik.org
Thu Dec 13 18:01:06 UTC 2018


On Thu, Dec 13, 2018, at 9:38 AM, Abhishek Kekane wrote:
> Hi All,
> 
> From 7th December 2018, all of sudden functional-py35 check job is failing
> on current master. Earlier this job was failing intermittently (2-3 times
> from 100). I have tried to dig into the issue and find out that it is
> failing obly for ubuntu and working fine with Fedora, centos etc. Below are
> some of the findings:
> 
> 1. Even if I add failure tests [1] in black-list file every time other
> tests reported as failure.
> 2. When I compared failed tests results [1] with passed tests results [2] I
> have found that passing tests shows 'Pass 741 Skip 7' whereas failing tests
> shows 'Pass 627 Failure 9 Skip 5' (it does not execute all the tests in
> case of failure)
> 3. I have compared version of subunit, oslo.service, oslo.log, tox with
> failed [3] and passed [4] logs and they are same.
> 4. I came across a bug in cinder [5] which talks about excessive logs might
> cause this error and I have tried to ignore the deprecation warning in
> tox.ini by adding 'PYTHONWARNINGS=ignore::DeprecationWarning' but it still
> shows DeprecationWarning in the logs.
> 5. I have tried executing tests by setting '--concurrency' to 1 but still
> the error persists.
> 6. Whenver subunit.parser error occurs the tests remains 'inprogress' state
> and sometimes time-outs.
> 
> I have no clues so far to tackle this issue, kindly provide your inputs for
> the same.

Usually I like to start with the subunit data itself [6]. You'll want to download this file locally, then unzip it. At this point it is in subunitv2 protocol format which is binary and not very human readable. If you install the python subunit package you get a tool called `subunit-2to1` that you can run on this file to get v1 protocol format which is human readable.

Doing this we get:

tags: worker-3
test: subunit.parser
time: 2018-12-13 03:36:22.466478Z
failure: subunit.parser [ multipart
Content-Type: application/octet-stream
Packet data
6
�+p�v�0
Content-Type: text/plain;charset=utf8
Parser Error
31
Short read - got 65530 bytes, wanted 423557 bytes0
]
tags: -worker-3
time: 2018-12-13 03:36:17.372762Z

The control process got fewer bytes from worker-3 than it expected. One reason this could happen is the worker-3 test process is crashing before writing all the bytes, but there is at least one test on worker-3 that reports after this parse error so that may not be the cause.

Another potential cause is subunit + testr relies on stdout and subunt + stestr relies on tcp socket (I think) and it could be that that stream is being closed by side effects in a test?

To narrow this down I would use process of elimination to identify suspect test cases. Basically any test that doesn't show up in that subunit log could be causing this. Then run them in isolation to look for side effect causing errors. You may also want to cross check against several failures which may help narrow it down further.

> 
> 
> [1]
> http://logs.openstack.org/89/617889/4/check/openstack-tox-functional-py35/4c79eb0/testr_results.html.gz
> [2]
> http://logs.openstack.org/89/617889/4/check/openstack-tox-functional-py35/39f2c7e/testr_results.html.gz
> [3]
> http://logs.openstack.org/89/617889/4/check/openstack-tox-functional-py35/4c79eb0/job-output.txt.gz
> [4]
> http://logs.openstack.org/89/617889/4/check/openstack-tox-functional-py35/39f2c7e/job-output.txt.gz
> [5] https://bugs.launchpad.net/cinder/+bug/1728640

[6] http://logs.openstack.org/89/617889/4/check/openstack-tox-functional-py35/4c79eb0/testrepository.subunit.gz



More information about the openstack-discuss mailing list