[openstack-dev] [oslo.db] [release] opportunistic tests breaking randomly

Mike Bayer mbayer at redhat.com
Thu Sep 15 03:05:43 UTC 2016


There's a different set of logs attached to the launchpad issue; that's 
not what I was looking at before.

These logs are at 
http://logs.openstack.org/90/369490/1/check/gate-nova-tox-db-functional-ubuntu-xenial/085ac3e/console.html#_2016-09-13_14_54_18_098031
and in these logs I see something *very* different: not just the MySQL 
tests but also the Postgresql tests are definitely hitting conflicts 
against the randomly generated database.

This set of traces, e.g.:

sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) database "dbzrtmgbxv" is being accessed by other users
2016-09-13 14:54:18.093723 |     DETAIL:  There is 1 other session using the database.
2016-09-13 14:54:18.093736 |      [SQL: 'DROP DATABASE dbzrtmgbxv']

and

File "/home/jenkins/workspace/gate-nova-tox-db-functional-ubuntu-xenial/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 668, in _rollback_impl
2016-09-13 14:54:18.095470 |         self.engine.dialect.do_rollback(self.connection)
2016-09-13 14:54:18.095513 |       File "/home/jenkins/workspace/gate-nova-tox-db-functional-ubuntu-xenial/.tox/functional/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 420, in do_rollback
2016-09-13 14:54:18.095526 |         dbapi_connection.rollback()
2016-09-13 14:54:18.095548 |     sqlalchemy.exc.InterfaceError: (psycopg2.InterfaceError) connection already closed

are a very different animal.  For one thing, they're on Postgresql, where 
the driver and DB act extremely rationally.  For another, there's no 
timeout exception here, and not all the conflicts are within the teardown.
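
To make the first failure mode concrete, here's a minimal standalone 
sketch - not oslo.db's actual provisioning code - showing that a single 
leaked connection to the throwaway database is enough to make the DROP 
DATABASE in teardown fail exactly this way.  The connection URL, the 
openstack_citest credentials and reusing the "dbzrtmgbxv" name are 
assumptions for a local Postgres, not taken from the gate setup:

# sketch: a leftover session blocks DROP DATABASE on Postgres
import sqlalchemy as sa
from sqlalchemy import exc

# admin engine pointed at the default "postgres" database; AUTOCOMMIT is
# needed because CREATE/DROP DATABASE can't run inside a transaction.
# Credentials here are an assumption (a user with CREATEDB rights).
admin = sa.create_engine(
    "postgresql+psycopg2://openstack_citest:openstack_citest@localhost/postgres",
    isolation_level="AUTOCOMMIT")

with admin.connect() as conn:
    conn.execute(sa.text("CREATE DATABASE dbzrtmgbxv"))

# simulate a session that teardown failed to close
leaked_engine = sa.create_engine(
    "postgresql+psycopg2://openstack_citest:openstack_citest@localhost/dbzrtmgbxv")
leaked = leaked_engine.connect()

with admin.connect() as conn:
    try:
        conn.execute(sa.text("DROP DATABASE dbzrtmgbxv"))
    except exc.OperationalError as err:
        # psycopg2.OperationalError: database "dbzrtmgbxv" is being
        # accessed by other users
        print(err)
    finally:
        leaked.close()
        leaked_engine.dispose()   # actually closes the pooled DBAPI connection
        conn.execute(sa.text("DROP DATABASE IF EXISTS dbzrtmgbxv"))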

Are *these* errors also new as of version 4.13.3 of oslo.db?  Because 
here I have more suspicion of one particular oslo.db change.

This fits much more with your initial description.

On 09/14/2016 10:48 PM, Mike Bayer wrote:
>
>
> On 09/14/2016 07:04 PM, Alan Pevec wrote:
>>> Oslo.db 4.13.3 did hit the scene about the time this showed up. So I
>>> think we need to strongly consider blocking it and revisiting these
>>> issues post newton.
>>
>> So that means reverting all stable/newton changes; previous 4.13.x
>> releases have already been blocked: https://review.openstack.org/365565
>> How would we proceed? Do we need to revert all backports on stable/newton?
>
> In case my previous email wasn't clear, I don't *yet* see evidence that
> the recent 4.13.3 release of oslo.db is the cause of this problem.
> However, that is only based upon what I see in this stack trace, which
> is that the test framework is acting predictably (though erroneously)
> based on the timeout condition which is occurring.   I don't (yet) see a
> reason that the same effect would not occur prior to 4.13.3 in the face
> of a signal pre-empting the work of the pymysql driver mid-stream.
> However, this assumes that the timeout condition itself is not a product
> of the current oslo.db version and that is not known yet.
>
> There's a list of questions that should all be answerable which could
> assist in giving some hints towards this.
>
> There are two parts to the error in the logs.  There's the "timeout"
> condition, then there is the bad reaction of the PyMySQL driver and the
> test framework as a result of the operation being interrupted within the
> test.
>
> * Prior to oslo.db 4.13.3, did we ever see this "timeout" condition
> occur?   If so, was it also accompanied by the same "resource closed"
> condition or did this second part of the condition only appear at 4.13.3?
>
> * Did we see a similar "timeout" / "resource closed" combination prior
> to 4.13.3, just with less frequency?
>
> * Was the version of PyMySQL also recently upgraded (I'm assuming this
> environment has been on PyMySQL for a long time at this point) ?   What
> was the version change if so?  Especially if we previously saw "timeout"
> but no "resource closed", perhaps an older version of PyMySQL didn't
> react in this way?
>
> * Was the version of MySQL running in the CI environment changed?   What
> was the version change if so?    Were there any configuration changes
> such as transaction isolation, memory or process settings?
>
> * Have there been changes to the "timeout" logic itself in the test
> suite, e.g. whatever it is that sets up fixtures.Timeout()?  Or some
> change that alters how teardown of tests occurs when a test is
> interrupted via this timeout?
>
> * What is the magnitude of the "timeout" this fixture is using?  Is it
> on the order of seconds, minutes, or hours?
>
> * If many minutes or hours, can the test suite be observed to be stuck
> on this test?   Has someone tried to run a "SHOW PROCESSLIST" while this
> condition is occurring to see what SQL is pausing?
>
> * Has there been some change such that the migration tests are running
> against non-empty tables or tables with much more data than was present
> before?
>
> * Is this failure only present within the Nova test suite or has it been
> observed in the test suites of other projects?
>
> * Is this failure present only on the "database migration" test suite or
> is it present in other opportunistic tests, for Nova and others?
>
> * Have there been new database migrations added to Nova which are being
> exercised here and may be involved?
>
> I'm not sure how much of an inconvenience it is to downgrade oslo.db. If
> downgrading it is feasible, that would at least be a way to eliminate it
> as a possibility if these same failures continue to occur, or a way to
> confirm its involvement if they disappear.   But if downgrading is
> disruptive then there are other things to look at in order to have a
> better chance at predicting its involvement.
>
>
>
>>
>> Cheers,
>> Alan
>>

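Following up on my own "SHOW PROCESSLIST" question quoted above: the kind 
of thing I have in mind is a quick polling script left running on the CI 
node while the suite appears hung, so we capture whatever SQL is actually 
executing at that moment.  The host and openstack_citest credentials below 
are guesses for a typical CI MySQL setup, not taken from the gate 
configuration:

# sketch: periodically dump active MySQL statements while the suite is stuck
import time
import pymysql

conn = pymysql.connect(host="127.0.0.1", user="openstack_citest",
                       password="openstack_citest")
try:
    while True:
        with conn.cursor() as cur:
            cur.execute("SHOW FULL PROCESSLIST")
            for row in cur.fetchall():
                # columns: Id, User, Host, db, Command, Time, State, Info
                proc_id, user, host, db, command, secs, state, info = row[:8]
                if info:  # only sessions actually running a statement
                    print("%s %s %ss %s %s" % (proc_id, db, secs, state, info))
        print("---")
        time.sleep(5)
finally:
    conn.close()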