[openstack-dev] [networking-sfc] Intermittent database transaction issues, affecting the tempest gate

Bernard Cafarelli bcafarel at redhat.com
Thu Jan 5 15:47:59 UTC 2017


After some research, I found that this review fixes the tempest failures:
https://review.openstack.org/#/c/416503/1 (the newer patchset also
contains an unrelated fix for the functional tests gate)

Multiple local tempest runs and gate rechecks all turned green with
this fix. That is the good news.

The bad news is that I am still not sure of the root cause. The code
that triggers the problems is:
https://github.com/openstack/networking-sfc/blob/f5b52d5304796e44431b3874117aa0be91ed13d8/networking_sfc/services/sfc/drivers/ovs/db.py#L292
_get_port_detail() is just a wrapper around CommonDbMixin._get_by_id()
from neutron, so could the problem be triggered by two _model_query()
calls in a row?
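A rough illustration of why a single failed DB operation can make every later call fail, following the pattern described in the SQLAlchemy session FAQ linked later in this thread: once a flush fails, the session's transaction stays unusable until rollback() is called, so even a simple lookup like _get_port_detail() would then raise. Whether this is what actually happens in networking-sfc is not confirmed here. This is a toy model using only the Python standard library; ToySession, TransactionInactiveError and run_in_transaction are hypothetical names, not SQLAlchemy, oslo.db or neutron APIs.

```python
# Toy model only: ToySession, TransactionInactiveError and run_in_transaction
# are hypothetical names, not SQLAlchemy, oslo.db or neutron APIs.


class TransactionInactiveError(Exception):
    """Stand-in for "this Session's transaction has been rolled back"."""


class ToySession:
    """Mimics a session whose transaction is unusable after a failed flush."""

    def __init__(self):
        self.committed = {}
        self.pending = {}
        self.failed = False

    def _check_usable(self):
        # After a failure, every call raises until rollback() is called.
        if self.failed:
            raise TransactionInactiveError(
                "transaction rolled back due to a previous exception")

    def add(self, key, value):
        self._check_usable()
        if key in self.committed or key in self.pending:
            # Simulate a failed flush, e.g. a duplicate entry.
            self.failed = True
            raise ValueError("duplicate key %r" % key)
        self.pending[key] = value

    def get(self, key):
        self._check_usable()
        return self.pending.get(key, self.committed.get(key))

    def commit(self):
        self._check_usable()
        self.committed.update(self.pending)
        self.pending.clear()

    def rollback(self):
        # Discard uncommitted work and clear the failed state.
        self.pending.clear()
        self.failed = False


def run_in_transaction(session, work):
    """Run work(session); commit on success, roll back and re-raise on error."""
    try:
        result = work(session)
        session.commit()
        return result
    except Exception:
        session.rollback()
        raise


session = ToySession()
run_in_transaction(session, lambda s: s.add("port-1", "detail"))
try:
    run_in_transaction(session, lambda s: s.add("port-1", "again"))
except ValueError:
    pass
# The failure was rolled back, so the next lookup still works.
print(session.get("port-1"))  # detail
```

Calling add() directly without the helper, the same duplicate would leave the toy session in the failed state, and the following get() would raise TransactionInactiveError until rollback() is called. Rolling back before re-raising keeps the session clean for the next caller.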

I hope someone can shed some light on this; next time, the fix may not
be as easy as removing an unused line.


On 22 December 2016 at 20:48, Mike Bayer <mbayer at redhat.com> wrote:
>
> On 12/20/2016 06:50 PM, Cathy Zhang wrote:
>>
>> Hi Bernard,
>>
>> Thanks for the email. I will take a look at this. Xiaodong has been
>> working on tempest test scripts.
>> I will work with Xiaodong on this issue.
>
>
> I've added a comment to the issue which refers to upstream SQLAlchemy issue
> https://bitbucket.org/zzzeek/sqlalchemy/issues/3803 as a potential
> contributor, though, looking at the logs linked from the issue, it appears
> that database deadlocks are also occurring, which may also be a precursor
> here. There are many improvements in SQLAlchemy 1.1 such that the
> "rollback()" state should not be as susceptible to a corrupted database
> connection as seems to be the case here.
>
>>
>> Cathy
>>
>>
>> -----Original Message-----
>> From: Bernard Cafarelli [mailto:bcafarel at redhat.com]
>> Sent: Tuesday, December 20, 2016 3:00 AM
>> To: OpenStack Development Mailing List
>> Subject: [openstack-dev] [networking-sfc] Intermittent database
>> transaction issues, affecting the tempest gate
>>
>> Hi everyone,
>>
>> we have an open bug (thanks Igor for the report) on DB transaction issues:
>> https://bugs.launchpad.net/networking-sfc/+bug/1630503
>>
>> The thing is, I am seeing quite a few tempest gate failures that follow
>> the same pattern: at some point in the test suite, the service gets
>> warnings/errors from the DB layer (reentrant call, closed transaction,
>> nested rollback, …), and all following tests fail.
>>
>> This affects both the master and stable/newton branches (for now, there
>> are not many differences in the DB code between these branches).
>>
>> Some examples:
>> * https://review.openstack.org/#/c/400396/ failed with console log
>>   http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/console.html#_2016-12-16_12_44_47_564544
>>   and service log
>>   http://logs.openstack.org/96/400396/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/c27920b/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_12_44_32_301
>> * https://review.openstack.org/#/c/405391/ failed with console log
>>   http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/console.html.gz#_2016-12-16_13_05_17_384323
>>   and service log
>>   http://logs.openstack.org/91/405391/2/check/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/7e2b1de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-16_13_04_11_840
>> * another on the master branch: https://review.openstack.org/#/c/411194/
>>   failed with console log
>>   http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/console.html.gz#_2016-12-15_22_36_15_216260
>>   and service log
>>   http://logs.openstack.org/94/411194/1/gate/gate-tempest-dsvm-networking-sfc-ubuntu-xenial/90633de/logs/screen-q-svc.txt.gz?level=WARNING#_2016-12-15_22_35_53_310
>>
>> I took a look at the errors, but only found old, apparently already fixed
>> PyMySQL bugs, and suggestions like:
>> * http://docs.sqlalchemy.org/en/latest/faq/sessions.html#this-session-s-transaction-has-been-rolled-back-due-to-a-previous-exception-during-flush-or-similar
>> * https://review.openstack.org/#/c/230481/
>>
>> This is not really my forte, so if someone could take a look at these logs
>> and fix the problem, that would be great! This matters even more with the
>> multinode tempest gate coming up.
>>
>> Thanks,
>> --
>> Bernard Cafarelli
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>



-- 
Bernard Cafarelli


