[openstack-dev] [nova][oslo][openstack-ansible] DB deadlocks, Mitaka, and You

Ian Cordasco sigmavirus24 at gmail.com
Wed Oct 19 12:36:58 UTC 2016


-----Original Message-----
From: Carter, Kevin <kevin at cloudnull.com>
Reply: OpenStack Development Mailing List (not for usage questions)
<openstack-dev at lists.openstack.org>
Date: October 18, 2016 at 21:18:13
To: OpenStack Development Mailing List (not for usage questions)
<openstack-dev at lists.openstack.org>
Subject:  [openstack-dev] [nova][oslo][openstack-ansible] DB
deadlocks, Mitaka, and You

> Hello all,
>
> As some folks may know the OSIC cloud1 just upgraded to Mitaka last
> week (I know what's old is new again, sorry). Since the upgrade things
> have been running fairly smoothly however there's been one issue that
> has left me scratching my head. When attempting a scale out test we've
> run into issues where the nova client was returning a message
> indicating it had encountered a DB deadlock [0]. In attempting to gain
> more information we enabled debug logging and collected the following
> [1] (Here is a pretty version for the tracebacks [4]). This is the
> command we're using to build all of the VMs [2] which was being used
> to build 3 vms per compute node for 242 compute nodes. Once the
> instances are all online we grab the nodes IPv6 address and ensure
> we're able to SSH to them. While we're happy to report that almost all
> of the VMs came online in one shot we did run into a few of these DB
> dead lock messages and would like to see if folks out on the interwebs
> have seen or experienced such problems. If so we'd love to know if
> there's some remediation that we can use to make this all even
> happier. It should be noted that this is not a Major issue at this
> point, 3 out of 726 VMs didn't come online due to the problem but it
> would be amazing if there was something we could do to generally
> resolve this.
>
> Other potentially interesting things:
> * DB is MariaDB "mariadb-galera-server-10.0". The replication system
> is using xtrabackup from percona with version "2.3.5-1.trusty".
> * The DB is a 3 node cluster however the connection to the cluster
> is using a VIP on our loadbalancer and the services are only ever
> connected to 1 of the three nodes; this is for both reads and writes.
> Should a node become un-happy the load balancer promotes and demotes
> always ensuring only 1 node is being connected to.
> * While reproducing this issue repeatedly we've watched wsrep to see
> if nodes we're dropping or otherwise having a bad day. To our dismay
> there were no un-happy nodes and the wsrep state seemed to remain
> OPERATIONAL with minimal latency (Example [3]).
> * For all of the OpenStack Services we use pymsql as it's DB driver
> we were using mysql-python but I believe OpenStack-Ansible switched in
> kilo due to DB deadlock issues we were experiencing.
> * Cloud1 is running nova at commit
> "dd30603f91e6fd3d1a4db452f20a51ba8820e1f4" which was the HEAD of
> "stable/mitaka" on 29-09-2016. This code is in production today with
> absolutely no modifications.
>
> Any insight would be greatly appreciated. Thanks again everyone!
>
> [0] http://paste.openstack.org/show/586302/
> [1] http://paste.openstack.org/show/586309/
> [2] http://paste.openstack.org/show/586298/
> [3] http://paste.openstack.org/show/586306/
> [4] http://paste.openstack.org/show/586307/

Hey Kevin,

So just looking at the pastes you have here, I'm inclined to believe
this is actually a bug in oslo_db/sqlalchemy. If you follow the trace,
there's a PyMySQL InternalError not being handled inside of
sqlalchemy. I'm not sure if SQLAlchemy considers InternalErrors to be
something it cannot retry, or something that the user should decide
how to handle, but I would start chatting with the folk who work on
oslo_db and SQLAlchemy in the community.

That said, this also looks like something that should be reported to
Nova. Something causing an unhandled exception is definitely bug worth
(even if the fix belongs somewhere in one of its dependencies).

--
Ian Cordasco



More information about the OpenStack-dev mailing list