[openstack-dev] [cinder] [oslo] MySQL connection shared between green threads concurrently
Doug Hellmann
doug at doughellmann.com
Tue Feb 17 15:30:15 UTC 2015
On Mon, Feb 16, 2015, at 07:50 PM, Mike Bayer wrote:
> hi all -
>
> I’ve been researching this cinder issue
> https://bugs.launchpad.net/cinder/+bug/1417018 and I’ve found something
> concerning.
>
> Basically there’s a race condition here, occurring because a single MySQL
> connection is shared between two green threads. This happens while Cinder
> is doing its startup work in cinder/volume/manager.py -> init_host(), at
> the same time as a request comes in from a separate service call that
> seems to be part of the startup.
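>
> As a toy illustration of the failure mode (not cinder code; it assumes a
> reachable MySQL test database), two green threads that share one raw
> MySQLdb connection and yield mid-operation will trip over each other,
> since MySQLdb allows only one outstanding statement per connection:
>
>     import eventlet
>     import MySQLdb
>     import MySQLdb.cursors
>
>     # one raw DBAPI connection, deliberately shared by both green threads
>     conn = MySQLdb.connect(host="127.0.0.1", user="test",
>                            passwd="test", db="test")
>
>     def worker(label, sql):
>         # SSCursor keeps the result set pending on the shared connection,
>         # so whichever green thread runs second trips over the first
>         # one's unfinished statement ("commands out of sync" or similar).
>         cur = conn.cursor(MySQLdb.cursors.SSCursor)
>         cur.execute(sql)
>         eventlet.sleep(0)  # yield to the other green thread mid-statement
>         print("%s: %r" % (label, cur.fetchall()))
>         cur.close()
>
>     threads = [eventlet.spawn(worker, "volumes-query", "SELECT 1, 2, 3"),
>                eventlet.spawn(worker, "ping", "SELECT 1")]
>     for gt in threads:
>         gt.wait()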
>
> The log output at http://paste.openstack.org/show/175571/ shows this
> happening. I can break it down:
>
>
> 1. A big query for volumes occurs as part of
> "self.db.volume_get_all_by_host(ctxt, self.host)”. To reproduce the
> error more regularly I’ve placed it into a loop of 100 calls. We can see
> that the thread id is 68089648 and the MySQL connection is 3a9c5a0.
>
> 2015-02-16 19:32:47.236 INFO sqlalchemy.engine.base.Engine
> [req-ed3c0248-6ee5-4063-80b5-77c5c9a23c81 None None] tid: 68089648,
> connection: <_mysql.connection open to '127.0.0.1' at 3a9c5a0>, stmt
> SELECT volumes.created_at AS
> 2015-02-16 19:32:47.237 INFO sqlalchemy.engine.base.Engine
> [req-ed3c0248-6ee5-4063-80b5-77c5c9a23c81 None None]
> ('localhost.localdomain at ceph', 'localhost.localdomain at ceph#%')
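>
> The "tid: ..., connection: ..." prefix in those lines comes from extra
> instrumentation. A rough sketch of how that kind of logging can be hooked
> in (just the idea, not the exact code) is a before_cursor_execute event
> that logs the green thread id and the raw DBAPI connection:
>
>     import logging
>     import thread  # once eventlet has monkey-patched it, get_ident()
>                    # is effectively a per-green-thread id
>
>     from sqlalchemy import event
>     from sqlalchemy.engine import Engine
>
>     log = logging.getLogger("sqlalchemy.engine.base.Engine")
>
>     @event.listens_for(Engine, "before_cursor_execute")
>     def _log_tid(conn, cursor, statement, parameters, context,
>                  executemany):
>         # conn.connection is the pooled connection proxy; .connection on
>         # that is the raw _mysql connection shown in the log output
>         log.info("tid: %s, connection: %r, stmt %s",
>                  thread.get_ident(), conn.connection.connection,
>                  statement[:30])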
>
> 2. A “ping” query comes in related to a different API call - different
> thread ID, *same* connection
>
> 2015-02-16 19:32:47.276 INFO sqlalchemy.engine.base.Engine
> [req-600ef638-cb45-4a34-a3ab-6d22d83cfd00 None None] tid: 68081456,
> connection: <_mysql.connection open to '127.0.0.1' at 3a9c5a0>, stmt
> SELECT 1
> 2015-02-16 19:32:47.279 INFO sqlalchemy.engine.base.Engine
> [req-600ef638-cb45-4a34-a3ab-6d22d83cfd00 None None] ()
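>
> If that SELECT 1 is the usual connection liveness ping, it would be
> emitted by a listener along these lines (a simplified sketch of the
> pattern, not the actual oslo.db code in use):
>
>     from sqlalchemy import create_engine, event, select
>
>     engine = create_engine("mysql://user:pass@127.0.0.1/cinder")
>
>     @event.listens_for(engine, "engine_connect")
>     def _ping_connection(connection, branch):
>         if branch:
>             # a "branch" re-uses an already checked-out connection,
>             # so there is nothing new to ping
>             return
>         # runs through the Engine, which is why it shows up in the
>         # same SQL log as the volumes query
>         connection.scalar(select([1]))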
>
> 3. The first statement is still in the middle of its invocation, so we
> get a failure: either a mismatch of the statement to the cursor, or a
> MySQL lost-connection error (stack trace begins):
>
> Traceback (most recent call last):
> File "/usr/lib/python2.7/site-packages/eventlet/hubs/hub.py", line 457,
> in fire_timers
> … more stack trace
>
> 4. Another thread id, *same* connection.
>
> 2015-02-16 19:32:47.290 INFO sqlalchemy.engine.base.Engine
> [req-f980de7c-151d-4fed-b45e-d12b133859a6 None None] tid: 61238160,
> connection: <_mysql.connection open to '127.0.0.1' at 3a9c5a0>, stmt
> SELECT 1
>
> rows = [process[0](row, None) for row in fetch]
> File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/loading.py",
> line 363, in _instance
> tuple([row[column] for column in pk_cols])
> File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/result.py",
> line 331, in _key_fallback
> expression._string_or_unprintable(key))
> NoSuchColumnError: "Could not locate column in row for column
> 'volumes.id'"
> 2015-02-16 19:32:47.293 ERROR cinder.openstack.common.threadgroup [-]
> "Could not locate column in row for column 'volumes.id’"
>
>
> So I’m not really sure what’s going on here. Cinder seems to have some
> OpenStack greenlet code of its own in
> cinder/openstack/common/threadgroup.py; I don’t know the purpose of this
> code. SQLAlchemy’s connection pool has been tested a lot with
> eventlet/gevent and this has never been reported before. This is a very
> severe race and I’d expect it to be happening all over the place.
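>
> For contrast, the pattern the pool is designed around is each green
> thread checking out its own connection and returning it when done, so no
> DBAPI connection is ever shared. A minimal sketch with a plain engine:
>
>     import eventlet
>     from sqlalchemy import create_engine, text
>
>     engine = create_engine("mysql://user:pass@127.0.0.1/test", pool_size=5)
>
>     def worker(n):
>         # each green thread checks out its own DBAPI connection
>         conn = engine.connect()
>         try:
>             conn.execute(text("SELECT 1"))
>         finally:
>             conn.close()  # returned to the pool, never shared
>
>     for gt in [eventlet.spawn(worker, i) for i in range(5)]:
>         gt.wait()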
The threadgroup module is from the Oslo incubator, so if you need to
review the git history you'll want to look at that copy.
Doug
>
> Current status is that I’m continuing to try to determine why this is
> happening here, and seemingly nowhere else.