[openstack-dev] [all] [oslo] [ha] potential issue with implicit async-compatible mysql drivers

Mike Bayer mbayer at redhat.com
Sat Dec 13 18:43:11 UTC 2014

> On Dec 12, 2014, at 1:16 PM, Mike Bayer <mbayer at redhat.com> wrote:
>> On Dec 12, 2014, at 9:27 AM, Ihar Hrachyshka <ihrachys at redhat.com> wrote:
>> Hash: SHA512
>> Reading the latest comments at
>> https://github.com/PyMySQL/PyMySQL/issues/275, it seems to me that the
>> issue is not to be solved in drivers themselves but instead in
>> libraries that arrange connections (sqlalchemy/oslo.db), correct?
>> Will the proposed connection reopening help?
> disagree, this is absolutely a driver bug.  I’ve re-read that last comment and now I see that the developer is suggesting that this condition not be flagged in any way, so I’ve responded.  The connection should absolutely blow up and if it wants to refuse to be usable afterwards, that’s fine (it’s the same as MySQLdb “commands out of sync”).  It just has to *not* emit any further SQL as though nothing is wrong.
> It doesn’t matter much for PyMySQL anyway, I don’t know that PyMySQL is up to par for openstack in any case (look at the entries in their changelog: https://github.com/PyMySQL/PyMySQL/blob/master/CHANGELOG "Several other bug fixes”, “Many bug fixes"- really?  is this an iphone app?)
> We really should be looking to get this fixed in MySQL-connector, which seems to have a similar issue.   It’s just so difficult to get responses from MySQL-connector that the PyMySQL thread is at least informative.

so I spent the rest of yesterday continuing to stare at that example case and also continued the thread on that list.

Where I think it’s at is that, while I think this is a huge issue in any one or all of:  1. a gevent-style “timeout” puts a monkeypatched socket in an entirely unknown state, 2. MySQL’s protocol doesn’t have any provision for matching an OK response to the request that it corresponds to, 3. the MySQL drivers we’re dealing with don’t have actual “async” APIs, which could then be easily tailored to work with eventlet/gevent safely (see https://github.com/zacharyvoase/gevent-psycopg2 https://bitbucket.org/dvarrazzo/psycogreen for the PG examples of these, problem solved), at the moment I’m not fully confident the drivers are going to feasibly be able to provide a complete fix here.     MySQL sends a status message that is essentially, “OK”, and there’s not really any way to tell that this “OK” is actually from a different statement.

What we need at the very basic level is that, if we call connection.rollback(), it either fails with an exception, or it succeeds.   Right now, the core of the test case is that we see connection.rollback() silently failing, which then causes the next statement (the INSERT) to also fail - then the connection rights itself and continues to be usable to complete the transaction.   There might be some other variants of this.

So in the interim I have added for SQLA 0.9.9, which I can also make available as part of oslo.db.sqlalchemy.compat if we’d like, a session.invalidate() method that will just call connection.invalidate() on the current bound connection(s); this is then caught within the block where we know that eventlet/gevent is in a “timeout” status.

Within the oslo.db.sqlalchemy.enginefacade system, we can potentially add direct awareness of eventlet.Timeout (http://eventlet.net/doc/modules/timeout.html) as a distinct error condition within a transactional block, and invalidate the known connection(s) when this is caught.   This would insulate us from this particular issue regardless of driver, with the key assumption that it is in fact only a “timeout” condition under which this issue actually occurs.

More information about the OpenStack-dev mailing list