[openstack-dev] [all] Replace mysql-python with mysqlclient

Mike Bayer mbayer at redhat.com
Mon May 11 14:44:58 UTC 2015



On 5/11/15 9:58 AM, Attila Fazekas wrote:
>
> ----- Original Message -----
>> From: "John Garbutt" <john at johngarbutt.com>
>> To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
>> Cc: "Dan Smith" <dms at danplanet.com>
>> Sent: Saturday, May 9, 2015 12:45:26 PM
>> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
>>
>> On 30 April 2015 at 18:54, Mike Bayer <mbayer at redhat.com> wrote:
>>> On 4/30/15 11:16 AM, Dan Smith wrote:
>>>>> There is an open discussion to replace mysql-python with PyMySQL, but
>>>>> PyMySQL has worse performance:
>>>>>
>>>>> https://wiki.openstack.org/wiki/PyMySQL_evaluation
>>>> My major concern with not moving to something different (i.e. not based
>>>> on the C library) is the threading problem. Especially as we move in the
>>>> direction of cellsv2 in nova, not blocking the process while waiting for
>>>> a reply from mysql is going to be critical. Further, I think that we're
>>>> likely to get back a lot of performance from a supports-eventlet
>>>> database connection because of the parallelism that conductor currently
>>>> can only provide in exchange for the footprint of forking into lots of
>>>> workers.
>>>>
>>>> If we're going to move, shouldn't we be looking at something that
>>>> supports our threading model?
>>> Yes, but at the same time, we should change our threading model at the
>>> level where APIs access the database, at the very least by using a
>>> thread pool behind eventlet. CRUD-oriented database access is faster using
>>> traditional threads, even in Python, than using an eventlet-like system or
>>> explicit async. The tests at
>>> http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
>>> show this. With traditional threads, we can stay on the C-based MySQL
>>> APIs and take full advantage of their speed.
>> Sorry to go back in time, but I wanted to return to an important point.
>>
>> It seems we have three possible approaches:
>> * C lib and eventlet, blocks whole process
>> * pure python lib, and eventlet, eventlet does its thing
>> * go for a C lib and dispatch calls via thread pool
> * go with a pure C protocol lib that explicitly uses the Python-level,
>    patchable I/O functions (and maybe others, like threading, mutex, sleep...)
>
> * go with a pure C protocol lib where the Python part explicitly calls
>    `decode` and `encode`; the C part does only CPU-intensive operations
>    and never calls I/O primitives.
>
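
The last option above, a protocol library that only encodes and decodes while the Python side owns all I/O, is a style often described as "sans-I/O". A minimal sketch of the idea follows, written in Python rather than C for brevity: the framer never reads from or writes to a socket; the caller feeds it whatever bytes its own I/O layer received and gets back complete packets. It assumes only the MySQL wire format's 4-byte packet header (3-byte little-endian payload length plus a 1-byte sequence id) and does not reassemble payloads split at the 16 MiB boundary. `PacketFramer` and `encode_packet` are hypothetical names, not part of any existing driver.

```python
from typing import List, Tuple


class PacketFramer:
    """Hypothetical sans-I/O framer for MySQL wire-protocol packets.

    It only parses bytes handed to it and never touches a socket, so any
    I/O strategy (blocking socket, eventlet, raw epoll) can drive it.
    Payloads split across 16 MiB packets are NOT reassembled.
    """

    HEADER_SIZE = 4  # 3-byte little-endian payload length + 1-byte sequence id

    def __init__(self) -> None:
        self._buffer = bytearray()

    def feed(self, data: bytes) -> List[Tuple[int, bytes]]:
        """Append received bytes and return every complete (seq_id, payload)."""
        self._buffer.extend(data)
        packets: List[Tuple[int, bytes]] = []
        while len(self._buffer) >= self.HEADER_SIZE:
            length = int.from_bytes(self._buffer[:3], "little")
            total = self.HEADER_SIZE + length
            if len(self._buffer) < total:
                break  # incomplete packet: wait for the caller to feed more
            seq_id = self._buffer[3]
            packets.append((seq_id, bytes(self._buffer[self.HEADER_SIZE:total])))
            del self._buffer[:total]
        return packets


def encode_packet(seq_id: int, payload: bytes) -> bytes:
    """Frame one payload; payloads that would need a 16 MiB split are rejected."""
    if len(payload) >= 0xFFFFFF:
        raise ValueError("payload must be split across packets; not handled here")
    return len(payload).to_bytes(3, "little") + bytes([seq_id & 0xFF]) + payload


if __name__ == "__main__":
    framer = PacketFramer()
    # 0x03 is the COM_QUERY command byte; the second payload is arbitrary.
    wire = encode_packet(0, b"\x03SELECT 1") + encode_packet(1, b"ok")
    print(framer.feed(wire[:5]))  # → []
    print(framer.feed(wire[5:]))  # → [(0, b'\x03SELECT 1'), (1, b'ok')]
```

Because `feed()` does no I/O, the same object could be driven from a blocking socket, from eventlet's green sockets, or from a hand-written epoll loop without changes; only the CPU-bound parsing would need to move to C for speed.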
>> We have a few problems:
>> * performance sucks, we have to fork lots of nova-conductors and api nodes
>> * need to support python2.7 and 3.4, but it's not currently possible
>> with the lib we use?
>> * want to pick a lib that we can fix when there are issues, and work to
>> improve
>>
>> It sounds like:
>> * we currently do the first one; it sucks, and forking nova-conductor helps
>> * seems we are thinking the second one might work, and we'd surely get
>> py3.4 + py2.7 support
>> * the last will mean more work, but it's likely to be more performant
>> * worried we are picking an unsupported lib with little future
>>
>> I am leaning towards us moving to making DB calls with a thread pool
>> and some fast C based library, so we get the 'best' performance.
>>
>> Is that a crazy thing to be thinking? What am I missing here?
> Using the python socket from C code:
> https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100
>
> It is also possible to implement a MySQL driver purely as a protocol parser;
> then you are free to use your favorite event-based I/O strategy (e.g. direct
> epoll usage), even without eventlet (or similar).
>
> The issue with ultramysql is that it does not implement
> the `standard` Python DB API, so you would need to add an extra wrapper for SQLAlchemy.

This driver, which doesn't even implement the standard DBAPI (already a
red flag), appears to have seen its last commit about a year ago. There
is apparently a separately released (!) DBAPI-compatible wrapper
https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no releases in
two years. If this wrapper is indeed compatible with MySQLdb, then it
would run in SQLAlchemy without changes (though I'd be extremely surprised
if it passed our test suite).
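
Whether a driver "implements the standard DBAPI" can at least be checked superficially. The sketch below lists the module-level globals and exception classes that PEP 249 (DB-API 2.0) requires and reports which ones a driver module lacks; the stdlib `sqlite3` module is used only as a known-compliant example, and `missing_dbapi_names` is a hypothetical helper. Passing this check is necessary but far from sufficient: it says nothing about cursor semantics, type conversion, or whether a driver would survive SQLAlchemy's test suite.

```python
import sqlite3
import types
from typing import List

# Module-level names that PEP 249 requires every DB-API 2.0 driver to expose.
REQUIRED_GLOBALS = ("apilevel", "threadsafety", "paramstyle", "connect")
REQUIRED_EXCEPTIONS = (
    "Warning", "Error", "InterfaceError", "DatabaseError", "DataError",
    "OperationalError", "IntegrityError", "InternalError",
    "ProgrammingError", "NotSupportedError",
)


def missing_dbapi_names(module: types.ModuleType) -> List[str]:
    """Return the PEP 249 module-level names that `module` does not define."""
    return [name for name in REQUIRED_GLOBALS + REQUIRED_EXCEPTIONS
            if not hasattr(module, name)]


if __name__ == "__main__":
    print(sqlite3.apilevel, missing_dbapi_names(sqlite3))  # → 2.0 []
```

Pointing the same helper at a candidate such as the `umysqldb` wrapper (assuming it imports cleanly) would give a quick first signal before running anything heavier.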

How would using these obscure libraries be preferable to running Nova
API functions within the thread-pooling facilities already included with
eventlet? Keep in mind that I've now done the work [1] to show that there
is no performance gain to be had for all the trouble we go through to use
eventlet/gevent/asyncio with local database connections.
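
The thread-pool approach can be sketched with the standard library alone. Eventlet ships a similar facility, `eventlet.tpool`, whose `execute()` runs a blocking call in a native OS thread; the sketch below instead uses `concurrent.futures.ThreadPoolExecutor` so it runs without eventlet installed. `fetch_instance` is a hypothetical stand-in for a blocking query through a C driver, with `time.sleep` simulating the wait on the MySQL socket. It shows only the dispatch pattern, not any benchmark result.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def fetch_instance(instance_id: int) -> Dict[str, object]:
    """Hypothetical stand-in for a blocking DB query through a C driver.

    time.sleep releases the GIL, much as a C driver such as mysqlclient
    typically does while waiting on the socket, so other threads keep running.
    """
    time.sleep(0.1)
    return {"id": instance_id, "state": "active"}


def fetch_many(instance_ids: List[int], workers: int = 8) -> List[Dict[str, object]]:
    """Run the blocking calls in a pool of native threads, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch_instance, instance_ids))


if __name__ == "__main__":
    start = time.perf_counter()
    rows = fetch_many(list(range(8)))
    elapsed = time.perf_counter() - start
    # The eight 0.1 s calls overlap, so this takes roughly 0.1 s, not 0.8 s.
    print(len(rows), round(elapsed, 2))
```

The calling code stays ordinary blocking code; only the call site decides whether it runs inline or in the pool, which is what keeps the fast C-based drivers usable.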

[1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/