[openstack-dev] [all] Replace mysql-python with mysqlclient

Attila Fazekas afazekas at redhat.com
Mon May 11 18:02:53 UTC 2015

----- Original Message -----
> From: "Mike Bayer" <mbayer at redhat.com>
> To: openstack-dev at lists.openstack.org
> Sent: Monday, May 11, 2015 4:44:58 PM
> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
> 
> 
> 
> On 5/11/15 9:58 AM, Attila Fazekas wrote:
> >
> >
> >
> > ----- Original Message -----
> >> From: "John Garbutt" <john at johngarbutt.com>
> >> To: "OpenStack Development Mailing List (not for usage questions)"
> >> <openstack-dev at lists.openstack.org>
> >> Cc: "Dan Smith" <dms at danplanet.com>
> >> Sent: Saturday, May 9, 2015 12:45:26 PM
> >> Subject: Re: [openstack-dev] [all] Replace mysql-python with mysqlclient
> >>
> >> On 30 April 2015 at 18:54, Mike Bayer <mbayer at redhat.com> wrote:
> >>> On 4/30/15 11:16 AM, Dan Smith wrote:
> >>>>> There is an open discussion to replace mysql-python with PyMySQL, but
> >>>>> PyMySQL has worse performance:
> >>>>>
> >>>>> https://wiki.openstack.org/wiki/PyMySQL_evaluation
> >>>> My major concern with not moving to something different (i.e. not based
> >>>> on the C library) is the threading problem. Especially as we move in the
> >>>> direction of cellsv2 in nova, not blocking the process while waiting for
> >>>> a reply from mysql is going to be critical. Further, I think that we're
> >>>> likely to get back a lot of performance from a supports-eventlet
> >>>> database connection because of the parallelism that conductor currently
> >>>> can only provide in exchange for the footprint of forking into lots of
> >>>> workers.
> >>>>
> >>>> If we're going to move, shouldn't we be looking at something that
> >>>> supports our threading model?
> >>> yes, but at the same time, we should change our threading model at the
> >>> level of where APIs are accessed to refer to a database, at the very
> >>> least using a threadpool behind eventlet. CRUD-oriented database access
> >>> is faster using traditional threads, even in Python, than using an
> >>> eventlet-like system or using explicit async. The tests at
> >>> http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/
> >>> show this. With traditional threads, we can stay on the C-based MySQL
> >>> APIs and take full advantage of their speed.
> >> Sorry to go back in time, I wanted to go back to an important point.
> >>
> >> It seems we have three possible approaches:
> >> * C lib and eventlet, blocks whole process
> >> * pure python lib, and eventlet, eventlet does its thing
> >> * go for a C lib and dispatch calls via thread pool
> > * go with pure C protocol lib, which explicitly using `python patch-able`
> >    I/O function (Maybe others like.: threading, mutex, sleep ..)
> >
> > * go with pure C protocol lib and the python part explicitly call
> >    for `decode` and `encode`, the C part just do CPU intensive operations,
> >    and it never calls for I/O primitives .
> >
> >> We have a few problems:
> >> * performance sucks, we have to fork lots of nova-conductors and api nodes
> >> * need to support python2.7 and 3.4, but its not currently possible
> >> with the lib we use?
> >> * want to pick a lib that we can fix when there are issues, and work to
> >> improve
> >>
> >> It sounds like:
> >> * currently do the first one, it sucks, forking nova-conductor helps
> >> * seems we are thinking the second one might work, we sure get py3.4 +
> >> py2.7 support
> >> * the last will mean more work, but its likely to be more performant
> >> * worried we are picking a unsupported lib with little future
> >>
> >> I am leaning towards us moving to making DB calls with a thread pool
> >> and some fast C based library, so we get the 'best' performance.
> >>
> >> Is that a crazy thing to be thinking? What am I missing here?
> > Using the python socket from C code:
> > https://github.com/esnme/ultramysql/blob/master/python/io_cpython.c#L100
> >
> > It is also possible to implement a mysql driver just as a protocol parser,
> > and then you are free to use your favorite event based I/O strategy (direct
> > epoll usage) even without eventlet (or similar).
> >
> > The issue with ultramysql is that it does not implement
> > the `standard` python DB API, so you would need to add an extra wrapper to
> > SQLAlchemy.
> 
> This driver appears to have seen its last commit about a year ago, that
> doesn't even implement the standard DBAPI (which is already a red
> flag).   There is apparently a separately released (!) DBAPI-compat
> wrapper https://pypi.python.org/pypi/umysqldb/1.0.3 which has had no
> releases in two years.     If this wrapper is indeed compatible with
> MySQLdb then it would run in SQLAlchemy without changes (though I'd be
> extremely surprised if it passes our test suite).
> 
> How would using these obscure libraries be any preferable than running
> Nova API functions within the thread-pooling facilities already included
> with eventlet ?        Keeping in mind that I've now done the work [1]
> to show that there is no performance gain to be had for all the trouble
> we go through to use eventlet/gevent/asyncio with local database
> connections.

Not just with local database connections;
a 10G network is itself fast as well. It is possible that you spend more time in
the kernel-side TCP/IP stack (and in context switches), not in physical I/O wait,
than in the actual work on the DB side. (Check netperf TCP_RR.)
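
To get a rough feel for that per-request overhead without netperf, here is a
minimal sketch in Python. It is only in the spirit of TCP_RR, not a
replacement for it: it bounces 1-byte requests over a loopback TCP socket and
reports the mean round-trip time. The function names (`echo_server`,
`measure_round_trips`) are made up for this illustration, and the number
includes Python's own overhead on top of the kernel's.

```python
import socket
import threading
import time


def echo_server(listener):
    """Accept one connection and echo each 1-byte request back."""
    conn, _ = listener.accept()
    with conn:
        while True:
            data = conn.recv(1)
            if not data:  # client closed the connection
                break
            conn.sendall(data)


def measure_round_trips(n=1000):
    """Return the mean round-trip time in seconds for n 1-byte exchanges."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)
    server = threading.Thread(target=echo_server, args=(listener,))
    server.start()

    client = socket.create_connection(listener.getsockname())
    # Disable Nagle so each tiny request is sent immediately.
    client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    start = time.perf_counter()
    for _ in range(n):
        client.sendall(b"x")
        client.recv(1)
    elapsed = time.perf_counter() - start

    client.close()
    server.join()
    listener.close()
    return elapsed / n


if __name__ == "__main__":
    print("mean round trip: %.1f us" % (measure_round_trips() * 1e6))
```

Comparing this figure with the time the DB needs for a simple indexed lookup
shows how much of a small query is spent outside the DB itself.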

The scary part of a blocking I/O call is when you have two
Python threads (or green threads), one of them is holding a DB lock, and the other
is waiting for the same lock in a native blocking I/O syscall.
If you do a read(2) in native code, Python itself might not be able to preempt it.
Your transaction might end with a `DB Lock wait timeout`
after 30 sec of doing nothing, instead of the scheduler switching to the other
Python thread, which would be able to release the lock.
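
The stall can be reproduced in miniature. The sketch below makes several
assumptions: asyncio stands in for eventlet's cooperative scheduler,
`time.sleep` stands in for a C driver blocked in read(2), and all names
(`blocking_db_call`, `lock_holder`, `run`) are made up for this illustration.
It measures when the task that could release the lock actually gets to run,
first with the blocking call made directly on the event loop and then with
the call dispatched to a thread pool.

```python
import asyncio
import time

BLOCK = 0.3  # stand-in for a long lock wait inside a native syscall


def blocking_db_call():
    """Stand-in for a C driver blocked in a syscall: it never yields."""
    time.sleep(BLOCK)


async def lock_holder(log, start):
    """The task that could release the lock, if it were scheduled."""
    await asyncio.sleep(0.05)  # a little work before releasing
    log.append(time.perf_counter() - start)


async def run(use_thread_pool):
    """Return how long after start the lock holder managed to finish."""
    log = []
    start = time.perf_counter()
    holder = asyncio.ensure_future(lock_holder(log, start))
    await asyncio.sleep(0)  # yield once so the holder gets started
    if use_thread_pool:
        # The blocking call runs in a worker thread; the loop stays free.
        loop = asyncio.get_running_loop()
        await loop.run_in_executor(None, blocking_db_call)
    else:
        # The blocking call runs on the loop itself; nothing else can run.
        blocking_db_call()
    await holder
    return log[0]


if __name__ == "__main__":
    print("direct call: holder finished after %.2fs" % asyncio.run(run(False)))
    print("thread pool: holder finished after %.2fs" % asyncio.run(run(True)))
```

With the direct call the holder cannot finish until the blocking call
returns; with the thread pool it finishes while the call is still blocked.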

> 
> [1] http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-databases/