[openstack-dev] [all] Replace mysql-python with mysqlclient

Mike Bayer mbayer at redhat.com
Mon May 11 22:44:30 UTC 2015



On 5/11/15 5:25 PM, Robert Collins wrote:
>
> Details: Skip over this bit if you know it all already.
>
> The GIL plays a big factor here: if you want to scale the amount of
> CPU available to a Python service, you have two routes:
> A) move work to a different process through some RPC - be that DB's
> using SQL, other services using oslo.messaging or HTTP - whatever.
> B) use C extensions to perform work in threads - e.g. openssl context
> processing.
>
> To increase concurrency you can use threads, eventlet, asyncio,
> twisted etc - because within a single process *all* Python bytecode
> execution happens inside the GIL lock, so you get at most one CPU for
> a CPU bound workload. For an IO bound workload, you can fit more work
> in by context switching within that one CPU capacity. And - the GIL is
> a poor scheduler, so at the limit - an IO bound workload where the IO
> backend has more capacity than we have CPU to consume it within our
> process, you will run into priority inversion and other problems.
> [This varies by Python release too].
>
> request_duration = time_in_cpu + time_blocked
> request_cpu_utilisation = time_in_cpu/request_duration
> cpu_utilisation = concurrency * request_cpu_utilisation
>
> Assuming that we don't want any one process to spend a lot of time at
> 100% - to avoid such at-the-limit issues, let's pick say 80%
> utilisation, or a safety factor of 0.2. If a single request consumes
> 50% of its duration waiting on IO, and 50% of its duration executing
> bytecode, we can only run one such request concurrently without
> hitting 100% utilisation (2*0.5 CPU == 1). For a request that spends
> 75% of its duration waiting on IO and 25% on CPU, we can run 3 such
> requests concurrently without exceeding our target of 80% utilisation:
> (3*0.25 = 0.75).
>
> What we have today in our standard architecture for OpenStack is
> optimised for IO bound workloads: waiting on the
> network/subprocesses/disk/libvirt etc. Running high numbers of
> eventlet handlers in a single process only works when the majority of
> the work being done by a handler is IO.
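As a sanity check on the arithmetic quoted above, here's a quick sketch (`max_concurrency` is just a name I'm introducing here, not anything in our codebase):

```python
def max_concurrency(io_fraction, target_utilisation=0.8):
    # request_cpu_utilisation = time_in_cpu / request_duration
    request_cpu_utilisation = 1.0 - io_fraction
    # cpu_utilisation = concurrency * request_cpu_utilisation;
    # take the largest whole concurrency that stays under the target.
    return int(target_utilisation / request_cpu_utilisation)

print(max_concurrency(0.50))  # 1: a 50%-IO request leaves room for only one
print(max_concurrency(0.75))  # 3: 3 * 0.25 = 0.75 <= 0.80
```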

Everything stated here is great; however, in our situation there is one 
unfortunate fact which renders it moot for the moment.   
I'm still puzzled why we are getting into deep think sessions about the 
vagaries of the GIL and async when there is essentially a full-on 
red-alert performance blocker rendering all of this discussion academic, 
so I must again remind us: what we have *today* in OpenStack is *as 
completely un-optimized as you can possibly be*.

The most GIL-heavy nightmare of a CPU-bound task you can imagine, 
running on 25 threads on a ten-year-old Pentium, will run better than 
the OpenStack we have today, because we are running a C-based, 
non-eventlet-patched DB library within a single OS thread that happens 
to use eventlet; the use of eventlet is totally pointless because right 
now it blocks completely on all database IO.  All production OpenStack 
applications today are fully serialized, able to emit only a single 
query to the database at a time.  For each statement sent, the entire 
application blocks, for an order of magnitude longer than it would under 
the GIL, waiting for the database library to send the message to MySQL, 
waiting for MySQL to send back a response including the full results, 
and waiting for the library to unwrap that response into Python 
structures; only then does control return to Python space, where we can 
send another statement and again block the entire application and all 
greenlets while that single message proceeds.
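To illustrate the serialization effect with nothing but the standard library, here's a sketch using asyncio as a stand-in for eventlet's event loop; time.sleep plays the part of the C driver's blocking socket read, and the delays are illustrative only:

```python
import asyncio
import time

async def blocking_query(delay):
    # A C driver that never yields to the event loop: the whole process
    # stalls here, exactly like MySQLdb under eventlet.
    time.sleep(delay)

async def cooperative_query(delay):
    # A pure-Python driver whose socket IO yields (PyMySQL-style).
    await asyncio.sleep(delay)

async def run_all(query, n=10, delay=0.05):
    start = time.monotonic()
    await asyncio.gather(*(query(delay) for _ in range(n)))
    return time.monotonic() - start

serialized = asyncio.run(run_all(blocking_query))    # ~ n * delay
overlapped = asyncio.run(run_all(cooperative_query)) # ~ delay
print(f"blocking: {serialized:.2f}s, cooperative: {overlapped:.2f}s")
```

The ten "queries" through the blocking driver take roughly ten times the single-query latency; through the cooperative driver they overlap and finish in roughly one.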

To share a link I've already shared about a dozen times here, here are 
some tests under similar conditions which illustrate what that 
concurrency looks like: 
http://www.diamondtin.com/2014/sqlalchemy-gevent-mysql-python-drivers-comparison/. 
Under modestly high concurrency, MySQLdb takes *20 times longer* to 
handle the work of 100 sessions than PyMySQL when it's inappropriately 
run under gevent.  When I talk about moving to threads, this is not a 
"won't help or hurt" kind of issue; at the moment it's a change that 
will immediately allow a massive improvement to the performance of all 
OpenStack applications.  We need to change the DB library or dump 
eventlet.
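For what it's worth, the driver swap itself is close to a one-liner at the SQLAlchemy URL level; here's a sketch (the function name, credentials, and hostname are placeholders of mine, not anything in oslo.db). PyMySQL also ships `pymysql.install_as_MySQLdb()` as a drop-in shim for code that imports MySQLdb directly:

```python
def use_pymysql(url):
    """Point a SQLAlchemy MySQL URL at the pure-Python PyMySQL driver.

    'mysql://' and 'mysql+mysqldb://' both select the C MySQLdb driver;
    'mysql+pymysql://' selects PyMySQL, whose socket IO is pure Python
    and therefore monkey-patchable by eventlet.
    """
    for prefix in ("mysql+mysqldb://", "mysql://"):
        if url.startswith(prefix):
            return "mysql+pymysql://" + url[len(prefix):]
    return url

# Hypothetical connection URL, as found in e.g. a [database]/connection
# option; the account and host are made up:
print(use_pymysql("mysql://nova:pass@127.0.0.1/nova"))
```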

As far as whether we should dump eventlet or use a pure-Python DB 
library, my contention is that a thread-based + C database library will 
outperform an eventlet + Python-based database library.  Additionally, 
whichever change we make, we may very well see all kinds of new 
database-concurrency-related bugs in our apps too, because we will 
suddenly be talking to the database much more intensively; it is my 
opinion that a traditional threading model will be an easier environment 
in which to work out our approach to these issues.  We have to assume 
"concurrency at any time" in any case, because we run multiple instances 
of Nova etc. at the same time.  At the end of the day, we aren't going 
to see wildly better performance with one approach over the other, so we 
should pick the one that is easier to develop, maintain, and keep 
stable.
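To show why threads plus a C driver aren't hurt by the GIL during database waits, here's another standard-library sketch; time.sleep again stands in for a C driver blocking on the MySQL socket, which, like any well-behaved C extension, releases the GIL while it waits:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def c_driver_query(delay=0.05):
    # Stand-in for a C driver blocked on the socket: the GIL is released
    # during the wait, so the other nine threads keep running.
    time.sleep(delay)

start = time.monotonic()
with ThreadPoolExecutor(max_workers=10) as pool:
    list(pool.map(lambda _: c_driver_query(), range(10)))
elapsed = time.monotonic() - start
print(f"10 concurrent queries in {elapsed:.2f}s")  # ~0.05s, not 0.5s
```

The ten queries overlap fully; contention only appears in the bytecode executed between queries, which is exactly the IO-bound profile described above.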

Robert's analysis talks about various "at the limit" issues, but I was 
unable to reproduce these in my own testing, and we should be relying 
upon working tests to illustrate what performance characteristics 
actually pan out.  My tests only dealt with psycopg2 and PostgreSQL, for 
example; won't someone take my tests and try to replicate them with 
PyMySQL/eventlet vs. MySQL-Python/threads?  We should be relying on 
testing to see what reality actually holds here.  But more than that, 
first we need to fix the obviously broken thing about our DB access 
before we can claim anything is optimized at all; once we do that, I 
don't think splitting hairs over threads vs. eventlet is really going 
to make much of a difference performance-wise.  We should go with 
whatever produces the most stable development and usage experience while 
allowing a high degree of concurrency.








