[Openstack] Nova DB Connection Pooling

Brian Lamar brian.lamar at rackspace.com
Mon Sep 26 02:44:17 UTC 2011


Hey Monty/All,

The original goal of my eventlet connection pooling patch was to increase overall throughput of the OpenStack API. By itself, SQLAlchemy provides a lot of nifty features such as connection pooling and connection timeout limits, but all of these were being lost on us because of our use of eventlet in the API.

To elaborate, when multiple users query the API simultaneously, what we'd expect to happen is for a greenthread to be created for each request. That happens as expected; however, since SQLAlchemy uses the MySQLdb module by default, every database call blocks *all* greenthreads. This is because MySQLdb is written in C and thus can't be monkey patched by eventlet.
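To sketch why a non-yielding C call stalls everything, here's a stdlib asyncio analogy (an illustration only, not eventlet itself): a cooperative "query" yields to the loop while waiting, while a blocking one holds the loop hostage, so five concurrent "queries" run serially.

```python
import asyncio
import time

async def cooperative_query(results):
    # A monkey-patched pure-Python driver yields to the hub while
    # waiting on the socket, so other greenthreads keep running.
    await asyncio.sleep(0.1)
    results.append("cooperative")

async def blocking_query(results):
    # A C-extension driver like MySQLdb never yields: the whole
    # loop (all greenthreads) stalls until the call returns.
    time.sleep(0.1)
    results.append("blocking")

async def run(query):
    results = []
    start = time.monotonic()
    await asyncio.gather(*(query(results) for _ in range(5)))
    return time.monotonic() - start, results

elapsed_coop, _ = asyncio.run(run(cooperative_query))
elapsed_block, _ = asyncio.run(run(blocking_query))
print(f"cooperative: {elapsed_coop:.2f}s")  # ~0.1s: the waits overlap
print(f"blocking:    {elapsed_block:.2f}s")  # ~0.5s: the waits serialize
```

That 5x slowdown is exactly what we see in the API when the driver can't yield.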

As a result, the API basically does all SQL queries serially, and we're obviously (?) going to have to support multiple concurrent connections in the API. It was very evident in load testing of the API that something needed to change.

The patch introduced db_pool (http://eventlet.net/doc/modules/db_pool.html), which more or less uses threads to overcome the limitation of using MySQLdb in conjunction with eventlet.
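The core idea behind db_pool can be sketched with nothing but the standard library (using sqlite3 as a stand-in for MySQLdb and ThreadPoolExecutor in place of eventlet's green-aware pool): hand each blocking DB call to a real OS thread so the hub only waits on a handoff, not on the C driver.

```python
import sqlite3  # stand-in for MySQLdb; same blocking C-level behaviour
from concurrent.futures import ThreadPoolExecutor

# Keep a small pool of native threads and proxy each blocking DB call
# through it. In eventlet's db_pool the wait below is green-friendly;
# here future.result() is an ordinary thread join.
_pool = ThreadPoolExecutor(max_workers=4)

def execute(sql, params=()):
    """Run a blocking query on a pool thread and wait for the result."""
    def _call():
        conn = sqlite3.connect(":memory:")
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            conn.close()
    return _pool.submit(_call).result()

rows = execute("SELECT 1 + 1")
print(rows)  # [(2,)]
```

db_pool does this properly for MySQLdb (persistent connections, green-aware waits), but the dispatch-to-a-thread mechanism is the same.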

Long story short, my patch ended up creating a lot of issues and I absolutely agree that something needs to change ASAP.

Monty, I can try to answer your questions/concerns:

> Running the script a single time caused 16 connection threads to be
> spawned. This seems a bit excessive.

There are two flags that define how many connections should be maintained per service: 'sql_min_pool_size' and 'sql_max_pool_size'. Unfortunately I set both of these to 10 by default. When I ran the test script provided by Vish I saw 15 connections created. What seems to be happening is that the initial 10 connections are pooled and 5 additional connections are created (one per greenlet the script spawns). Something is wrong here, because you should be seeing at most 10 connections, so that's one outstanding issue.
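For the record, this is what those two flags are *supposed* to enforce (a minimal hypothetical Pool class, not the actual db_pool implementation): pre-open min_size connections, hand out idle ones first, and never create more than max_size.

```python
import queue

class Pool:
    """Sketch of the intended sql_min_pool_size / sql_max_pool_size
    semantics: pre-open min_size connections, never exceed max_size."""
    def __init__(self, connect, min_size=10, max_size=10):
        self._connect = connect
        self._max = max_size
        self._created = 0
        self._idle = queue.Queue()
        for _ in range(min_size):
            self._idle.put(self._make())

    def _make(self):
        self._created += 1
        assert self._created <= self._max, "pool exceeded max_size"
        return self._connect()

    def get(self):
        try:
            return self._idle.get_nowait()  # reuse an idle connection
        except queue.Empty:
            if self._created < self._max:
                return self._make()         # grow, up to max_size
            return self._idle.get()         # block until one is returned

    def put(self, conn):
        self._idle.put(conn)

pool = Pool(connect=object, min_size=3, max_size=5)
conns = [pool.get() for _ in range(5)]  # 3 pooled + 2 newly created
print(pool._created)  # 5, never more than max_size
```

With min and max both at 10, 15 observed connections means something is bypassing the pool entirely.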

> When I tried spawning five copies at the same time, I wound up with 60
> connected threads, plus some connect timeouts, plus some of the
> tracebacks you're seeing below.

Absolutely. With the current defaults you'll hit limits quickly.

> More vexing is that all of these queries were doing select ... for
> update (makes sense why) - but that just means that we're stacking up on
> locks in the db trying to get these resources... one of those situations
> where greater parallelism actually isn't better.

Absolutely. A fix for this could be to place a nova.utils 'synchronized' lock on methods like these to make sure that they never run in parallel. While this might be "vexing", that's exactly what this script was designed to show: a worst case (SELECT ... FOR UPDATE).
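A simplified sketch of what I mean (the real nova.utils.synchronized differs in detail; the counter and function names here are made up for illustration): one named lock serializes the decorated method, so 50 racing callers produce 50 clean increments instead of contending inside the database.

```python
import threading
from collections import defaultdict
from functools import wraps

# One lock per name, so any two calls decorated with the same name
# never run concurrently within this process.
_locks = defaultdict(threading.Lock)

def synchronized(name):
    def decorator(f):
        @wraps(f)
        def wrapper(*args, **kwargs):
            with _locks[name]:
                return f(*args, **kwargs)
        return wrapper
    return decorator

counter = {"value": 0}

@synchronized("fixed_ip_associate")
def associate():
    # Without the lock, this read-modify-write can race under threads,
    # much like concurrent SELECT ... FOR UPDATE contention in the DB.
    current = counter["value"]
    counter["value"] = current + 1

threads = [threading.Thread(target=associate) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter["value"])  # 50
```

The trade-off is obvious: we'd be moving the serialization from the database's row locks into the service, which only helps within a single process.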

> Without really knowing more, my vote would be not to have app-level
> connection pooling by default. MySQL specifically doesn't really like
> having a bunch of unused connections sitting around for long lifecycles
> (with a few exceptions - the exceptions always prove the rule, of course)

I'm not a MySQL guru by any means, but can you explain this to me? I've never read anywhere that MySQL "doesn't really like having a bunch of unused connections sitting around for long lifecycles". It seems pretty logical to me to keep at least 2 persistent connections to the database to avoid being completely blocked on database calls.


Brian


-----Original Message-----
From: "Monty Taylor" <mordred at inaugust.com>
Sent: Sunday, September 25, 2011 8:48pm
To: openstack at lists.launchpad.net
Subject: Re: [Openstack] Nova DB Connection Pooling

What was the intent of the connection pooling? That is, what was it
trying to fix?

Running the script a single time caused 16 connection threads to be
spawned. This seems a bit excessive.

When I tried spawning five copies at the same time, I wound up with 60
connected threads, plus some connect timeouts, plus some of the
tracebacks you're seeing below.

Increasing the thread_cache_size from the default on ubuntu (which is 8)
helped things, but things still seemed to be going to the bad place.

More vexing is that all of these queries were doing select ... for
update (makes sense why) - but that just means that we're stacking up on
locks in the db trying to get these resources... one of those situations
where greater parallelism actually isn't better.

Without really knowing more, my vote would be not to have app-level
connection pooling by default. MySQL specifically doesn't really like
having a bunch of unused connections sitting around for long lifecycles
(with a few exceptions - the exceptions always prove the rule, of course)

Of course- I could be wrong... which is why I'd like to know more about
what the issue was that incited connection pooling.

Monty

On 09/25/2011 01:53 PM, Vishvananda Ishaya wrote:
> Hey everyone,
> 
> I'm a bit concerned with the connection pooling in the db. It seems that things are not getting cleaned up properly. I have a repro case that causes failures that we have seen before. If I revert nova/db/sqlalchemy/session.py to before the eventlet db pool was added, I get no failures. If you want to see the issue, try the attached code. You will need to run from the nova directory or do python setup.py develop. You will also need to create a mysql database called test and edit the sql_connection string if you have a mysql password, etc. Please check this code. If we can't come up with a fix, I think we need to revert back to no connection pooling.
> 
> Run the attached script at least 3 times. The code below runs fine the first couple of times, then it starts to fail with the following error:
> 
> 2011-09-24 12:51:02,799 INFO sqlalchemy.engine.base.Engine.0x...36d0 [-] ROLLBACK
> Traceback (most recent call last):
>  File "/Library/Python/2.7/site-packages/eventlet/hubs/hub.py", line 336, in fire_timers
>    timer()
>  File "/Library/Python/2.7/site-packages/eventlet/hubs/timer.py", line 56, in __call__
>    cb(*args, **kw)
>  File "/Library/Python/2.7/site-packages/eventlet/event.py", line 163, in _do_send
>    waiter.switch(result)
>  File "/Library/Python/2.7/site-packages/eventlet/greenthread.py", line 192, in main
>    result = function(*args, **kwargs)
>  File "dbrepro.py", line 44, in associate
>    ip = db.fixed_ip_associate_pool(ctxt, 1, instance_id=val)
>  File "/Users/vishvananda/os/nova/nova/db/api.py", line 352, in fixed_ip_associate_pool
>    instance_id, host)
>  File "/Users/vishvananda/os/nova/nova/db/sqlalchemy/api.py", line 102, in wrapper
>    return f(*args, **kwargs)
>  File "/Users/vishvananda/os/nova/nova/db/sqlalchemy/api.py", line 725, in fixed_ip_associate_pool
>    filter_by(host=None).\
>  File "/Library/Python/2.7/site-packages/sqlalchemy/orm/query.py", line 1496, in first
>    ret = list(self[0:1])
>  File "/Library/Python/2.7/site-packages/sqlalchemy/orm/query.py", line 1405, in __getitem__
>    return list(res)
>  File "/Library/Python/2.7/site-packages/sqlalchemy/orm/query.py", line 1669, in instances
>    fetch = cursor.fetchall()
>  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 2383, in fetchall
>    l = self.process_rows(self._fetchall_impl())
>  File "/Library/Python/2.7/site-packages/sqlalchemy/engine/base.py", line 2366, in process_rows
>    keymap = metadata._keymap
> AttributeError: 'NoneType' object has no attribute '_keymap'
> 
> 
> 
> 
> 
> 
> 
> 
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack at lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
