[Openstack] [nova] mysql connection optimization

Devananda van der Veen devananda.vdv at gmail.com
Wed Jul 11 22:19:32 UTC 2012


I believe sqlalchemy already tracks the session state, though I haven't
seen anything (in the docs or code that I've touched) suggesting it knows
how to be selective in issuing ROLLBACK. It would be a good feature, though
I still think the sql_idle_timeout should be lowered ;)

-Deva


On Wed, Jul 11, 2012 at 1:43 PM, Mark Gius <mark at markgius.com> wrote:

> I would also love to see these changes applied.
>
> With regards to the bugs around not issuing a commit or rollback, is it
> possible to have sqlachemy track whether or not a transaction starts and
> only issue a rollback when a session is handed back with an open
> transaction on it?  Seems like a useful defensive measure.
>
> Mark
>
> On Wed, Jul 11, 2012 at 12:46 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>
>> +1 to all your ideas below, Devananda.
>>
>> On 07/11/2012 01:33 PM, Devananda van der Veen wrote:
>> > Hi all,
>> >
>> > I've been taking a look at the way Nova uses its MySQL database. Having
>> > done MySQL performance audits for years as a consultant, a few things
>> > jumped out right away at me. First is the way that SQLAlchemy is
>> > wrapping nearly every query in an unnecessary "ping check" and rollback,
>> > eg.:
>> >
>> >     select 1;
>> >     select ... from ... where ...;
>> >     rollback;
>> >     select 1;
>> >     update ... where ...;
>> >     commit;
>> >     rollback;
>> >
>> >
>> > You can find a complete sample here:
>> http://paste.openstack.org/show/18731/
>> >
>> > I think I understand the reason for both the "select 1" and the
>> > "rollback" statements. However, in the interest of performance for
>> > large-scale deployments, I feel pretty strongly that they need to go
>> away.
>> >
>> > As I see it, there are three factors here.
>> >
>> > (I) Most of the code in db/sqlalchemy/api.py abstracts a "unit of work"
>> > to a very low level, generally a single database read or write, and does
>> > not currently support transactions spanning multiple consistent writes.
>> > This is why "select 1" and "rollback" appear around almost every query
>> > -- most functions in api.py is checking out a session object, doing one
>> > or two queries, and then releasing the session. This actually creates a
>> > much larger issue about transactional atomicity for larger operations,
>> > such as the open bug about network creation here, and is probably better
>> > for another discussion.
>> > https://bugs.launchpad.net/nova/+bug/755138.
>> >
>> > (II) Connections are tested by the MySQLPingListener function, using
>> > "SELECT 1" statements, every time a connection / session is checked out
>> > of the pool. Since connections are usually OK, this adds overhead
>> > unnecessarily. It would be more efficient to handle the errors when
>> > connections aren't OK. I've opened a bug with a description of one
>> > possible way to fix this issue. There are probably other viable
>> > solutions as well.
>> > https://bugs.launchpad.net/nova/+bug/1007027
>> >
>> > My understanding is that this was implemented to prevent "Database has
>> > gone away" errors from occurring every time that the database is
>> > restarted, or when connections are closed by the database after being
>> > idle for long periods. In my opinion, a better solution to these
>> > problems is to:
>> > * wrap queries in retry logic, which will catch disconnect errors and
>> > attempt to reconnect to the database. I have a patch for this, if folks
>> > are interested.
>> > * set Nova's sql_idle_timeout to less than MySQL's wait_timeout, and set
>> > them both to a reasonably short interval (eg, 1 minute), rather than the
>> > default (which I think is 8 hours).
>> >
>> > (III) Transaction state is reset when connections are returned to the
>> > pool, even if no transaction was in progress, or the
>> > transaction-in-progress already committed. This is completely wasteful,
>> > and easy to disable.
>> > https://bugs.launchpad.net/nova/+bug/1007038
>> >
>> > Caveat here is that this reset-on-return functionality *is* useful when
>> > code doesn't properly clean up its own transaction state. When I turned
>> > it off, it exposed a few bugs, which I haven't tracked down yet.
>> > Lowering the sql_idle_timeout will provide the same "solution" as the
>> > current reset-on-return behavior in that it will prevent long-running
>> > idle transactions that tie up resources unnecessarily.
>> >
>> >
>> > In summary, I'd like to see Nova stop spamming the database with "SELECT
>> > 1" and  "ROLLBACK", and think this should be pretty easy to do. Testing
>> > these two changes in my devstack seems to work fine, but I'd like to
>> > know what others think before I propose a changeset with this kind of
>> > potential impact.
>> >
>> > Cheers,
>> > Devananda
>> >
>> >
>> > _______________________________________________
>> > Mailing list: https://launchpad.net/~openstack
>> > Post to     : openstack at lists.launchpad.net
>> > Unsubscribe : https://launchpad.net/~openstack
>> > More help   : https://help.launchpad.net/ListHelp
>> >
>>
>>
>> _______________________________________________
>> Mailing list: https://launchpad.net/~openstack
>> Post to     : openstack at lists.launchpad.net
>> Unsubscribe : https://launchpad.net/~openstack
>> More help   : https://help.launchpad.net/ListHelp
>>
>
>
> _______________________________________________
> Mailing list: https://launchpad.net/~openstack
> Post to     : openstack at lists.launchpad.net
> Unsubscribe : https://launchpad.net/~openstack
> More help   : https://help.launchpad.net/ListHelp
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20120711/147f42a0/attachment.html>


More information about the Openstack mailing list