[openstack-dev] [Nova][Oslo-incubator] Automatic retry db.api query if database connection lost

Victor Sergeyev vsergeyev at mirantis.com
Mon Jul 22 08:39:47 UTC 2013


Hi All.

There is a blueprint (
https://blueprints.launchpad.net/nova/+spec/db-reconnect) by Devananda van
der Veen, whose goal is to reconnect to the database and retry the last
operation if a db connection fails. I’m working on the implementation of
this BP in oslo-incubator (
https://review.openstack.org/#/c/33831/).

A function, _raise_if_db_connection_lost(), was added to the
_wrap_db_error() decorator defined in
openstack/common/db/sqlalchemy/session.py. This function catches
sqlalchemy.exc.OperationalError and extracts the database error code from
the exception. If the error code is on the list of `database has gone away`
error codes, the function raises a DBConnectionError exception.
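To make the check above concrete, here is a minimal sketch, not the code
under review. It assumes a MySQLdb-style driver whose exceptions carry the
numeric error code as args[0]. The OperationalError class is a stand-in for
sqlalchemy.exc.OperationalError so the snippet has no dependencies, and the
code list and the function name without the leading underscore are only
illustrative.

```python
# Illustrative list: well-known MySQL client error codes for a lost or
# refused connection. The list used in the actual review may differ.
MYSQL_GONE_AWAY_CODES = {2002, 2003, 2006, 2013}


class OperationalError(Exception):
    """Stand-in for sqlalchemy.exc.OperationalError (assumption)."""

    def __init__(self, orig):
        super().__init__(str(orig))
        self.orig = orig  # the underlying DB-API exception


class DBConnectionError(Exception):
    """Raised when the database connection appears to be lost."""


def raise_if_db_connection_lost(error, gone_away_codes=MYSQL_GONE_AWAY_CODES):
    """Raise DBConnectionError if `error` carries a 'gone away' code."""
    args = getattr(error.orig, "args", ())
    code = args[0] if args else None
    if code in gone_away_codes:
        raise DBConnectionError(str(error)) from error
    # Any other error code is left for the caller to handle.
```

PostgreSQL drivers report errors differently (for example, as SQLSTATE
strings), so a real implementation would need a per-backend lookup.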

A decorator for db.api methods was added to openstack/common/db/api.py.
We can apply this decorator to methods in db.sqlalchemy.api (not to
individual queries).
It catches the DBConnectionError exception and retries the last query in a
loop until it succeeds, or until the timeout is reached. The timeout value
is configurable with min, max, and increment options.
We assume that each db.api method is executed inside a single transaction,
so retrying the whole method when a connection is lost should be safe.
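The retry loop described above could look roughly like the sketch below.
It is a sketch under assumptions, not the decorator from the review: the
names retry_on_connection_lost, min_interval, max_interval, increment and
max_retries are hypothetical, and it gives up after a fixed number of
attempts rather than after a timeout.

```python
import functools
import time


class DBConnectionError(Exception):
    """Stand-in for the exception raised when the connection is lost."""


def retry_on_connection_lost(min_interval=1.0, max_interval=10.0,
                             increment=1.0, max_retries=5, sleep=time.sleep):
    """Retry the wrapped function whenever it raises DBConnectionError.

    The wait starts at min_interval and grows by increment after each
    failed attempt, capped at max_interval. All names are hypothetical.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            interval = min_interval
            attempts = 0
            while True:
                try:
                    return func(*args, **kwargs)
                except DBConnectionError:
                    attempts += 1
                    if attempts > max_retries:
                        raise  # give up and let the caller see the error
                    sleep(interval)
                    interval = min(interval + increment, max_interval)
        return wrapper
    return decorator
```

A db.sqlalchemy.api method would then be wrapped as a whole, e.g. by
placing @retry_on_connection_lost() above its definition, so that every
query inside it is re-run on retry.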

I would really like to receive some comments about the following
suggestions:

1. I can’t imagine a situation in which we lose the connection to an SQLite
DB. Also, as far as I know, SQLite is not used in production at the moment,
so we don't handle this case.

2. Please leave some comments about the list of `database has gone away`
error codes for MySQL and PostgreSQL.

3. Johannes Erdfelt suggested that “retrying the whole method, even if it's
in a transaction, is only safe if the entire method is idempotent. A method
could execute successfully in the database, but the connection could be
dropped before the final status is sent to the client.”
I agree that this situation can cause data corruption in a database (e.g.,
if we try to insert something into the database), but I’m not sure how
RDBMSs handle this. Also, I haven't managed to create a functional test
case that reproduces the described situation easily.
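The scenario Johannes describes can at least be simulated without a real
database. The toy below is not a functional test against MySQL or
PostgreSQL: FakeDatabase, ConnectionLost and insert_with_retry are all
hypothetical names, and the "server" is just an in-memory list that
commits a write and then drops the connection before replying.

```python
class ConnectionLost(Exception):
    """Hypothetical error: the connection dropped before the reply arrived."""


class FakeDatabase:
    """Toy in-memory 'server' that can drop the connection after a commit."""

    def __init__(self):
        self.rows = []
        self.drop_after_next_commit = False

    def insert(self, row):
        self.rows.append(row)  # the write is committed on the server side
        if self.drop_after_next_commit:
            self.drop_after_next_commit = False
            raise ConnectionLost("reply never reached the client")


def insert_with_retry(db, row, retries=1):
    """Blindly retry the insert if the connection is lost."""
    for attempt in range(retries + 1):
        try:
            db.insert(row)
            return
        except ConnectionLost:
            if attempt == retries:
                raise


db = FakeDatabase()
db.drop_after_next_commit = True
insert_with_retry(db, {"name": "instance-1"})
print(len(db.rows))  # 2: the same row was stored twice
```

The first insert succeeds on the server, the client only sees the lost
connection, and the retry stores the row a second time. A retried method
that is not idempotent would hit the same problem.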


Thanks, Victor