[openstack-dev] [Nova][Oslo-incubator] Automatic retry db.api query if database connection lost
Victor Sergeyev
vsergeyev at mirantis.com
Mon Jul 22 08:39:47 UTC 2013
Hi All.
There is a blueprint (
https://blueprints.launchpad.net/nova/+spec/db-reconnect) by Devananda van
der Veen whose goal is to implement reconnecting to the database and
retrying the last operation if the DB connection fails. I'm working on the
implementation of this BP in oslo-incubator (
https://review.openstack.org/#/c/33831/).
The function _raise_if_db_connection_lost() was added to the _wrap_db_error()
decorator defined in openstack/common/db/sqlalchemy/session.py. It catches
sqlalchemy.exc.OperationalError and extracts the database error code from
the exception. If that code is in the list of `database has gone away`
error codes, the function raises a DBConnectionError exception.
A decorator for db.api methods was added to openstack/common/db/api.py.
It is meant to be applied to whole methods in db.sqlalchemy.api, not to
individual queries.
It catches the DBConnectionError exception and retries the decorated method
in a loop until it succeeds or until the timeout is reached. The timeout is
configurable via min, max, and increment options.
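A minimal sketch of such a decorator, under one possible reading of the min/max/increment options: the wait between attempts starts at `min_interval`, grows by `increment` after each failure, and the decorator gives up once the next wait would exceed `max_interval`. The names `retry_on_connection_lost`, `min_interval`, `max_interval` and `increment` are hypothetical, not the option names used in the review; `sleep` is injectable only to make the sketch testable:

```python
import functools
import time


class DBConnectionError(Exception):
    """Raised when the database connection has been lost."""


def retry_on_connection_lost(min_interval=1.0, max_interval=10.0,
                             increment=1.0, sleep=time.sleep):
    """Hypothetical decorator: re-run the whole method on DBConnectionError."""
    def decorator(f):
        @functools.wraps(f)
        def wrapper(*args, **kwargs):
            interval = min_interval
            while True:
                try:
                    return f(*args, **kwargs)
                except DBConnectionError:
                    # Give up once the next wait would exceed the maximum.
                    if interval > max_interval:
                        raise
                    sleep(interval)
                    interval += increment
        return wrapper
    return decorator
```

With the defaults, a method whose connection never comes back is called 11 times, with waits of 1, 2, ..., 10 seconds (55 s in total) before the error is re-raised. Only DBConnectionError is retried; any other database error propagates on the first failure.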
We assume that every db.api method is executed inside a single
transaction, so retrying the whole method when the connection is lost
should be safe.
I would really like to receive comments on the following
points:
1. I can't imagine a situation in which we would lose the connection to an
SQLite DB. Also, as far as I know, SQLite is not used in production at the
moment, so we don't handle this case.
2. Please leave comments about the list of `database has gone away` error
codes for MySQL and PostgreSQL.
3. Johannes Erdfelt pointed out that “retrying the whole method, even if it's
in a transaction, is only safe if the entire method is idempotent. A method
could execute successfully in the database, but the connection could be
dropped before the final status is sent to the client.”
I agree that this situation can corrupt data in the database (e.g., if
the method inserts something into the database), but I'm not sure how
RDBMSs handle this. Also, I haven't managed to create a functional test
case that would reproduce the described situation easily.
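The difference between the safe case and the case Johannes describes can be simulated in-process by choosing where the "connection" drops. This is a sketch only: an in-memory SQLite database stands in for the real backend, the drop is faked by raising DBConnectionError (with a rollback when it happens before COMMIT), and `instance_create`, `make_insert` and `call_with_retry` are hypothetical names. It is not a functional test against a real MySQL or PostgreSQL server:

```python
import sqlite3


class DBConnectionError(Exception):
    """Raised when the database connection has been lost."""


def make_insert(conn, fail_stage):
    """Build a hypothetical db.api-style method: one INSERT, one transaction.

    On the first call it fakes a dropped connection either "before_commit"
    (the server rolls the transaction back) or "after_commit" (the row is
    durable, but the client never receives the reply).
    """
    calls = {"n": 0}

    def instance_create(name):
        calls["n"] += 1
        conn.execute("INSERT INTO instances (name) VALUES (?)", (name,))
        if calls["n"] == 1 and fail_stage == "before_commit":
            conn.rollback()  # what the server does with an unfinished transaction
            raise DBConnectionError("lost before COMMIT")
        conn.commit()
        if calls["n"] == 1 and fail_stage == "after_commit":
            raise DBConnectionError("lost after COMMIT, before the reply")

    return instance_create


def call_with_retry(f, *args, attempts=3):
    """Re-run the whole method on DBConnectionError, like the decorator."""
    for attempt in range(attempts):
        try:
            return f(*args)
        except DBConnectionError:
            if attempt == attempts - 1:
                raise


def rows_after_retry(fail_stage):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)")
    call_with_retry(make_insert(conn, fail_stage), "vm-1")
    return conn.execute("SELECT COUNT(*) FROM instances").fetchone()[0]
```

`rows_after_retry("before_commit")` returns 1, because the rolled-back transaction left nothing behind, while `rows_after_retry("after_commit")` returns 2: the retried, non-idempotent INSERT duplicates a row that had already been committed.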
Thanks, Victor