[openstack-dev] [all][oslo] Dealing with database connection sharing issues

Mike Bayer mbayer at redhat.com
Fri Feb 20 02:45:41 UTC 2015



Doug Hellmann <doug at doughellmann.com> wrote:

>> 5) Allow this sort of connection sharing to continue for a deprecation
>> period with apppropriate logging, then make it a hard failure.
>> 
>> This would provide services time to find and fix any sharing problems
>> they might have, but would delay the timeframe for a final fix.
>> 
>> 6-ish) Fix oslo-incubator service.py to close all file descriptors after
>> forking.
>> 
> 
> I'm not sure why 6 is "slower", can someone elaborate on that?

So, option “A”, they call engine.dispose() the moment they’re in a fork, the activity upon requesting a connection from the pool is: look in pool, no connections present, create a connection and return it.

Option “5”, the way the patch is right now to auto-invalidate on detection of new fork, the activity upon requesting a connection is from the pool is: look in pool, connection present, check that os.pid() matches what we’ve associated with the connection record, if not, we raise an exception indicating “invalid”, this is immediately caught, sets the connection record as “invalid”, the connection record them immediately disposes that file descriptor, makes a new connection and returns that.

Option “6”, the new fork starts, the activity upon requesting a connection from the pool is: look in pool, connection present, perform the oslo.db “ping” event, ping event emits “SELECT 1” to the MySQLdb driver, driver attempts to emit this statement on the socket, socket communication fails, MySQLdb converts to an exception, exception is raised, SQLAlchemy catches the exception, sends it to a parser to determine the nature of the exception, we see that it’s a “disconnect” exception, we set the “invalidate” flag on the exception, we re-raise, oslo.db’s exc_filters then catch the exception, more string parsers get involved, we determine we need to raise an oslo.db.DBDisconnect exception, we raise that, the “SELECT 1” ping handler catches that, we then emit “SELECT 1” again so that it reconnects, we then hit the connection record that’s in “invalid” state so it knows to reconnect, it reconnects and the “SELECT 1” continues on the new connection and we start up.

So essentially option “5” (the way the gerrit is right now) has a subset of the components of “6”; “6” has the additional steps of: emit a doomed statement on the closed socket, then when it fails raise / catch / parse / reraise / catch / parse / reraise that exception.   Option “5” just has, check the pid, raise / catch an exception.

IMO the two options are: “5”, check the pid and recover or “3” make it a hard failure.
 


More information about the OpenStack-dev mailing list