[openstack-dev] [all][oslo] Dealing with database connection sharing issues

Joshua Harlow harlowja at outlook.com
Fri Feb 20 18:12:57 UTC 2015


Doug Hellmann wrote:
>
> On Thu, Feb 19, 2015, at 09:45 PM, Mike Bayer wrote:
>>
>> Doug Hellmann <doug at doughellmann.com> wrote:
>>
>>>> 5) Allow this sort of connection sharing to continue for a deprecation
>>>> period with appropriate logging, then make it a hard failure.
>>>>
>>>> This would provide services time to find and fix any sharing problems
>>>> they might have, but would delay the timeframe for a final fix.
>>>>
>>>> 6-ish) Fix oslo-incubator service.py to close all file descriptors after
>>>> forking.
>>>>
>>> I'm not sure why 6 is "slower"; can someone elaborate on that?
>> So, option “A”, they call engine.dispose() the moment they’re in a fork,
>> the activity upon requesting a connection from the pool is: look in pool,
>> no connections present, create a connection and return it.
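(For anyone skimming, option “A” from the service side is basically just the
following; a minimal sketch, where the URL and names are purely illustrative
placeholders, not the real oslo.db wiring.)

  import os

  import sqlalchemy

  # placeholder URL; in practice this is whatever engine oslo.db built
  engine = sqlalchemy.create_engine("mysql://user:pass@localhost/nova")

  pid = os.fork()
  if pid == 0:
      # child: throw away the pooled connections inherited from the
      # parent; the next checkout opens a brand-new connection here
      engine.dispose()
      # ... the child then goes on using the engine normally ...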
>
> This feels like something we could do in the service manager base class,
> maybe by adding a "post fork" hook or something.

+1 to that.

I think it'd be nice to have the service __init__() be something like:

  def __init__(self, threads=1000, prefork_callbacks=None,
               postfork_callbacks=None):
      self.postfork_callbacks = postfork_callbacks or []
      self.prefork_callbacks = prefork_callbacks or []
      # always ensure we are closing any left-open fds last...
      self.prefork_callbacks.append(self._close_descriptors)
      ...
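and then the spawning side would invoke them around os.fork(), roughly like
this (method names below are approximate rather than the real oslo-incubator
service.py, and exactly which side of the fork each set runs on is a detail
to settle):

  def _start_child(self, wrap):
      # before the fork: let interested code flush or close anything it
      # does not want shared with the child
      for callback in self.prefork_callbacks:
          callback()

      pid = os.fork()
      if pid == 0:
          # in the new child, before it starts working: let interested
          # code (oslo.db, messaging, ...) re-initialize anything that
          # must not be shared across the fork boundary
          for callback in self.postfork_callbacks:
              callback()
          self._child_process(wrap.service)
      ...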

>
> Josh's patch to forcibly close all file descriptors may be something
> else we want, but if we can reset open connections "cleanly" when we
> know how, that feels better than relying on detecting broken sockets.
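(The brute-force close, for reference, is roughly the following; just a
sketch, not the actual patch, and the RLIMIT_NOFILE upper bound is only one
way to pick the range.)

  import os
  import resource

  def _close_inherited_fds():
      # close everything the child inherited except stdin/stdout/stderr;
      # any shared database sockets disappear along with the rest
      maxfd = resource.getrlimit(resource.RLIMIT_NOFILE)[1]
      if maxfd == resource.RLIM_INFINITY:
          maxfd = 4096
      os.closerange(3, maxfd)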
>
>> Option “5”, the way the patch is right now to auto-invalidate on
>> detection of a new fork, the activity upon requesting a connection from
>> the pool is: look in pool, connection present, check that os.getpid()
>> matches what we’ve associated with the connection record; if not, we
>> raise an exception indicating “invalid”, this is immediately caught, sets
>> the connection record as “invalid”, the connection record then
>> immediately disposes that file descriptor, makes a new connection and
>> returns that.
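(That pid check is more or less the connection-pool event recipe from the
SQLAlchemy docs; a rough sketch, not the exact oslo.db change:)

  import os

  from sqlalchemy import event, exc

  def add_process_guards(engine):
      @event.listens_for(engine, "connect")
      def connect(dbapi_connection, connection_record):
          # remember which process created this connection
          connection_record.info['pid'] = os.getpid()

      @event.listens_for(engine, "checkout")
      def checkout(dbapi_connection, connection_record, connection_proxy):
          if connection_record.info['pid'] != os.getpid():
              # connection was created in another process; invalidate it
              # so the pool opens a fresh one for this process instead
              raise exc.DisconnectionError(
                  "Connection record belongs to pid %s, attempting to "
                  "check out in pid %s" % (connection_record.info['pid'],
                                           os.getpid()))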
>>
>> Option “6”, the new fork starts, the activity upon requesting a
>> connection from the pool is: look in pool, connection present, perform
>> the oslo.db “ping” event, ping event emits “SELECT 1” to the MySQLdb
>> driver, driver attempts to emit this statement on the socket, socket
>> communication fails, MySQLdb converts to an exception, exception is
>> raised, SQLAlchemy catches the exception, sends it to a parser to
>> determine the nature of the exception, we see that it’s a “disconnect”
>> exception, we set the “invalidate” flag on the exception, we re-raise,
>> oslo.db’s exc_filters then catch the exception, more string parsers get
>> involved, we determine we need to raise an oslo.db.DBDisconnect
>> exception, we raise that, the “SELECT 1” ping handler catches that, we
>> then emit “SELECT 1” again so that it reconnects, we then hit the
>> connection record that’s in “invalid” state so it knows to reconnect, it
>> reconnects and the “SELECT 1” continues on the new connection and we
>> start up.
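(And the “SELECT 1” side of that chain, stripped of the oslo.db exc_filters
machinery described above, is essentially the classic pessimistic-disconnect
checkout listener; a bare sketch with made-up naming and a deliberately broad
except clause:)

  from sqlalchemy import event, exc

  def add_checkout_ping(engine):
      @event.listens_for(engine, "checkout")
      def ping(dbapi_connection, connection_record, connection_proxy):
          cursor = dbapi_connection.cursor()
          try:
              # the doomed statement on a dead socket fails here ...
              cursor.execute("SELECT 1")
              cursor.close()
          except Exception:
              # ... and DisconnectionError tells the pool to throw the
              # connection away and retry the checkout on a fresh one
              raise exc.DisconnectionError()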
>>
>> So essentially option “5” (the way the gerrit is right now) has a subset
>> of the components of “6”; “6” has the additional steps of: emit a doomed
>> statement on the closed socket, then when it fails raise / catch / parse
>> / reraise / catch / parse / reraise that exception.  Option “5” just
>> has: check the pid, raise / catch an exception.
>>
>> IMO the two options are: “5”, check the pid and recover, or “3”, make it
>> a hard failure.
>
> And I don't think we want the database library doing anything with this
> case at all. Recovery code is tricky, and often prevents valid use cases
> (perhaps the parent *meant* for the child to reuse the open connection
> and isn't going to continue using it so there won't be a conflict).
>
> The bug here is in the way the application, using Oslo's service module,
> is forking. We should fix the service module to make it possible to fork
> correctly, and to have that be the default behavior. The db library
> shouldn't be concerned with whether or not it's in a forked process --
> that's not its job.
>
> Doug
>


