[openstack-dev] [all][oslo] Dealing with database connection sharing issues

Davanum Srinivas davanum at gmail.com
Fri Feb 20 15:24:01 UTC 2015


+1 to fix Oslo's service module anyway, irrespective of this bug.

+1 to "The db library shouldn't be concerned with whether or not it's
in a forked process -- that's not its job"

-- dims

On Fri, Feb 20, 2015 at 10:17 AM, Doug Hellmann <doug at doughellmann.com> wrote:
>
>
> On Thu, Feb 19, 2015, at 09:45 PM, Mike Bayer wrote:
>>
>>
>> Doug Hellmann <doug at doughellmann.com> wrote:
>>
>> >> 5) Allow this sort of connection sharing to continue for a deprecation
>> >> period with appropriate logging, then make it a hard failure.
>> >>
>> >> This would provide services time to find and fix any sharing problems
>> >> they might have, but would delay the timeframe for a final fix.
>> >>
>> >> 6-ish) Fix oslo-incubator service.py to close all file descriptors after
>> >> forking.
>> >>
>> >
>> > I'm not sure why 6 is "slower", can someone elaborate on that?
>>
>> So, option “A”, they call engine.dispose() the moment they’re in a fork,
>> the activity upon requesting a connection from the pool is: look in pool,
>> no connections present, create a connection and return it.
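That "look in pool, nothing there, create and return" path is cheap enough to sketch with a toy pool (stdlib sqlite3 standing in for MySQLdb, and TinyPool as a made-up stand-in for SQLAlchemy's QueuePool -- not real oslo.db code):

```python
import sqlite3


class TinyPool:
    """Toy stand-in for SQLAlchemy's QueuePool, for illustration only."""

    def __init__(self, path):
        self._path = path
        self._conns = []

    def checkout(self):
        # Look in pool; connection present -> return it, else create one.
        if self._conns:
            return self._conns.pop()
        return sqlite3.connect(self._path)

    def checkin(self, conn):
        self._conns.append(conn)

    def dispose(self):
        # Option "A": close every pooled connection -- e.g. first thing in
        # a freshly forked child -- so the next checkout opens a brand-new
        # socket owned by this process.
        for conn in self._conns:
            conn.close()
        self._conns.clear()
```

The child pays only the cost of one new connection on its next checkout; no error handling or string parsing is involved.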
>
> This feels like something we could do in the service manager base class,
> maybe by adding a "post fork" hook or something.
>
> Josh's patch to forcibly close all file descriptors may be something
> else we want, but if we can reset open connections "cleanly" when we
> know how, that feels better than relying on detecting broken sockets.
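A rough sketch of that "post fork" hook idea (class and method names here are hypothetical, not the actual oslo-incubator service API):

```python
import os


class ServiceLauncher:
    """Hypothetical launcher base class with a "post fork" hook."""

    def __init__(self):
        self._post_fork_hooks = []

    def add_post_fork_hook(self, hook):
        self._post_fork_hooks.append(hook)

    def launch(self, worker):
        pid = os.fork()
        if pid == 0:
            # Child: reset inherited resources "cleanly" before doing any
            # work -- a service would register e.g. engine.dispose here.
            for hook in self._post_fork_hooks:
                hook()
            worker()
            os._exit(0)
        return pid
```

The service knows which resources it holds, so it can register exactly the resets it needs instead of the library guessing.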
>
>>
>> Option “5”, the way the patch is right now to auto-invalidate on
>> detection of a new fork, the activity upon requesting a connection from
>> the pool is: look in pool, connection present, check that os.getpid()
>> matches what we’ve associated with the connection record; if not, we
>> raise an exception indicating “invalid”, this is immediately caught, sets
>> the connection record as “invalid”, the connection record then
>> immediately disposes that file descriptor, makes a new connection and
>> returns that.
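Condensed, that pid check looks roughly like this (a toy ConnectionRecord, not oslo.db's or SQLAlchemy's actual class):

```python
import os


class ConnectionRecord:
    """Toy pool record that remembers which process created its socket."""

    def __init__(self, connect):
        self._connect = connect
        self.connection = connect()
        self.pid = os.getpid()

    def checkout(self):
        # Option "5": if this record was created in a different process,
        # its socket is shared with the parent -- invalidate and reconnect
        # instead of handing it out.
        if self.pid != os.getpid():
            self.connection = self._connect()
            self.pid = os.getpid()
        return self.connection
```

No statement is ever emitted on the stale socket; the mismatch is caught purely from the pid comparison.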
>>
>> Option “6”, the new fork starts, the activity upon requesting a
>> connection from the pool is: look in pool, connection present, perform
>> the oslo.db “ping” event, ping event emits “SELECT 1” to the MySQLdb
>> driver, driver attempts to emit this statement on the socket, socket
>> communication fails, MySQLdb converts to an exception, exception is
>> raised, SQLAlchemy catches the exception, sends it to a parser to
>> determine the nature of the exception, we see that it’s a “disconnect”
>> exception, we set the “invalidate” flag on the exception, we re-raise,
>> oslo.db’s exc_filters then catch the exception, more string parsers get
>> involved, we determine we need to raise an oslo.db.DBDisconnect
>> exception, we raise that, the “SELECT 1” ping handler catches that, we
>> then emit “SELECT 1” again so that it reconnects, we then hit the
>> connection record that’s in “invalid” state so it knows to reconnect, it
>> reconnects and the “SELECT 1” continues on the new connection and we
>> start up.
>>
>> So essentially option “5” (the way the gerrit is right now) has a subset
>> of the components of “6”; “6” has the additional steps of: emit a doomed
>> statement on the closed socket, then when it fails raise / catch / parse
>> / reraise / catch / parse / reraise that exception.   Option “5” just
>> has, check the pid, raise / catch an exception.
>>
>> IMO the two options are: “5”, check the pid and recover or “3” make it a
>> hard failure.
>
> And I don't think we want the database library doing anything with this
> case at all. Recovery code is tricky, and often prevents valid use cases
> (perhaps the parent *meant* for the child to reuse the open connection
> and isn't going to continue using it so there won't be a conflict).
>
> The bug here is in the way the application, using Oslo's service module,
> is forking. We should fix the service module to make it possible to fork
> correctly, and to have that be the default behavior. The db library
> shouldn't be concerned with whether or not it's in a forked process --
> that's not its job.
>
> Doug
>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>



-- 
Davanum Srinivas :: https://twitter.com/dims


