[openstack-dev] [all][oslo] Dealing with database connection sharing issues
Joshua Harlow
harlowja at outlook.com
Fri Feb 20 01:15:22 UTC 2015
Doug Hellmann wrote:
>
> On Thu, Feb 19, 2015, at 01:09 PM, Ben Nemec wrote:
>> Hi,
>>
>> Mike Bayer recently tracked down an issue with database errors in Cinder
>> to a single database connection being shared over multiple processes.
>> This is not something that should happen, and it turns out to cause
>> intermittent failures in the Cinder volume service. Full details can be
>> found in the bug here: https://bugs.launchpad.net/cinder/+bug/1417018
>> and his mailing list thread here:
>> http://lists.openstack.org/pipermail/openstack-dev/2015-February/057184.html
>>
>> The question we're facing is what to do about it. There's quite a lot
>> of discussion on https://review.openstack.org/#/c/156725 and in
>> http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2015-02-18.log
>> starting at 2015-02-18T21:38:12 but I'll try to summarize it here.
>>
>> On the plus side, we have a way to detect this sort of thing in oslo.db.
>> That's what Mike's change 156725 is about. On the minus side,
>> recovering from this in oslo.db is papering over a legitimate problem in
>> the calling service, and a lot of the discussion has been around how to
>> communicate that to the calling service. A few options that have been
>> mentioned:
>>
>> 1) Leave the linked change as-is, with a warning logged that will
>> hopefully be seen and prompt a fix in the service.
>>
>> The concern raised with this is that the warning log level is a very
>> operator-visible thing and there's nothing an operator can do to fix
>> this other than pester the developers. Also, it seems developers tend
>> to ignore logs, so it's unlikely they'll pick up on it themselves.
>>
>> Note that while the errors resulting from this situation are
>> intermittent, the actual situation happens on every startup of
>> cinder-volume, so these messages will always be logged as it stands
>> today.
>>
>> 2) Change the log message to debug.
>>
>> This is the developer-focused log level, but as noted above developers
>> tend to ignore logs and it will be very easy for the message to get lost
>> in the debug noise. This option would likely require someone to go
>> specifically looking for the error to find it.
>>
>> 3) Make the error a hard failure.
>>
>> Rather than hide the error by recovering, fail immediately when it's
>> detected. This has the problem of making all the existing Cinder code
>> (and any other services with the same problem) in the wild incompatible
>> with any new releases of oslo.db, but it's about the only way to make
>> sure the error will be addressed now and in any future occurrences.
>>
>> 4) Leave the bug alone for now and just log a message so we can find out
>> how widespread this problem actually is.
>>
>> At the moment we only know it exists in Cinder, but due to the way the
>> service code works it's quite possible other projects have the same
>> problem and don't know it yet.
>>
>> 5) Allow this sort of connection sharing to continue for a deprecation
>> period with appropriate logging, then make it a hard failure.
>>
>> This would provide services time to find and fix any sharing problems
>> they might have, but would delay the timeframe for a final fix.
>>
>> 6-ish) Fix oslo-incubator service.py to close all file descriptors after
>> forking.
>>
>> This is a best practice anyway so it's something we intend to pursue,
>> but it's probably more of a long-term fix because it will take some work
>> to implement and make sure it doesn't break existing services. It also
>> papers over the problem and according to Mike is basically a slower and
>> messier alternative to his current proposed change, so it's probably a
>> tangential change to avoid this in the future as opposed to a solution.
>>
>> If you've made it this far, thank you and please provide thoughts on the
>> options presented above. :-)
>
> I'm not sure why 6 is "slower", can someone elaborate on that?
Whether it's slower or not, I put up:
https://review.openstack.org/#/c/157608
It's not fully functional yet (something is still not quite right with
it...), but it will close any file descriptors that may have been left open.
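
For anyone following along, here is a rough sketch of what "close file
descriptors after forking" could look like in plain Python. This is not the
code in the review above: the function names (open_fds, close_inherited_fds)
and the keep parameter are made up for illustration, and it assumes a POSIX
system. The child lists its open descriptors and closes everything it was not
told to keep, so a database socket inherited from the parent can no longer be
shared.

```python
import os


def open_fds():
    """Return the descriptor numbers that may be open in this process."""
    # Linux exposes /proc/self/fd; the BSDs and macOS expose /dev/fd.
    for fd_dir in ("/proc/self/fd", "/dev/fd"):
        try:
            return sorted(int(name) for name in os.listdir(fd_dir))
        except (OSError, ValueError):
            continue
    # Fallback (assumption): probe a bounded range instead of listing.
    # The cap keeps the loop short when SC_OPEN_MAX is very large.
    try:
        limit = os.sysconf("SC_OPEN_MAX")
    except (AttributeError, ValueError, OSError):
        limit = 256
    return list(range(min(limit, 4096)))


def close_inherited_fds(keep=(0, 1, 2)):
    """Close every descriptor not listed in keep; return the ones closed."""
    keep = set(keep)
    closed = []
    for fd in open_fds():
        if fd in keep:
            continue
        try:
            os.close(fd)
        except OSError:
            # Not actually open, e.g. the temporary descriptor that
            # os.listdir() used while building the list.
            continue
        closed.append(fd)
    return closed
```

A forked worker would call close_inherited_fds() before doing anything else
and then open its own database connection. Anything the child still needs
(a log file, a listening socket) would have to be passed in keep, which is
probably part of why rolling this out to existing services takes some care.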
>
> Doug
>
>> -Ben
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe:
>> OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev