[openstack-dev] [all][oslo] Dealing with database connection sharing issues

Doug Hellmann doug at doughellmann.com
Thu Feb 19 20:44:56 UTC 2015



On Thu, Feb 19, 2015, at 01:09 PM, Ben Nemec wrote:
> Hi,
> 
> Mike Bayer recently tracked down an issue with database errors in Cinder
> to a single database connection being shared over multiple processes.
> This is not something that should happen, and it turns out to cause
> intermittent failures in the Cinder volume service.  Full details can be
> found in the bug here: https://bugs.launchpad.net/cinder/+bug/1417018
> and his mailing list thread here:
> http://lists.openstack.org/pipermail/openstack-dev/2015-February/057184.html
> 
> The question we're facing is what to do about it.  There's quite a lot
> of discussion on https://review.openstack.org/#/c/156725 and in
> http://eavesdrop.openstack.org/irclogs/%23openstack-oslo/%23openstack-oslo.2015-02-18.log
> starting at 2015-02-18T21:38:12, but I'll try to summarize it here.
> 
> On the plus side, we have a way to detect this sort of thing in oslo.db.
> That's what Mike's change 156725 is about.  On the minus side,
> recovering from this in oslo.db is papering over a legitimate problem in
> the calling service, and a lot of the discussion has been around how to
> communicate that to the calling service.  A few options that have been
> mentioned:
> 
> 1) Leave the linked change as-is, with a warning logged that will
> hopefully be seen and prompt a fix in the service.
> 
> The concern raised with this is that the warning log level is a very
> operator-visible thing and there's nothing an operator can do to fix
> this other than pester the developers.  Also, it seems developers tend
> to ignore logs, so it's unlikely they'll pick up on it themselves.
> 
> Note that while the errors resulting from this situation are
> intermittent, the actual situation happens on every start up of
> cinder-volume, so these messages will always be logged as it stands
> today.
> 
> 2) Change the log message to debug.
> 
> This is the developer-focused log level, but as noted above, developers
> tend to ignore logs and it will be very easy for the message to get lost
> in the debug noise.  This option would likely require someone to go
> specifically looking for the error to find it.
> 
> 3) Make the error a hard failure.
> 
> Rather than hide the error by recovering, fail immediately when it's
> detected.  This has the problem of making all the existing Cinder code
> (and any other services with the same problem) in the wild incompatible
> with any new releases of oslo.db, but it's about the only way to make
> sure the error will be addressed now and in any future occurrences.
> 
> 4) Leave the bug alone for now and just log a message so we can find out
> how widespread this problem actually is.
> 
> At the moment we only know it exists in Cinder, but due to the way the
> service code works it's quite possible other projects have the same
> problem and don't know it yet.
> 
> 5) Allow this sort of connection sharing to continue for a deprecation
> period with appropriate logging, then make it a hard failure.
> 
> This would provide services time to find and fix any sharing problems
> they might have, but would delay the timeframe for a final fix.
> 
> 6-ish) Fix oslo-incubator service.py to close all file descriptors after
> forking.
> 
> This is a best practice anyway so it's something we intend to pursue,
> but it's probably more of a long-term fix because it will take some work
> to implement and make sure it doesn't break existing services.  It also
> papers over the problem and according to Mike is basically a slower and
> messier alternative to his current proposed change, so it's probably a
> tangential change to avoid this in the future as opposed to a solution.
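
To make the trade-off between recovering (options 1, 2 and 5) and
failing hard (option 3) concrete, here is a minimal sketch of the
general idea: remember which process opened a connection and compare
on every checkout. This is not the code in change 156725 and not an
oslo.db API. PidCheckedPool, SharedAcrossProcessesError and the
strict flag are hypothetical names made up for illustration, and the
"connection" is whatever the supplied factory returns.

```python
import os


class SharedAcrossProcessesError(RuntimeError):
    """A connection was checked out by a process that did not open it."""


class PidCheckedPool:
    """Toy one-connection pool that remembers which process opened it.

    Hypothetical illustration only; real pools hold many connections.
    """

    def __init__(self, connect, strict=False, getpid=os.getpid):
        self._connect = connect    # zero-argument factory for a connection
        self._strict = strict      # True: hard failure, False: recover
        self._getpid = getpid      # injectable so the check is testable
        self._conn = None
        self._owner_pid = None
        self.shared_detected = 0   # how often sharing was noticed

    def checkout(self):
        pid = self._getpid()
        if self._conn is not None and self._owner_pid != pid:
            # The connection was inherited from another process.
            self.shared_detected += 1
            if self._strict:
                raise SharedAcrossProcessesError(
                    "connection opened by pid %s used in pid %s"
                    % (self._owner_pid, pid))
            # Recover: forget the inherited connection without closing
            # it, since the process that opened it may still use it.
            self._conn = None
        if self._conn is None:
            self._conn = self._connect()
            self._owner_pid = pid
        return self._conn
```

With strict=False the caller never notices the problem unless it
looks at the counter or a log message, which is the "papering over"
concern. With strict=True the first checkout in a forked child
raises, which is the compatibility concern raised for option 3.
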
> 
> If you've made it this far, thank you and please provide thoughts on the
> options presented above. :-)

I'm not sure why 6 is "slower". Can someone elaborate on that?
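
For anyone unfamiliar with what option 6 means in practice, a rough
sketch of closing inherited descriptors in a forked child follows.
It is not the oslo-incubator service.py code. close_inherited_fds and
child_still_has_fd are hypothetical helpers, and the sketch assumes a
POSIX system, preferring /proc/self/fd where it exists and otherwise
scanning up to the soft RLIMIT_NOFILE limit.

```python
import os
import resource


def close_inherited_fds(first_fd=3):
    """Close every descriptor >= first_fd in the calling process."""
    try:
        # Linux: list only the descriptors that are actually open.
        open_fds = [int(name) for name in os.listdir("/proc/self/fd")]
    except OSError:
        # No /proc: scan the whole range up to the soft limit instead.
        soft_limit, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        if soft_limit == resource.RLIM_INFINITY:
            soft_limit = 4096  # arbitrary cap, an assumption of this sketch
        os.closerange(first_fd, soft_limit)
        return
    for fd in open_fds:
        if fd >= first_fd:
            try:
                os.close(fd)
            except OSError:
                pass  # e.g. the descriptor listdir() itself used


def child_still_has_fd(fd, close_first):
    """Fork and report whether the child still sees fd as open."""
    pid = os.fork()
    if pid == 0:
        # Child: exit status 0 means open, 1 means closed.
        if close_first:
            close_inherited_fds()
        try:
            os.fstat(fd)
            os._exit(0)
        except OSError:
            os._exit(1)
    _, wait_status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(wait_status) == 0
```

Calling child_still_has_fd(fd, close_first=False) on a descriptor
opened before the fork shows the inheritance that leads to a shared
database socket. With close_first=True the child no longer holds the
descriptor, so it has to open its own connection.
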

Doug

> 
> -Ben