[openstack-dev] Oslo logging eats system level tracebacks by default

Morgan Fainberg morgan.fainberg at gmail.com
Wed May 28 15:53:58 UTC 2014


+1. Providing service crash information is very valuable. In general, we need to give our operators as much information as possible about why the service exited (critical error, traceback, or unexpected exit).

—Morgan
—
Morgan Fainberg


From: Jay Pipes jaypipes at gmail.com
Reply: OpenStack Development Mailing List (not for usage questions) openstack-dev at lists.openstack.org
Date: May 28, 2014 at 08:50:25
To: openstack-dev at lists.openstack.org openstack-dev at lists.openstack.org
Subject:  Re: [openstack-dev] Oslo logging eats system level tracebacks by default  

On 05/28/2014 11:39 AM, Doug Hellmann wrote:  
> On Wed, May 28, 2014 at 10:38 AM, Sean Dague <sean at dague.net> wrote:  
>> When attempting to build a new tool for Tempest, I found that my python  
>> syntax errors were being completely eaten. After 2 days of debugging I  
>> found that oslo log.py does the following *very unexpected* thing.  
>>  
>> - replaces the sys.excepthook with its own function  
>> - eats the exception traceback unless debug or verbose are set to True  
>> - sets debug and verbose to False by default  
>> - prints out a completely useless summary log message at Critical  
>> ([CRITICAL] [-] 'id' was my favorite of these)  
>>  
>> This is basically for an exit level event. Something so breaking that  
>> your program just crashed.  
>>  
>> Note this has nothing to do with preventing stack traces that are  
>> currently littering up the logs that happen at many logging levels, it's  
>> only about removing the stack trace of a CRITICAL level event that's  
>> going to very possibly result in a crashed daemon with no information as  
>> to why.  
>>  
>> So the process of including oslo log makes the code immediately  
>> undebuggable unless you change your config file away from the defaults.  
>>  
>> Whether or not there was justification for this before, one of the  
>> things we heard loud and clear from the operators' meetup was:  
>>  
>> - Most operators are running at DEBUG level for all their OpenStack  
>> services because you can't actually do problem determination in  
>> OpenStack for anything < that.  
>> - Operators reacted negatively to the idea of removing stack traces  
>> from logs, as that's typically the only way to figure out what's going  
>> on. It took a while of back and forth to explain that our initiative to  
>> do that wasn't about removing them per se, but having the code  
>> correctly recover.  
>>  
>> So the current oslo logging behavior seems inconsistent (we spew  
>> exceptions at INFO and WARN levels, and hide all the important stuff  
>> with a legitimately uncaught system level crash), undebuggable, and  
>> completely against the prevailing wishes of the operator community.  
>>  
>> I'd like to change that here - https://review.openstack.org/#/c/95860/  
>>  
>> -Sean  
>  
> I agree, we should dump as much detail as we can when we encounter an  
> unhandled exception that causes an app to die.  

+1  

-jay  
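
For illustration, the behaviour Sean describes can be approximated with the standard library alone. This is a minimal sketch, not the actual oslo log.py code: make_excepthook, include_traceback, and demo are hypothetical names invented here, and the include_traceback flag only stands in for the debug/verbose settings mentioned in the thread. It shows how a replacement sys.excepthook that logs only str(exception) reduces a crash to a one-word CRITICAL line, while passing exc_info keeps the full traceback.

```python
import io
import logging
import sys


def make_excepthook(logger, include_traceback=True):
    """Build a sys.excepthook replacement that logs uncaught exceptions.

    Hypothetical helper for illustration; not the oslo implementation.
    """
    def excepthook(exc_type, exc_value, exc_tb):
        if include_traceback:
            # exc_info makes the logging module append the formatted traceback.
            logger.critical("Unhandled exception: %s", exc_value,
                            exc_info=(exc_type, exc_value, exc_tb))
        else:
            # Summary only: roughly the behaviour complained about in the thread.
            logger.critical(str(exc_value))
    return excepthook


def demo(include_traceback):
    """Call the hook by hand on a KeyError and return what was logged."""
    stream = io.StringIO()
    logger = logging.getLogger("excepthook_demo_%s" % include_traceback)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
    logger.handlers = [handler]

    hook = make_excepthook(logger, include_traceback)
    try:
        {}["id"]  # raises KeyError('id')
    except KeyError:
        hook(*sys.exc_info())
    return stream.getvalue()


if __name__ == "__main__":
    print(demo(include_traceback=False))  # one-line summary, no traceback
    print(demo(include_traceback=True))   # summary followed by the traceback
```

In a real service the hook would be installed once at startup, for example with sys.excepthook = make_excepthook(logging.getLogger(__name__)), so that any uncaught exception is logged with its traceback before the process exits.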
