[openstack-dev] Oslo logging eats system level tracebacks by default

Jay Pipes jaypipes at gmail.com
Wed May 28 15:49:12 UTC 2014


On 05/28/2014 11:39 AM, Doug Hellmann wrote:
> On Wed, May 28, 2014 at 10:38 AM, Sean Dague <sean at dague.net> wrote:
>> When attempting to build a new tool for Tempest, I found that my python
>> syntax errors were being completely eaten. After 2 days of debugging I
>> found that oslo log.py does the following *very unexpected* thing.
>>
>>   - replaces the sys.excepthook with its own function
>>   - eats the exception traceback unless debug or verbose are set to True
>>   - sets debug and verbose to False by default
>>   - prints out a completely useless summary log message at Critical
>> ([CRITICAL] [-] 'id' was my favorite of these)
>>
>> This is basically for an exit level event. Something so breaking that
>> your program just crashed.
>>
>> Note this has nothing to do with preventing stack traces that are
>> currently littering up the logs that happen at many logging levels, it's
>> only about removing the stack trace of a CRITICAL level event that's
>> going to very possibly result in a crashed daemon with no information as
>> to why.
>>
>> So the process of including oslo log makes the code immediately
>> undebuggable unless you change your config file to not the default.
>>
>> Whether or not there was justification for this before, one of the
>> things we heard loud and clear from the operators' meetup was:
>>
>>   - Most operators are running at DEBUG level for all their OpenStack
>> services because you can't actually do problem determination in
>> OpenStack for anything < that.
>>   - Operators reacted negatively to the idea of removing stack traces
>> from logs, as that's typically the only way to figure out what's going
>> on. It took a while of back and forth to explain that our initiative to
>> do that wasn't about removing them per se, but having the code
>> correctly recover.
>>
>> So the current oslo logging behavior seems inconsistent (we spew
>> exceptions at INFO and WARN levels, and hide all the important stuff
>> with a legitimately uncaught system level crash), undebuggable, and
>> completely against the prevailing wishes of the operator community.
>>
>> I'd like to change that here - https://review.openstack.org/#/c/95860/
>>
>>          -Sean
>
> I agree, we should dump as much detail as we can when we encounter an
> unhandled exception that causes an app to die.

+1

-jay
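
For anyone following along, the behaviour Sean lists can be sketched roughly
as below. This is an illustration only, not the actual oslo log.py code: the
names make_excepthook and run_hook, the include_traceback flag (a stand-in
for "debug or verbose is set"), and the "[CRITICAL] [-]" format string are
all assumptions made for the example. It shows how a hook that logs only
str(value) turns a KeyError('id') into a bare 'id' message, and how passing
exc_info keeps the traceback.

```python
import io
import logging
import sys


def make_excepthook(logger, include_traceback):
    """Return a sys.excepthook replacement that logs at CRITICAL.

    include_traceback is a hypothetical stand-in for "debug or verbose
    is set"; when it is False only str(exc_value) reaches the log.
    """
    def excepthook(exc_type, exc_value, exc_tb):
        extra = {}
        if include_traceback:
            # exc_info makes the logging module append the full traceback.
            extra["exc_info"] = (exc_type, exc_value, exc_tb)
        logger.critical(str(exc_value), **extra)
    return excepthook


def run_hook(include_traceback):
    """Feed a KeyError('id') through the hook and return what was logged."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    # Assumed format, chosen to resemble the message quoted in the thread.
    handler.setFormatter(logging.Formatter("[%(levelname)s] [-] %(message)s"))
    logger = logging.getLogger("excepthook_demo_%s" % include_traceback)
    logger.propagate = False
    logger.handlers = [handler]
    logger.setLevel(logging.DEBUG)
    hook = make_excepthook(logger, include_traceback)
    try:
        {}["id"]  # raises KeyError('id')
    except KeyError:
        hook(*sys.exc_info())
    return stream.getvalue()


if __name__ == "__main__":
    print(run_hook(include_traceback=False))
    print(run_hook(include_traceback=True))
```

In a real program the hook would be installed with something like
sys.excepthook = make_excepthook(logging.getLogger("myapp"), True), so that
an uncaught exception is logged with its traceback before the process exits.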


