[openstack-dev] Oslo logging eats system level tracebacks by default
Davanum Srinivas
davanum at gmail.com
Wed May 28 19:11:03 UTC 2014
+1 from me.
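
For anyone following along, here is a minimal sketch of the behavior described in the quoted message below. It uses only the standard library and is not the actual oslo log.py code; `make_excepthook`, `demo`, and the `include_traceback` flag are hypothetical names chosen for illustration. It contrasts a hook that logs only `str(exc_value)` (which for a `KeyError` on `'id'` yields the bare `'id'` summary) with one that passes `exc_info` so the full traceback reaches the CRITICAL record.

```python
import io
import logging
import sys


def make_excepthook(logger, include_traceback=True):
    """Build a sys.excepthook replacement (hypothetical helper, not oslo's API)."""
    def hook(exc_type, exc_value, exc_tb):
        if include_traceback:
            # exc_info attaches the full traceback to the CRITICAL record.
            logger.critical("Unhandled exception",
                            exc_info=(exc_type, exc_value, exc_tb))
        else:
            # Summary only: for KeyError('id') this logs just 'id'.
            logger.critical(str(exc_value))
    return hook


def demo(include_traceback):
    """Return the log text produced for an uncaught-style KeyError('id')."""
    stream = io.StringIO()
    logger = logging.getLogger("excepthook_demo_%s" % include_traceback)
    logger.handlers = [logging.StreamHandler(stream)]
    logger.propagate = False
    hook = make_excepthook(logger, include_traceback)
    try:
        {}["id"]
    except KeyError:
        # Call the hook directly instead of crashing the interpreter.
        hook(*sys.exc_info())
    return stream.getvalue()
```

In a real program the hook would be installed with something like `sys.excepthook = make_excepthook(logging.getLogger(__name__))`; the `demo` function only calls it directly so the two outputs can be compared side by side.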
On Wed, May 28, 2014 at 11:49 AM, Jay Pipes <jaypipes at gmail.com> wrote:
> On 05/28/2014 11:39 AM, Doug Hellmann wrote:
>>
>> On Wed, May 28, 2014 at 10:38 AM, Sean Dague <sean at dague.net> wrote:
>>>
>>> When attempting to build a new tool for Tempest, I found that my Python
>>> syntax errors were being completely eaten. After 2 days of debugging I
>>> found that oslo log.py does the following *very unexpected* things:
>>>
>>> - replaces the sys.excepthook with its own function
>>> - eats the exception traceback unless debug or verbose is set to True
>>> - sets debug and verbose to False by default
>>> - prints out a completely useless summary log message at CRITICAL level
>>> ([CRITICAL] [-] 'id' was my favorite of these)
>>>
>>> This is basically for an exit level event. Something so breaking that
>>> your program just crashed.
>>>
>>> Note this has nothing to do with preventing the stack traces that
>>> currently litter the logs at many logging levels; it's only about
>>> removing the stack trace of a CRITICAL level event that's very likely
>>> going to result in a crashed daemon with no information as
>>> to why.
>>>
>>> So simply including oslo log makes the code immediately
>>> undebuggable unless you change your config file away from the defaults.
>>>
>>> Whether or not there was justification for this before, two of the
>>> things we heard loud and clear from the operators' meetup were:
>>>
>>> - Most operators are running at DEBUG level for all their OpenStack
>>> services because you can't actually do problem determination in
>>> OpenStack at any level below that.
>>> - Operators reacted negatively to the idea of removing stack traces
>>> from logs, as that's typically the only way to figure out what's going
>>> on. It took a while of back and forth to explain that our initiative to
>>> do that wasn't about removing them per se, but about having the code
>>> correctly recover.
>>>
>>> So the current oslo logging behavior seems inconsistent (we spew
>>> exceptions at INFO and WARN levels, and hide all the important stuff
>>> with a legitimately uncaught system level crash), undebuggable, and
>>> completely against the prevailing wishes of the operator community.
>>>
>>> I'd like to change that here - https://review.openstack.org/#/c/95860/
>>>
>>> -Sean
>>
>>
>> I agree, we should dump as much detail as we can when we encounter an
>> unhandled exception that causes an app to die.
>
>
> +1
>
> -jay
--
Davanum Srinivas :: http://davanum.wordpress.com