[openstack-dev] Oslo logging eats system level tracebacks by default

Joshua Harlow harlowja at yahoo-inc.com
Thu May 29 01:37:17 UTC 2014


An idea, and one that I've tried to apply in taskflow.

Since Python 3.0 added exception chaining with PEP 3134 (http://legacy.python.org/dev/peps/pep-3134), if we can *simulate* exception chaining in our projects, this would likely help even more with traceability and debugging. I have a common exception that I've been using to approximate this (since chaining doesn't work/exist in 2.7 and 2.6), and it might be useful for others to have similar types of exceptions (and to try to print out as much of the chain as possible when errors occur).

https://github.com/openstack/taskflow/blob/master/taskflow/exceptions.py#L22 (see pformat() method that dumps a large string containing all connected causes).
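
To make the idea concrete, here is a minimal sketch of simulated chaining that should also work on 2.6/2.7. It is not the taskflow code; the class name ChainedError, the cause attribute and the parse_port() helper are hypothetical names chosen for illustration, and only the pformat() name is borrowed from the link above.

```python
class ChainedError(Exception):
    """Hypothetical exception that remembers the exception that caused it."""

    def __init__(self, message, cause=None):
        super(ChainedError, self).__init__(message)
        # Assumption: the cause is stored on a plain attribute, because
        # Python 2 has no built-in __cause__ / "raise ... from ...".
        self.cause = cause

    def pformat(self, indent=2):
        """Return one string listing this error and every connected cause."""
        lines = []
        current, depth = self, 0
        while current is not None:
            prefix = " " * (indent * depth)
            lines.append("%s%s: %s" % (prefix, type(current).__name__, current))
            # Ordinary exceptions have no "cause" attribute, which ends the walk.
            current = getattr(current, "cause", None)
            depth += 1
        return "\n".join(lines)


def parse_port(text):
    """Illustrative caller that wraps a low-level error with more context."""
    try:
        return int(text)
    except ValueError as e:
        raise ChainedError("invalid port %r" % text, cause=e)
```

Catching ChainedError from parse_port("abc") and logging err.pformat() would then give one line for the wrapper and an indented line for the underlying ValueError, instead of only the outermost message.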

Might be useful for others (if there are better approaches/libraries that do similar things let me know),

-Josh

From: Morgan Fainberg <morgan.fainberg at gmail.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, May 28, 2014 at 8:53 AM
To: Jay Pipes <jaypipes at gmail.com>, "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] Oslo logging eats system level tracebacks by default

+1. Providing service crash information is very valuable. In general we need to provide as much information as possible about why the service exited (critically/traceback/unexpectedly) for our operators.

—Morgan

—
Morgan Fainberg

From: Jay Pipes <jaypipes at gmail.com>
Reply: OpenStack Development Mailing List (not for usage questions) <openstack-dev at lists.openstack.org>
Date: May 28, 2014 at 08:50:25
To: openstack-dev at lists.openstack.org
Subject: Re: [openstack-dev] Oslo logging eats system level tracebacks by default

On 05/28/2014 11:39 AM, Doug Hellmann wrote:
> On Wed, May 28, 2014 at 10:38 AM, Sean Dague <sean at dague.net> wrote:
>> When attempting to build a new tool for Tempest, I found that my python
>> syntax errors were being completely eaten. After 2 days of debugging I
>> found that oslo log.py does the following *very unexpected* thing.
>>
>> - replaces the sys.excepthook with its own function
>> - eats the exception traceback unless debug or verbose are set to True
>> - sets debug and verbose to False by default
>> - prints out a completely useless summary log message at Critical
>> ([CRITICAL] [-] 'id' was my favorite of these)
>>
>> This is basically for an exit level event. Something so breaking that
>> your program just crashed.
>>
>> Note this has nothing to do with preventing stack traces that are
>> currently littering up the logs that happen at many logging levels, it's
>> only about removing the stack trace of a CRITICAL level event that's
>> going to very possibly result in a crashed daemon with no information as
>> to why.
>>
>> So the act of including oslo log makes the code immediately
>> undebuggable unless you change your config file away from the default.
>>
>> Whether or not there was justification for this before, one of the
>> things we heard loud and clear from the operators' meetup was:
>>
>> - Most operators are running at DEBUG level for all their OpenStack
>> services because you can't actually do problem determination in
>> OpenStack for anything < that.
>> - Operators reacted negatively to the idea of removing stack traces
>> from logs, as that's typically the only way to figure out what's going
>> on. It took a while of back and forth to explain that our initiative to
>> do that wasn't about removing them per se, but having the code
>> correctly recover.
>>
>> So the current oslo logging behavior seems inconsistent (we spew
>> exceptions at INFO and WARN levels, and hide all the important stuff
>> with a legitimately uncaught system level crash), undebuggable, and
>> completely against the prevailing wishes of the operator community.
>>
>> I'd like to change that here - https://review.openstack.org/#/c/95860/
>>
>> -Sean
>
> I agree, we should dump as much detail as we can when we encounter an
> unhandled exception that causes an app to die.

+1

-jay
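
For readers who have not seen the pattern, the excepthook behaviour Sean lists above can be sketched roughly as follows. This is not the oslo log.py code or the proposed change; make_excepthook() and its debug flag are hypothetical names used only to contrast the two behaviours (summary only versus full traceback).

```python
import logging
import sys
import traceback


def make_excepthook(logger, debug=False):
    """Build a sys.excepthook replacement that logs at CRITICAL.

    Hypothetical helper: with debug=False only the exception value is
    logged (the behaviour complained about in this thread); with
    debug=True the full traceback is included in the log message.
    """
    def hook(exc_type, value, tb):
        if debug:
            details = "".join(traceback.format_exception(exc_type, value, tb))
            logger.critical("Unhandled exception:\n%s", details)
        else:
            # For something like KeyError('id') this logs just "'id'".
            logger.critical("%s", value)
    return hook


if __name__ == "__main__":
    logging.basicConfig()
    # Install the verbose variant so a crash leaves a traceback in the log.
    sys.excepthook = make_excepthook(logging.getLogger("demo"), debug=True)
    {}["id"]  # uncaught KeyError, handled by the hook above
```

With debug=False, an uncaught KeyError for the key 'id' is reduced to the message 'id', which is one way a summary line like the one quoted above could arise; with debug=True the same crash is logged together with its traceback.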
