[openstack-dev] RFC - Icehouse logging harmonization

Sean Dague sean at dague.net
Wed Oct 23 18:20:19 UTC 2013

One of the efforts that we're working on from the QA team is tooling 
that ensures we aren't stack tracing into our test logs during normal 
tempest runs. Random stack traces are scary to cloud admins consuming 
OpenStack logs, and exceptions in the logs should really be exceptional 
events (and indicative of a failing system), not something that we do by 
default. Our intent is to gate code on clean logs (no stacktraces) 
eventually (i.e. if you try to land a patch that causes stack traces in 
OpenStack, that becomes a failing condition), and we've got an 
incremental white list based approach that should let us make forward 
progress on that. But on that thread - 
we exposed another issue... across projects, OpenStack is very 
inconsistent with logging.

First... baseline, these are the logging levels that we have in 
OpenStack today (with numeric values, higher = worse):

ERROR = 40
AUDIT = 21  # invented for oslo-logging
INFO = 20
DEBUG = 10

We also have TRACE, which isn't a level per say, it happens at another 
level. However TRACE is typically an ERROR in the way we use it.

Some examples of oddities in the current system (all from a single 
devstack/tempest run):

Example 1:

     n-conductor log in tempest/devstack - 

     Total log lines: 84076
     Total non DEBUG lines: 61

     Question: do we need more than 1 level of DEBUG? 3 orders of 
magnitude information change between INFO -> DEBUG seems too steep a cliff.

Example 2:

     ceilometer-collector - 

     AUDIT log level being used as DEBUG level (even though it's higher 
than INFO).

2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline meter_pipeline
2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline cpu_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline meter_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush 
pipeline cpu_pipeline

(this is every second, for most seconds, for the entire run)

Example 3:

     cinder-api - 
     ERROR level being used for 404s of volumes

Example 4:
glance-api - 

2013-10-23 12:23:27.436 23731 ERROR glance.store.sheepdog [-] Error in 
store configuration: Unexpected error while running command.Command: 
collieExit code: 127Stdout: ''Stderr: '/bin/sh: 1: collie: not found\n'
2013-10-23 12:23:27.436 23731 WARNING glance.store.base [-] Failed to 
configure store correctly: Store sheepdog could not be configured 
correctly. Reason: Error in store configuration: Unexpected error while 
running command.Command: collieExit code: 127Stdout: ''Stderr: '/bin/sh: 
1: collie: not found\n' Disabling add method.

part of every single Tempest / Devstack run, even though we aren't 
trying to configure sheepdog in the gate

I think we can, and should do better, and started trying to brain dump 
into this etherpad - 
(examples included).

This is one of those topics that I think our current 6 track summit 
model doesn't make easy address, as we really need general concensus 
across any project that's using oslo-logging, so I believe mailing list 
is the better option, at least for now.

Goals - Short Term
As much feedback as possible from both core projects and openstack 
deployers about the kinds of things that they believe we should be 
logging, and the kinds of levels they think those things should land at.

Determining how crazy it is to try to harmonize this across services.

Figure out who else wants to help. Where help means:
  * helping figure out what's already concensus in services
  * helping figure out things that are really aberrant from that concensus
  * helping build concensus with various core teams on a common
  * helping with contributions to projects that are interested in 
contributions to move them closer to the concensus

Determining if everyone just hates the idea, and I should give up now. 
:) (That is a valid response to this RFC, feel free to put that out there).

Goals - Long Term
A set of guidelines on logging standards so that OpenStack as a whole 
feels more whole when it comes to dealing with the log data.

These are going to be guidelines, not rules. Some projects are always 
going to have unique needs. But I suspect a lot of developers would be 
happy with a cheet sheet that helps them figure out what log level 
different things are presented at.

Comments / rotten fruit are welcomed. Also welcomed are any additional 
POV on this issue.


Sean Dague

More information about the OpenStack-dev mailing list