[openstack-dev] RFC - Icehouse logging harmonization
Sean Dague
sean at dague.net
Wed Oct 23 18:20:19 UTC 2013
One of the efforts that we're working on from the QA team is tooling
that ensures we aren't stack tracing into our test logs during normal
tempest runs. Random stack traces are scary to cloud admins consuming
OpenStack logs, and exceptions in the logs should really be exceptional
events (and indicative of a failing system), not something that we do by
default. Our intent is to gate code on clean logs (no stacktraces)
eventually (i.e. if you try to land a patch that causes stack traces in
OpenStack, that becomes a failing condition), and we've got an
incremental white list based approach that should let us make forward
progress on that. But on that thread -
http://lists.openstack.org/pipermail/openstack-dev/2013-October/017012.html
we exposed another issue... across projects, OpenStack is very
inconsistent with logging.
First... baseline, these are the logging levels that we have in
OpenStack today (with numeric values, higher = worse):
CRITICAL = 50
FATAL = CRITICAL
ERROR = 40
WARNING = 30
WARN = WARNING
AUDIT = 21 # invented for oslo-logging
INFO = 20
DEBUG = 10
NOTSET = 0
We also have TRACE, which isn't a level per say, it happens at another
level. However TRACE is typically an ERROR in the way we use it.
Some examples of oddities in the current system (all from a single
devstack/tempest run):
Example 1:
==============
n-conductor log in tempest/devstack -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz
Total log lines: 84076
Total non DEBUG lines: 61
Question: do we need more than 1 level of DEBUG? 3 orders of
magnitude information change between INFO -> DEBUG seems too steep a cliff.
Example 2:
==============
ceilometer-collector -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-ceilometer-collector.txt.gz
AUDIT log level being used as DEBUG level (even though it's higher
than INFO).
2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush
pipeline meter_pipeline
2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush
pipeline cpu_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush
pipeline meter_pipeline
2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush
pipeline cpu_pipeline
(this is every second, for most seconds, for the entire run)
Example 3:
===============
cinder-api -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-c-api.txt.gz?level=ERROR
ERROR level being used for 404s of volumes
Example 4:
===============
glance-api -
http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-g-api.txt.gz
2013-10-23 12:23:27.436 23731 ERROR glance.store.sheepdog [-] Error in
store configuration: Unexpected error while running command.Command:
collieExit code: 127Stdout: ''Stderr: '/bin/sh: 1: collie: not found\n'
2013-10-23 12:23:27.436 23731 WARNING glance.store.base [-] Failed to
configure store correctly: Store sheepdog could not be configured
correctly. Reason: Error in store configuration: Unexpected error while
running command.Command: collieExit code: 127Stdout: ''Stderr: '/bin/sh:
1: collie: not found\n' Disabling add method.
part of every single Tempest / Devstack run, even though we aren't
trying to configure sheepdog in the gate
I think we can, and should do better, and started trying to brain dump
into this etherpad -
https://etherpad.openstack.org/p/icehouse-logging-harmonization
(examples included).
This is one of those topics that I think our current 6 track summit
model doesn't make easy address, as we really need general concensus
across any project that's using oslo-logging, so I believe mailing list
is the better option, at least for now.
Goals - Short Term
===============
As much feedback as possible from both core projects and openstack
deployers about the kinds of things that they believe we should be
logging, and the kinds of levels they think those things should land at.
Determining how crazy it is to try to harmonize this across services.
Figure out who else wants to help. Where help means:
* helping figure out what's already concensus in services
* helping figure out things that are really aberrant from that concensus
* helping build concensus with various core teams on a common
* helping with contributions to projects that are interested in
contributions to move them closer to the concensus
Determining if everyone just hates the idea, and I should give up now.
:) (That is a valid response to this RFC, feel free to put that out there).
Goals - Long Term
===============
A set of guidelines on logging standards so that OpenStack as a whole
feels more whole when it comes to dealing with the log data.
These are going to be guidelines, not rules. Some projects are always
going to have unique needs. But I suspect a lot of developers would be
happy with a cheet sheet that helps them figure out what log level
different things are presented at.
Comments / rotten fruit are welcomed. Also welcomed are any additional
POV on this issue.
-Sean
--
Sean Dague
http://dague.net
More information about the OpenStack-dev
mailing list