[openstack-dev] RFC - Icehouse logging harmonization

Clark Boylan clark.boylan at gmail.com
Wed Oct 23 19:03:27 UTC 2013


On Wed, Oct 23, 2013 at 11:20 AM, Sean Dague <sean at dague.net> wrote:
> One of the efforts that we're working on from the QA team is tooling that
> ensures we aren't stack tracing into our test logs during normal tempest
> runs. Random stack traces are scary to cloud admins consuming OpenStack
> logs, and exceptions in the logs should really be exceptional events (and
> indicative of a failing system), not something that we do by default. Our
> intent is to gate code on clean logs (no stacktraces) eventually (i.e. if
> you try to land a patch that causes stack traces in OpenStack, that becomes
> a failing condition), and we've got an incremental white list based approach
> that should let us make forward progress on that. But on that thread -
> http://lists.openstack.org/pipermail/openstack-dev/2013-October/017012.html
> we exposed another issue... across projects, OpenStack is very inconsistent
> with logging.
>
> First... baseline, these are the logging levels that we have in OpenStack
> today (with numeric values, higher = worse):
>
> CRITICAL = 50
> FATAL = CRITICAL
> ERROR = 40
> WARNING = 30
> WARN = WARNING
> AUDIT = 21  # invented for oslo-logging
> INFO = 20
> DEBUG = 10
> NOTSET = 0
>
> We also have TRACE, which isn't a level per say, it happens at another
> level. However TRACE is typically an ERROR in the way we use it.
>
>
> Some examples of oddities in the current system (all from a single
> devstack/tempest run):
>
> Example 1:
> ==============
>
>     n-conductor log in tempest/devstack -
> http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-n-cond.txt.gz
>
>     Total log lines: 84076
>     Total non DEBUG lines: 61
>
>     Question: do we need more than 1 level of DEBUG? 3 orders of magnitude
> information change between INFO -> DEBUG seems too steep a cliff.
>
> Example 2:
> ==============
>
>     ceilometer-collector -
> http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-ceilometer-collector.txt.gz
>
>     AUDIT log level being used as DEBUG level (even though it's higher than
> INFO).
>
> 2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush pipeline
> meter_pipeline
> 2013-10-23 12:24:20.093 26234 AUDIT ceilometer.pipeline [-] Flush pipeline
> cpu_pipeline
> 2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush pipeline
> meter_pipeline
> 2013-10-23 12:24:20.094 26234 AUDIT ceilometer.pipeline [-] Flush pipeline
> cpu_pipeline
>
> (this is every second, for most seconds, for the entire run)
>
> Example 3:
> ===============
>
>     cinder-api -
> http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-c-api.txt.gz?level=ERROR
>     ERROR level being used for 404s of volumes
>
> Example 4:
> ===============
> glance-api -
> http://logs.openstack.org/70/52870/3/check/check-tempest-devstack-vm-full/f46b756/logs/screen-g-api.txt.gz
>
> 2013-10-23 12:23:27.436 23731 ERROR glance.store.sheepdog [-] Error in store
> configuration: Unexpected error while running command.Command: collieExit
> code: 127Stdout: ''Stderr: '/bin/sh: 1: collie: not found\n'
> 2013-10-23 12:23:27.436 23731 WARNING glance.store.base [-] Failed to
> configure store correctly: Store sheepdog could not be configured correctly.
> Reason: Error in store configuration: Unexpected error while running
> command.Command: collieExit code: 127Stdout: ''Stderr: '/bin/sh: 1: collie:
> not found\n' Disabling add method.
>
> part of every single Tempest / Devstack run, even though we aren't trying to
> configure sheepdog in the gate
>
>
> I think we can, and should do better, and started trying to brain dump into
> this etherpad -
> https://etherpad.openstack.org/p/icehouse-logging-harmonization (examples
> included).
>
> This is one of those topics that I think our current 6 track summit model
> doesn't make easy address, as we really need general concensus across any
> project that's using oslo-logging, so I believe mailing list is the better
> option, at least for now.
>
>
> Goals - Short Term
> ===============
> As much feedback as possible from both core projects and openstack deployers
> about the kinds of things that they believe we should be logging, and the
> kinds of levels they think those things should land at.
>
> Determining how crazy it is to try to harmonize this across services.
>
> Figure out who else wants to help. Where help means:
>  * helping figure out what's already concensus in services
>  * helping figure out things that are really aberrant from that concensus
>  * helping build concensus with various core teams on a common
>  * helping with contributions to projects that are interested in
> contributions to move them closer to the concensus
>
> Determining if everyone just hates the idea, and I should give up now. :)
> (That is a valid response to this RFC, feel free to put that out there).
>
>
> Goals - Long Term
> ===============
> A set of guidelines on logging standards so that OpenStack as a whole feels
> more whole when it comes to dealing with the log data.
>
> These are going to be guidelines, not rules. Some projects are always going
> to have unique needs. But I suspect a lot of developers would be happy with
> a cheet sheet that helps them figure out what log level different things are
> presented at.
>
>
> Comments / rotten fruit are welcomed. Also welcomed are any additional POV
> on this issue.
>
>         -Sean
>
This is great. In additional to all this I want to point out that the
TRACE level is also a oslo-logging invention. In general I am not a
fan of having additional levels as compatibility with python logging
and associated tools would be a good thing, but I can see their
usefulness if we need more granularity. Also, this isn't just for
developers and QA, deployers of OpenStack would benefit tremendously
from more consistent and useful logging. After throwing together
logstash filters[1] for the OpenStack service logs produced by
Devstack in the gate I have found the logs to be quite diverse which
makes dealing with them in a centralized fashion more difficult. +1
from me to guidelines which should drive more consistency.

This got better during the Havana cycle, but we can definitely do more
in Icehouse.

[1] https://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/templates/logstash/indexer.conf.erb#n11

Clark



More information about the OpenStack-dev mailing list