[Openstack-operators] Log Rationalization -- Bring it on!
rochelle.grober at huawei.com
Wed Sep 17 23:42:57 UTC 2014
TL;DR: I consider the poor state of log consistency a major impediment to wider adoption of OpenStack, and I would like to volunteer to own this cross-functional process: beginning to unify and standardize logging messages and attributes for Kilo, while dealing with the most egregious issues as the community identifies them.
Recap from some mail threads:
From Sean Dague on Kilo cycle goals:
2. Consistency in southbound interfaces (Logging first)
Logging and notifications are southbound interfaces from OpenStack, providing information to people, or machines, about what is going on.
There is also a third proposed southbound interface, osprofiler.
For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.
I'd honestly *really* love to see a unification path for all the southbound parts (logging, osprofiler, notifications), because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.
And from Doug Hellmann:
1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines, where he is gathering input from developers, deployers, and operators. Because it is far enough along for us to see real progress, it's a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work, but I think most of the work will be within the applications.
And from James Blair:
1) Improve log correlation and utility
If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.
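To make the request-token idea concrete, here is a minimal Python sketch of the kind of correlation it would enable. It assumes, hypothetically, that every project writes the same token into its log lines as `req-` followed by a UUID; the function `group_by_request` and the log file names are illustrative only, not an existing OpenStack tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical shared token format: "req-" followed by a lowercase UUID.
REQUEST_ID = re.compile(r"req-[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}")


def group_by_request(
    sources: Dict[str, Iterable[str]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map each request token to the (source name, log line) pairs mentioning it.

    `sources` maps a log name (e.g. a file name) to its lines. Lines that carry
    no token are ignored; a line carrying several tokens is filed under each.
    """
    groups: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for name, lines in sources.items():
        for line in lines:
            for token in set(REQUEST_ID.findall(line)):
                groups[token].append((name, line.rstrip("\n")))
    return dict(groups)
```

With a shared token, one lookup such as `group_by_request({"nova-api.log": nova_lines, "neutron-server.log": neutron_lines})["req-..."]` would pull together everything each service logged for a single user request, which is the cross-project view that per-project IDs cannot give.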
While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.
Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:
· End Users
· Community managers
· Tech Pubs
· TC (which provides the forum and impetus for all the projects to cooperate on this)
At the moment, I think this effort may work best under the auspices of Oslo (oslo.log), but I'd love to hear other proposals.
Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:
· Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process
· In parallel:
o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
o Categorize reported Log issues into classes (already identified classes):
§ Format Consistency across projects
§ Log level definition and categorization across classes
§ Time syncing entries across tens of logfiles
§ Relevancy/usefulness of information provided within messages
§ Etc (missing a lot here, but I'm sure folks will speak up)
o Analyze existing log message formats and standards across integrated projects
o File bugs where issues identified are actual project bugs
o Build a session outline for F2F working session at the Paris Design Summit
· At the Paris Design Summit, use a session and/or pod discussions to set priorities, recruit contributors, start and/or flesh out specs and blueprints
· Proceed according to priorities, specs, blueprints, contributions and changes as needed as the work progresses.
· Keep an active and open rapport and reporting process for the user community to comment and participate in the processes.
Measures of success:
· Log message formats are consistent enough to allow productive mining through operator-writable scripts
· Problem debugging is simplified through the ability to trust timestamps across all OpenStack logs (and use scripts to get to the time you want in any/all of the logfiles)
· Standards for format, content, levels and translations have been proposed and agreed to be adopted across all OpenStack integrated projects
· The user communities demonstrate an increased level of trust and decreased level of frustration with OpenStack logging (surveys, bug reports, other measures?)
· The log team can disband
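As an illustration of the first two measures, here is a minimal Python sketch of the kind of operator script that consistent formats and trustworthy timestamps would make possible: interleaving several logs into one timeline and slicing out a time window. It assumes, hypothetically, that every line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, that all hosts log in UTC with synchronized clocks, and that each individual log is already in time order; the function names are illustrative only.

```python
import heapq
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical standard prefix: "YYYY-MM-DD HH:MM:SS.mmm", all hosts in UTC.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2014-09-17 23:42:57.000")

Record = Tuple[datetime, str, str]  # (timestamp, source name, original line)


def records(name: str, lines: Iterable[str]) -> Iterator[Record]:
    """Yield (timestamp, source, line) for each line of one log.

    Lines without a leading timestamp (e.g. traceback continuations) inherit
    the timestamp of the preceding line so they stay attached to it.
    """
    last = datetime.min
    for line in lines:
        try:
            last = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            pass  # continuation line: keep the previous timestamp
        yield last, name, line.rstrip("\n")


def merge_logs(sources: Dict[str, Iterable[str]]) -> List[Record]:
    """Interleave several individually time-ordered logs into one timeline."""
    streams = [records(name, lines) for name, lines in sources.items()]
    # heapq.merge does a lazy k-way merge; ties keep the order of `sources`.
    return list(heapq.merge(*streams, key=lambda rec: rec[0]))


def window(merged: List[Record], start: datetime, end: datetime) -> List[Record]:
    """Return the merged entries with start <= timestamp < end."""
    return [rec for rec in merged if start <= rec[0] < end]
```

The merge is only as good as the timestamps: if one host's clock drifts or logs in local time, its entries land in the wrong place, which is why trustworthy timestamps across all logs are listed as a measure of success in their own right.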
I expect that getting the logs into very good shape will take more than just the Kilo timeframe, but once momentum has built, which should happen during Kilo, the process should move very quickly. Once the standards are established, a lot of this could be handled as "while you're in there" fixes or "low-hanging fruit". The bigger win will be if we can ensure that what we define/design is extensible over the longer life of OpenStack.