[openstack-dev] Log Rationalization -- Bring it on!
Doug Hellmann
doug at doughellmann.com
Thu Sep 18 19:40:57 UTC 2014
On Sep 17, 2014, at 7:42 PM, Rochelle.RochelleGrober <rochelle.grober at huawei.com> wrote:
> TL;DR: I consider the poor state of log consistency a major impediment for more widespread adoption of OpenStack and would like to volunteer to own this cross-functional process to begin to unify and standardize logging messages and attributes for Kilo while dealing with the most egregious issues as the community identifies them.
>
> Recap from some mail threads:
>
> From Sean Dague on Kilo cycle goals:
> 2. Consistency in southbound interfaces (Logging first)
>
> Logging and notifications are south bound interfaces from OpenStack providing information to people, or machines, about what is going on.
> There is also a 3rd proposed south bound with osprofiler.
>
> For Kilo: I think it's reasonable to complete the logging standards and implement them. I expect notifications (which haven't quite kicked off) are going to take 2 cycles.
>
> I'd honestly *really* love to see a unification path for all the southbound parts, logging, osprofiler, notifications, because there is quite a bit of overlap in the instrumentation/annotation inside the main code for all of these.
>
> And from Doug Hellmann:
> 1. Sean has done a lot of analysis and started a spec on standardizing logging guidelines where he is gathering input from developers, deployers, and operators [1]. Because it is far enough along for us to see real progress, it’s a good place for us to start experimenting with how to drive cross-project initiatives involving code and policy changes from outside of a single project. We have a couple of potentially related specs in Oslo as part of the oslo.log graduation work [2] [3], but I think most of the work will be within the applications.
>
> [1] https://review.openstack.org/#/c/91446/
> [2] https://blueprints.launchpad.net/oslo.log/+spec/app-agnostic-logging-parameters
> [3] https://blueprints.launchpad.net/oslo.log/+spec/remove-context-adapter
>
> And from James Blair:
> 1) Improve log correlation and utility
>
> If we're going to improve the stability of OpenStack, we have to be able to understand what's going on when it breaks. That's both true as developers when we're trying to diagnose a failure in an integration test, and it's true for operators who are all too often diagnosing the same failure in a real deployment. Consistency in logging across projects as well as a cross-project request token would go a long way toward this.
>
> While I am not currently managing an OpenStack deployment, writing tests or code, or debugging the stack, I have spent many years doing just that. Through QA, Ops and Customer support, I have come to revel in good logging and log messages and curse the holes and vagaries in many systems.
>
> Defining/refining logs to be useful and usable is a cross-functional effort that needs to include:
> · Operators
> · QA
> · End Users
> · Community managers
> · Tech Pubs
> · Translators
> · Developers
> · TC (which provides the forum and impetus for all the projects to cooperate on this)
>
> At the moment, I think this effort may best work under the auspices of Oslo (oslo.log), but I’d love to hear other proposals.
I’m sure there will be changes to make in the log library. However, because of the cross-project nature of the policy decisions, I think we should drive this from outside of Oslo. We can use the oslo.log developer docs as a place to formally document guidelines, and we can change the library to make it easier to follow those guidelines, but the specs to define the guidelines and the planning for rolling out the changes should happen in a more central place than oslo-specs.
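To make "change the library to make it easier to follow those guidelines" a bit more concrete, here is a minimal sketch using only the Python standard library logging module. It is not the oslo.log API. The format string, the request_id field, and the names SHARED_FORMAT, RequestIdFilter, get_logger and demo are all assumptions made up for illustration; the real format would come out of the guidelines spec.

```python
import io
import logging

# Hypothetical shared format (an assumption, not an agreed standard):
#   timestamp.millis LEVEL logger-name [request-id] message
SHARED_FORMAT = (
    "%(asctime)s.%(msecs)03d %(levelname)s %(name)s "
    "[%(request_id)s] %(message)s"
)
SHARED_DATEFMT = "%Y-%m-%d %H:%M:%S"


class RequestIdFilter(logging.Filter):
    """Guarantee every record has a request_id so the format never fails."""

    def filter(self, record):
        if not hasattr(record, "request_id"):
            record.request_id = "-"
        return True


def get_logger(name, stream):
    """Return a logger that writes to `stream` using the shared format."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the example output isolated
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(SHARED_FORMAT, datefmt=SHARED_DATEFMT))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    return logger


def demo():
    """Emit one record with a request ID and one without."""
    stream = io.StringIO()
    log = get_logger("example.service", stream)
    log.info("instance spawned", extra={"request_id": "req-abc123"})
    log.warning("no request context")
    return stream.getvalue().splitlines()
```

The point of the sketch is only that the format lives in one place and callers cannot forget a field; which fields and which levels to use are the policy questions that belong in the cross-project spec.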
>
> Here are the beginnings of my proposal for how to attack and subdue the painful state of logs:
>
> · Post this email to the MLs (dev, ops, enduser) to get feedback, garner support and participants in the process
> (Done;-)
FWIW, I’m only replying on the -dev list to avoid duplicate messages from cross-posting. Figuring out how to gather input and collect it is one of the procedural issues we need to work out as part of starting an initiative like this. I like that you’ve started an etherpad for that.
We really do need to have the meta conversation about running cross-project initiatives, and I think this one has enough clear support that we could have that discussion without being side-tracked by what the initiative is trying to accomplish.
Doug
> · In parallel:
> o Collect up problems, issues, ideas, solutions on an etherpad https://etherpad.openstack.org/p/Log-Rationalization where anyone in the communities can post.
> o Categorize reported Log issues into classes (already identified classes):
> § Format Consistency across projects
> § Log level definition and categorization across classes
> § Time syncing entries across tens of logfiles
> § Relevancy/usefulness of information provided within messages
> § Etc (missing a lot here, but I’m sure folks will speak up)
> o Analyze existing log message formats, standards across integrated projects
> o File bugs where issues identified are actual project bugs
> o Build a session outline for F2F working session at the Paris Design Summit
> · At the Paris Design Summit, use a session and/or pod discussions to set priorities, recruit contributors, start and/or flesh out specs and blueprints
> · Proceed according to priorities, specs, blueprints, contributions and changes as needed as the work progresses.
> · Keep an active and open rapport and reporting process for the user community to comment and participate in the processes.
> Measures of success:
> · Log messages are consistent enough in format for productive mining with operator-writable scripts
> · Problem debugging is simplified through the ability to trust timestamps across all OpenStack logs (and use scripts to get to the time you want in any/all of the logfiles)
> · Standards for format, content, levels and translations have been proposed and agreed to be adopted across all OpenStack integrated projects
> · The user communities demonstrate an increased level of trust and decreased level of frustration with OpenStack logging (surveys, bug reports, other measures?)
> · The log team can disband
>
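As an illustration of the first two measures, the kind of operator-writable script that a consistent format and trustworthy timestamps would enable might look like the sketch below. The line format, the request ID field, and the names LINE_RE, parse_line and merge_logs are hypothetical, chosen only for this example; no project is claimed to emit exactly this today.

```python
import re
from datetime import datetime

# Hypothetical shared line format (an assumption for illustration):
#   2014-09-18 19:40:57.123 INFO nova.compute [req-abc123] message text
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<module>\S+) "
    r"\[(?P<request_id>[^\]]*)\] "
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict for a line in the shared format, or None otherwise."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    record = match.groupdict()
    record["ts"] = datetime.strptime(record["ts"], "%Y-%m-%d %H:%M:%S.%f")
    return record


def merge_logs(logs, request_id=None):
    """Merge lines from several logs into one timeline.

    `logs` maps a source name (e.g. a file name) to an iterable of lines.
    Unparseable lines are skipped. If `request_id` is given, only records
    carrying that ID are kept. Records are sorted by timestamp, which is
    only meaningful if the clocks behind the logs are in sync.
    """
    records = []
    for source, lines in logs.items():
        for line in lines:
            record = parse_line(line)
            if record is None:
                continue
            if request_id is not None and record["request_id"] != request_id:
                continue
            record["source"] = source
            records.append(record)
    records.sort(key=lambda r: r["ts"])
    return records
```

With today's logs a script like this needs a different regular expression per project, and the sorted timeline cannot be trusted unless timestamps agree across services.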
> I expect that getting the logs in very good shape will take more than just the Kilo timeframe, but once momentum has built, which should happen during Kilo, the process should move very quickly. A lot of this could be handled through “while you’re in there” or “low hanging fruit” once the standards are established. The bigger win will be if we can ensure what we define/design is extensible over the longer life of OpenStack.
>
> --Rocky
>