[openstack-dev] os-loganalyze, project log parsing, or ...

Andrew Laski andrew at lascii.com
Tue Sep 27 19:32:17 UTC 2016



On Tue, Sep 27, 2016, at 02:40 PM, Matthew Treinish wrote:
> On Tue, Sep 27, 2016 at 01:03:35PM -0400, Andrew Laski wrote:
> > 
> > 
> > On Tue, Sep 27, 2016, at 12:39 PM, Matthew Treinish wrote:
> > > On Tue, Sep 27, 2016 at 11:36:07AM -0400, Andrew Laski wrote:
> > > > Hello all,
> > > > 
> > > > Recently I noticed that people would look at logs from a Zuul née
> > > > Jenkins CI run and comment something like "there seem to be more
> > > > warnings in here than usual." And so I thought it might be nice to
> > > > quantify that sort of thing so we didn't have to rely on gut feelings.
> > > > 
> > > > So I threw together https://review.openstack.org/#/c/376531 which is a
> > > > script that lives in the Nova tree, gets called from a devstack-gate
> > > > post_test_hook, and outputs an n-stats.json file which can be seen at
> > > > http://logs.openstack.org/06/375106/8/check/gate-tempest-dsvm-multinode-live-migration-ubuntu-xenial/e103612/logs/n-stats.json.
> > > > This provides just a simple way to compare two runs and spot large
> > > > changes between them. Perhaps later things could get fancy and these
> > > > stats could be tracked over time. I am also interested in adding stats
> > > > for things that are a bit project specific, like how long (max, min, med)
> > > > it took to boot an instance or, what's probably better to track, how
> > > > many operations that took, for some definition of an operation.
> > > > 
> > > > I received some initial feedback that this might be a better fit in the
> > > > os-loganalyze project, so I cloned the project to take a look and
> > > > quickly noticed
> > > > http://git.openstack.org/cgit/openstack-infra/os-loganalyze/tree/README.rst#n13.
> > > > That makes me think it would not be a good fit there because what I'm
> > > > looking to do relies on parsing the full file, or potentially multiple
> > > > files, in order to get useful data.
> > > > 
> > > > So my questions: does this seem like a good fit for os-loganalyze? If
> > > > not is there another infra/QA project that this would be a good fit for?
> > > > Or would people be okay with a lone project like Nova implementing this
> > > > in tree for their own use?
> > > > 
> > > 
> > > I think having this in os-loganalyze makes sense since we use that for
> > > visualizing the logs already. It also means we get it for free on all the
> > > log files. But if it's not a good fit for a technical reason, then I think
> > > creating another small tool under QA or infra would be a good path
> > > forward, since there really isn't anything Nova specific in that.
> > 
> > There's nothing Nova specific atm because I went for low hanging fruit.
> > But if the plan is to have Nova specific, Cinder specific, Glance
> > specific, etc... things in there, do people still feel that a QA/infra
> > tool is the right path forward? That's my only hesitation here.
> 
> Well I think that raises more questions: what do you envision the Nova
> specific bits would be? The only thing I could see would be something that
> looks for specific log messages or patterns in the logs, which feels like
> exactly what elastic-recheck does?

I'm thinking beyond single line things. An example could be a parser
that can calculate the timing between the first log message seen for a
request-id and the last, or could count the number of log lines
associated with each instance boot perhaps even broken down by log
level. Things that require both an understanding of how to correlate
groups of log lines with specific events (instance boot), and being able
to calculate stats for groups of log lines (debug log line count by
request-id).
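
To make that concrete, here is a rough sketch of the first kind of parser:
elapsed time between the first and last line seen for each request-id, plus
a per-level line count. The line layout in the regex is an assumption about
the oslo.log context format, and request_stats is a made-up name for
illustration, not something from the review.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed line layout (roughly the oslo.log context format):
# 2016-09-27 19:32:17.123 1234 DEBUG nova.compute.manager [req-<uuid> ...] msg
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"\d+ (?P<level>[A-Z]+) \S+ \[(?P<req>req-[0-9a-f-]+)"
)


def request_stats(lines):
    """Return {request_id: {"seconds": float, "levels": {level: count}}}."""
    first, last = {}, {}
    levels = defaultdict(Counter)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # tracebacks, lines without a request context, etc.
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
        req = m.group("req")
        first.setdefault(req, ts)  # keep only the earliest timestamp
        last[req] = ts             # overwritten until the final line
        levels[req][m.group("level")] += 1
    return {
        req: {
            "seconds": (last[req] - first[req]).total_seconds(),
            "levels": dict(levels[req]),
        }
        for req in first
    }
```

Note that this has to see the whole file before it can report anything,
which is the part that doesn't fit a line-at-a-time filter.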

I have only a rudimentary familiarity with elastic-recheck but my
understanding is that doing anything that looks at multiple lines like
that is either complex or not really possible.


> 
> I definitely can see the value in having machine parsable log stats in our
> artifacts, but I'm not sure where project specific pieces would come from.
> But, given that hypothetical, I would say as long as you made those pieces
> configurable (like a yaml syntax to search for patterns by log file or
> something) and kept a generic framework/tooling for parsing the log stats, I
> think it's still a good fit for a QA or Infra project. Especially if you
> think whatever pattern you're planning to use is something other projects
> would want to reuse.

My concern here is that I want to go beyond simple pattern matching. I
want to be able to maintain state while parsing to associate log lines
with events that came before. The project specific bits I envision are
the logic to handle that, but I don't think yaml is expressive enough
for it. I came up with a quick example at
http://paste.openstack.org/show/583160/ . That's Nova specific and
beyond my capability to express in yaml or elastic-recheck.
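
For anyone who doesn't want to open the paste, the general shape of what I
mean by keeping state is something like the sketch below. This is not the
paste contents. BootTracker and the two marker strings are hypothetical and
would need checking against what Nova actually logs. The point is only that
whether a line gets counted depends on lines seen earlier.

```python
import re

# Assumed instance tag layout: "[instance: <uuid>] message"
INSTANCE_RE = re.compile(r"\[instance: (?P<uuid>[0-9a-f-]+)\] (?P<msg>.*)$")

# Hypothetical marker messages for the start and end of a boot.
BOOT_START = "Starting instance"
BOOT_END = "seconds to build instance"


class BootTracker:
    """Count the log lines emitted while each instance is booting."""

    def __init__(self):
        self.in_progress = {}  # uuid -> lines seen so far in this boot
        self.finished = {}     # uuid -> total lines for a completed boot

    def feed(self, line):
        m = INSTANCE_RE.search(line)
        if m is None:
            return
        uuid, msg = m.group("uuid"), m.group("msg")
        if msg.startswith(BOOT_START):
            self.in_progress[uuid] = 0
        if uuid not in self.in_progress:
            return  # some other operation on the instance, not a boot
        self.in_progress[uuid] += 1
        if BOOT_END in msg:
            self.finished[uuid] = self.in_progress.pop(uuid)
```

Feeding it a log file line by line leaves per-boot line counts in
tracker.finished, and anything still in tracker.in_progress at the end is a
boot that never completed.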

-Andrew

> 
> -Matt Treinish
> 
> 
> > 
> > > 
> > > I would caution against doing it as a one off in a project repo; that
> > > doesn't seem like the best path forward for something like this. We
> > > actually tried to do something similar to that in the past inside the
> > > tempest repo:
> > > 
> > > http://git.openstack.org/cgit/openstack/tempest/tree/tools/check_logs.py
> > > 
> > > and
> > > 
> > > http://git.openstack.org/cgit/openstack/tempest/tree/tools/find_stack_traces.py
> > > 
> > > All it did was cause confusion because no one knew where the output was
> > > coming from. Although, the output from those tools was also misleading,
> > > which was likely a bigger problem. So this probably won't be an issue if
> > > you add a json output to the jobs.
> > > 
> > > I also wonder if the JSONFormatter from oslo.log:
> > > 
> > > http://docs.openstack.org/developer/oslo.log/api/formatters.html#oslo_log.formatters.JSONFormatter
> > > 
> > > would be useful here. We can probably turn that on if it makes things
> > > easier.
> > > 
> > > -Matt Treinish
> > > __________________________________________________________________________
> > > OpenStack Development Mailing List (not for usage questions)
> > > Unsubscribe:
> > > OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> > 


