[OpenStack-Infra] Log storage/serving

Sean Dague sean at dague.net
Tue Oct 15 20:46:20 UTC 2013


On 10/10/2013 01:42 PM, James E. Blair wrote:
<snip>
> Okay, let me try to summarize current thinking:
>
> * We want to try to avoid writing a tool that receives logs because
>    swift provides most/all of the needed functionality.
>    * The swift tempurl middleware will allow us to have the client
>      directly PUT files in swift using a HMAC signed token.
>    * This means any pre-processing of logs would need to happen with the
>      log-uploading-client or via some unspecified event trigger.
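
For reference, swift's tempurl scheme boils down to an HMAC-SHA1
signature over the method, expiry time, and object path. A rough
sketch of what the signing side could look like (key, account, and
path here are all made up; the real key is whatever gets set via the
X-Account-Meta-Temp-URL-Key header on the account):

    import hmac
    import time
    from hashlib import sha1

    key = 'account-tempurl-key'             # made-up signing key
    method = 'PUT'
    expires = int(time.time()) + 3600       # URL valid for an hour
    path = '/v1/AUTH_account/logs/console.html'

    body = '%s\n%s\n%s' % (method, expires, path)
    sig = hmac.new(key.encode('utf8'), body.encode('utf8'),
                   sha1).hexdigest()
    url = ('https://swift.example.com%s?temp_url_sig=%s'
           '&temp_url_expires=%s' % (path, sig, expires))
    # the job can now PUT its log file to `url` with no swift credentials
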
>
> * We may or may not want a log-serving app.
>    * We're doing neat things like filtering on level and html-ifying logs
>      as we serve them with our current log-serving app.
>    * We could probably do that processing pre-upload (including embedding
>      javascript in html pages to do the visual filtering) and then we
>      could serve static pages instead.
>    * A log serving app may be required to provide some kinds of indexes.
>
> So to decide on the log-serving app, we need to figure out:
>
> 1) What do we want out of indexes?
>
> Let's take a current example log path:
>
>    http://logs.openstack.org/95/50795/4/check/check-grenade-devstack-vm/3c17e3c/console.html
>
> Ignoring the change[-2:] at the beginning since it's an implementation
> artifact, that's basically:
>
>    /change/patchset/pipeline/job/run[random]/
>
> The upload script can easily handle creating index pages below that
> point.  But since it runs in the context of a job run, it can't create
> index pages above that (besides the technical difficulty, we don't want
> to give it permission outside of its run anyway).  So I believe that
> without a log-receiving app, our only options are:
>
>    a) Use the static web swift middleware to provide indexes.  Due to the
>    intersection of this feature, CDN, and container sizes with our
>    current providers, this is complicated and we end up at a dead end
>    every time we talk through it.
>
>    b) Use a log-serving application to generate index pages where we need
>    them.  We could do this by querying swift.  If we eliminate the
>    ability to list ridiculously large indexes (like all changes, etc) and
>    restrict it down to the level of, say, a single change, then this
>    might be manageable.  However, swift may still have to perform a large
>    query to get us down to that level.
>
>    c) Reduce the discoverability of test runs.  We could actually just
>    collapse the whole path into a random string and leave that as a
>    comment in Gerrit.  Users would effectively never be able to discover
>    any runs other than the final ones that are reported in Gerrit, and
>    even comparing runs for different patchsets would involve looking up
>    the URL for each in the respective Gerrit comments.  Openstack-infra
>    tools, such as elastic-recheck, could still discover other runs by
>    watching for ZMQ or Gearman events.
>
>    This would make little difference to most end-users as well as project
>    tooling, but it would make it a little harder to develop new project
>    tooling without access to that event stream.
>
> Honestly, option C is growing on me, but I'd like some more feedback on
> that.
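
On option (b) above: swift's listing API can do most of the index
work already, since prefix and delimiter queries return the objects
and pseudo-directories under a path. A rough sketch with
python-swiftclient (auth details and container name are made up):

    from swiftclient import client as swift

    conn = swift.Connection(authurl='https://example.com/auth/v1.0',
                            user='logs', key='secret')
    # prefix narrows the listing to one change/patchset so the
    # query stays small; delimiter collapses deeper levels
    headers, objects = conn.get_container('logs', prefix='50795/4/',
                                          delimiter='/')
    for obj in objects:
        # pseudo-directories come back under the 'subdir' key
        name = obj.get('subdir') or obj['name']
        print('<a href="/%s">%s</a>' % (name, name))
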
>
> 2) What do we want out of processing?
>
> Currently we HTMLify and filter logs by log level at run-time when
> serving them.  I think our choices are:
>
>    a) Continue doing this -- this requires a log-serving app that will
>    fetch logs from swift, process them, and serve them.
>
>    b) Pre-process logs before uploading them.  HTMLify and add
>    client-side javascript line-level filtering.  The logstash script may
>    need to do its own filtering since it won't be running a javascript
>    interpreter, but it could probably still do so based on metadata
>    encoded into the HTML by the pre-processor.  Old logs won't benefit
>    from new features in the pre-processor though (unless we really feel
>    like batch-reprocessing).
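
A sketch of what that metadata encoding could look like: stamp each
line's level on it as a class, which client-side javascript can
toggle and logstash can key off of without a javascript interpreter
(the level matching here is deliberately naive):

    import html
    import re

    LEVELS = ('DEBUG', 'INFO', 'AUDIT', 'WARNING', 'ERROR', 'TRACE')
    LEVEL_RE = re.compile(r'\b(%s)\b' % '|'.join(LEVELS))

    def htmlify_line(line):
        m = LEVEL_RE.search(line)
        level = m.group(1) if m else 'NONE'
        return '<span class="%s">%s</span><br />' % (level,
                                                     html.escape(line))
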
>
> I think the choices of 1c and 2b get us out of the business of running
> log servers altogether and move all the logic and processing to the
> edges.  I'm leaning toward them for that reason.

I'm completely indifferent to how storage and upload happen. Filesystem 
/ swift / all is good to me.

However, my experience writing htmlify-screen-log.py and its
maturation into openstack-infra/os-loganalyze, plus the fact that I
probably spend more time staring at devstack/tempest logs than just
about anyone, has given me a couple of thoughts on log serving.

Our logs are kind of interesting beasts. We have a few different
formats, and we've got a number of different consumers. There are some
real benefits to putting a dynamic layer between the raw logs and the
consumer:

1) HTTP negotiation - between our wsgi app and mod_deflate, we are able
to do content negotiation with the client and serve it the appropriate
data. This means that today you get the text/html version if your
client supports it, and a text/plain version if it doesn't. The content
is also compressed on the wire, automatically, based on your client's
ability to handle the compression.
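
A minimal sketch of the negotiation half of that (the two log
functions are placeholders; the real logic lives in os-loganalyze,
and mod_deflate handles the wire compression separately):

    def htmlified_log():
        return b'<html><body>...log lines...</body></html>'

    def raw_log():
        return b'...log lines...'

    def application(environ, start_response):
        # pick a representation based on what the client accepts
        accept = environ.get('HTTP_ACCEPT', '')
        if 'text/html' in accept:
            start_response('200 OK', [('Content-Type', 'text/html')])
            return [htmlified_log()]
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [raw_log()]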

2) Dynamic filtering - we added the level= parameter to the wsgi script
to speed up logstash indexing, since it turns out that python is vastly
faster at throwing away DEBUG lines than logstash was. It turns out
people love it too, because
http://logs.openstack.org/70/44670/3/gate/gate-tempest-devstack-vm-neutron/02d68e3/logs/screen-q-svc.txt.gz?level=TRACE
loads super quickly and lets you see where the top issues are.
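
The filter itself is conceptually tiny; something like the sketch
below (a simplification: the real filter in os-loganalyze also has
to keep continuation lines, like tracebacks, attached to the level
of the line that started them):

    LEVELS = ['DEBUG', 'INFO', 'AUDIT', 'WARNING', 'ERROR', 'TRACE']

    def filter_level(lines, minimum='INFO'):
        threshold = LEVELS.index(minimum)
        for line in lines:
            level = next((l for l in LEVELS if ' %s ' % l in line),
                         None)
            # lines with no recognizable level pass through untouched
            if level is None or LEVELS.index(level) >= threshold:
                yield line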

There are a few other interesting facts that we discovered in this
process - n-cpu on a nova-network run comes in at about 5 MB gzipped (40
MB uncompressed) of html once we do our filtering on it. If you are
running a browser other than Chrome on a nice Intel chip, life isn't
good. A future enhancement here would be to be nicer to people and
disable DEBUG by default if the file size is too big.

A 40 MB html file means that client-side filtering would be
problematic. First off, you have to take the huge network hit anyway;
secondly, I expect that DOM manipulation at that level of complexity
would give even Chrome a run for its money.

And then there is just the nice idea of keeping the raw artifact and
the presentation layer separate. The fact that we can update our
presentation filter, and logs from last week (which we are still using
to debug issues) become easier to read, is a good thing.

So regardless of the eventual solution here, I *really* want the
ability to have a presentation-layer filter between the raw logs and
the clients. HTTP has so many nice negotiation features worked into the
spec, which we're actually using today, and they make life easier for
folks. I'd really hate to lose that.

So 2a has a strong vote from me.

	-Sean

-- 
Sean Dague
http://dague.net


