[OpenStack-Infra] Log storage/serving

James E. Blair jeblair at openstack.org
Thu Oct 10 17:42:03 UTC 2013


Joshua Hesketh <joshua.hesketh at rackspace.com> writes:

> On 9/25/13 2:47 AM, James E. Blair wrote:
>> Joshua Hesketh <joshua.hesketh at rackspace.com> writes:
>>
>>> On 9/17/13 11:00 PM, Monty Taylor wrote:
>>>> On 09/16/2013 07:22 PM, Joshua Hesketh wrote:
>>>>> So if zuul dictates where a log goes and we place the objects in swift
>>>>> with that path (change / patchset / pipeline / job / run) then zuul
>>>>> could also handle placing indexes as it should know which objects to
>>>>> expect.
>>>>>
>>>>> That said, if the path is deterministic (such as that) and the workers
>>>>> provide the index for a run then I'm not sure how useful an index for
>>>>> patchsets would be. I'd be interested to know if anybody uses the link
>>>>> http://logs.openstack.org/34/45334/ without having come from gerrit or
>>>>> another source where it is published.
>>>> https://pypi.python.org/pypi/git-os-job
>>> Right, but that calculates the path (as far as I can see) so we
>>> therefore still don't necessarily need indexes generated.
>> The final portion of the URL, signifying the run, is effectively random.
>> So that tool actually relies on a one-level-up index page.  (That tool
>> works on post jobs rather than check or gate, but the issues are
>> similar).
>
> So two questions,
> 1) Do we need a random job run? is it for debugging or something? And
> if so, can we provide it another way.

I don't understand this question -- are you asking "does anyone need to
access a run other than the one left as a comment in gerrit?"  That's
answered in my text you quoted below.

> 2) What if the tool provided the index for its runs?

I think we agree that would be fairly easy, at least starting from the
point of the individual run and working down the tree.  I think it's the
indexes of runs that complicate this.

>>
>> Other than that, most end users do not use indexes outside of the
>> particular job run, and that's by design.  We try to put the most useful
>> URL in the message that is left in Gerrit.
>>
>> However, those of us working on the infrastructure itself, or those
>> engaged in special projects (such as mining old test logs), or even the
>> occasional person curious about whether the problem they are seeing was
>> encountered in _all_ runs of a test find the ability to locate logs from
>> any run _very_ useful.  If we lost that ability, we would literally have
>> no way to locate any logs other than the 'final' logs of a run, and
>> those only through the comment left in Gerrit, due to the issue
>> mentioned above.
>>
>> We can discuss doing that, but it would be a huge change from our
>> current practice.
> Yep, I'm convinced that the logs need to be accessible.

Okay, let me try to summarize current thinking:

* We want to try to avoid writing a tool that receives logs because
  swift provides most/all of the needed functionality.
  * The swift tempurl middleware will allow us to have the client
    directly PUT files in swift using an HMAC-signed token (a sketch
    follows this list).
  * This means any pre-processing of logs would need to happen with the
    log-uploading-client or via some unspecified event trigger.

* We may or may not want a log-serving app.
  * We're doing neat things like filtering on level and html-ifying logs
    as we serve them with our current log-serving app.
  * We could probably do that processing pre-upload (including embedding
    javascript in html pages to do the visual filtering) and then we
    could serve static pages instead.
  * A log serving app may be required to provide some kinds of indexes.
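
To make the tempurl bullet concrete, here is a minimal sketch of what
the signed upload could look like, assuming a hypothetical swift
endpoint and key.  The middleware signs "METHOD\nEXPIRES\nPATH" with
HMAC-SHA1, and the job then PUTs the file directly with the resulting
URL; the names below are illustrative only.

  import hmac
  import time
  import urllib.request
  from hashlib import sha1

  # Hypothetical endpoint and key -- in practice these would come from
  # whatever hands the job its upload credentials.
  SWIFT_HOST = "https://swift.example.com"
  TEMPURL_KEY = b"not-the-real-key"   # the X-Account-Meta-Temp-URL-Key

  def signed_put_url(path, expires_in=3600):
      # PATH is the full object path as swift sees it, e.g.
      # /v1/AUTH_logs/logs/95/50795/4/check/some-job/3c17e3c/console.html
      expires = int(time.time()) + expires_in
      body = "PUT\n%d\n%s" % (expires, path)
      sig = hmac.new(TEMPURL_KEY, body.encode(), sha1).hexdigest()
      return "%s%s?temp_url_sig=%s&temp_url_expires=%d" % (
          SWIFT_HOST, path, sig, expires)

  def upload(local_file, object_path):
      # The job PUTs the file itself; no intermediate log-receiving app.
      with open(local_file, "rb") as f:
          req = urllib.request.Request(signed_put_url(object_path),
                                       data=f.read(), method="PUT")
          urllib.request.urlopen(req)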

So to decide on the log-serving app, we need to figure out:

1) What do we want out of indexes?

Let's take a current example log path:

  http://logs.openstack.org/95/50795/4/check/check-grenade-devstack-vm/3c17e3c/console.html

Ignoring the change[-2:] shard at the beginning (the last two digits of
the change number) since it's an implementation artifact, that's
basically:

  /change/patchset/pipeline/job/run[random]/ 
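
For illustration, here is a rough sketch of how that layout maps onto
the example URL above; the helper name is hypothetical, and this is
only meant to show the shape of the path, not how any existing tool is
written.

  def log_path(change, patchset, pipeline, job, run_id):
      # The leading shard is just the last two digits of the change
      # number; run_id is effectively random and only known to the run.
      return "/%s/%s/%s/%s/%s/%s/" % (
          str(change)[-2:], change, patchset, pipeline, job, run_id)

  # log_path(50795, 4, "check", "check-grenade-devstack-vm", "3c17e3c")
  #   -> "/95/50795/4/check/check-grenade-devstack-vm/3c17e3c/"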

The upload script can easily handle creating index pages below that
point.  But since it runs in the context of a job run, it can't create
index pages above that (besides the technical difficulty, we don't want
to give it permission outside of its run anyway).  So I believe that
without a log-receiving app, our only options are:

  a) Use the static web swift middleware to provide indexes.  Due to the
  intersection of this feature, CDN, and container sizes with our
  current providers, this is complicated and we end up at a dead end
  every time we talk through it.

  b) Use a log-serving application to generate index pages where we need
  them.  We could do this by querying swift (a sketch follows these
  options).  If we eliminate the ability to list ridiculously large
  indexes (like all changes, etc.) and restrict it down to the level of,
  say, a single change, then this might be manageable.  However, swift
  may still have to perform a large query to get us down to that level.

  c) Reduce the discoverability of test runs.  We could actually just
  collapse the whole path into a random string and leave that as a
  comment in Gerrit.  Users would effectively never be able to discover
  any runs other than the final ones that are reported in Gerrit, and
  even comparing runs for different patchsets would involve looking up
  the URL for each in the respective Gerrit comments.  Openstack-infra
  tools, such as elastic-recheck, could still discover other runs by
  watching for ZMQ or Gearman events.

  This would make little difference to most end-users as well as project
  tooling, but it would make it a little harder to develop new project
  tooling without access to that event stream.
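
To make option (b) a bit more concrete, here is a minimal sketch of the
kind of scoped query a log-serving app could make, assuming
python-swiftclient and a hypothetical "logs" container.  The
prefix/delimiter listing keeps swift from having to return the whole
container just to build one index page.

  import swiftclient

  def change_index(conn, container, change):
      # Scope the listing to one change using prefix+delimiter so swift
      # never has to return the entire container.
      prefix = "%s/%s/" % (str(change)[-2:], change)
      _headers, listing = conn.get_container(container, prefix=prefix,
                                             delimiter="/")
      # With a delimiter, pseudo-directories come back as "subdir"
      # entries; here those are the patchsets under the change.
      return [entry["subdir"] for entry in listing if "subdir" in entry]

  # conn = swiftclient.client.Connection(authurl, user, key)
  # change_index(conn, "logs", 50795)  ->  e.g. ["95/50795/4/", ...]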

Honestly, option C is growing on me, but I'd like some more feedback on
that.

2) What do we want out of processing?

Currently we HTMLify and filter logs by log level at run-time when
serving them.  I think our choices are:

  a) Continue doing this -- this requires a log-serving app that will
  fetch logs from swift, process them, and serve them.

  b) Pre-process logs before uploading them.  HTMLify and add
  client-side javascript line-level filtering.  The logstash script may
  need to do its own filtering since it won't be running a javascript
  interpreter, but it could probably still do so based on metadata
  encoded into the HTML by the pre-processor.  Old logs won't benefit
  from new features in the pre-processor though (unless we really feel
  like batch-reprocessing).
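
As a minimal sketch of the pre-processing in (b): wrap each line in a
span that carries its level as metadata, so a bit of embedded
javascript (or the logstash importer, reading the same attribute) can
filter by level after the fact.  The level regex and markup here are
illustrative, not a proposal for the actual format.

  import html
  import re

  LEVEL_RE = re.compile(r"\b(DEBUG|INFO|WARNING|ERROR|TRACE)\b")

  def htmlify(lines):
      # Wrap each log line in a span recording its level so that
      # client-side filtering needs no server round-trip.
      out = ["<html><body><pre>"]
      for line in lines:
          m = LEVEL_RE.search(line)
          level = m.group(1) if m else "NONE"
          out.append('<span class="line" data-level="%s">%s</span>'
                     % (level, html.escape(line.rstrip("\n"))))
      out.append("</pre></body></html>")
      return "\n".join(out)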

I think the choices of 1c and 2b get us out of the business of running
log servers altogether and move all the logic and processing to the
edges.  I'm leaning toward them for that reason.

-Jim


