[OpenStack-Infra] Log storage/serving
Monty Taylor
mordred at inaugust.com
Thu Oct 10 18:26:53 UTC 2013
On 10/10/2013 02:06 PM, Clark Boylan wrote:
> On Thu, Oct 10, 2013 at 10:42 AM, James E. Blair <jeblair at openstack.org> wrote:
>> Joshua Hesketh <joshua.hesketh at rackspace.com> writes:
>>
>>> On 9/25/13 2:47 AM, James E. Blair wrote:
>>>> Joshua Hesketh <joshua.hesketh at rackspace.com> writes:
>>>>
>>>>> On 9/17/13 11:00 PM, Monty Taylor wrote:
>>>>>> On 09/16/2013 07:22 PM, Joshua Hesketh wrote:
>>>>>>> So if zuul dictates where a log goes and we place the objects in
>>>>>>> swift with that path (change / patchset / pipeline / job / run),
>>>>>>> then zuul could also handle placing indexes, as it should know
>>>>>>> which objects to expect.
>>>>>>>
>>>>>>> That said, if the path is deterministic (such as that) and the
>>>>>>> workers provide the index for a run, then I'm not sure how useful
>>>>>>> an index for patchsets would be. I'd be interested to know whether
>>>>>>> anybody uses the link http://logs.openstack.org/34/45334/ without
>>>>>>> having come from gerrit or another source where it is published.
>>>>>> https://pypi.python.org/pypi/git-os-job
>>>>> Right, but that calculates the path (as far as I can see), so we
>>>>> still don't necessarily need to generate indexes.
>>>> The final portion of the URL, signifying the run, is effectively random.
>>>> So that tool actually relies on a one-level-up index page. (That tool
>>>> works on post jobs rather than check or gate, but the issues are
>>>> similar.)
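
To make the randomness concrete: as far as I understand it (an
assumption on my part, not verified against the current config), the
run segment is just a short slice of a per-build UUID, so there is no
way to compute it from the change alone:

    import uuid

    # hypothetical sketch of how a run id like '3c17e3c' comes about
    build_uuid = uuid.uuid4().hex   # e.g. '3c17e3c9f0a84b6d...'
    run_segment = build_uuid[:7]    # e.g. '3c17e3c'
    # nothing about the change, patchset or job lets you reconstruct
    # this, hence the need for the one-level-up index page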
>>>
>>> So, two questions:
>>> 1) Do we need a random job run? Is it for debugging or something?
>>> And if so, can we provide it another way?
>>
>> I don't understand this question -- are you asking "does anyone need to
>> access a run other than the one left as a comment in gerrit?" That's
>> answered in my text you quoted below.
>>
>>> 2) What if the tool provided the index for its runs?
>>
>> I think we agree that would be fairly easy, at least starting from the
>> point of the individual run and working down the tree. I think it's the
>> indexes of runs that complicate this.
>>
>>>>
>>>> Other than that, most end users do not use indexes outside of the
>>>> particular job run, and that's by design. We try to put the most useful
>>>> URL in the message that is left in Gerrit.
>>>>
>>>> However, those of us working on the infrastructure itself, those
>>>> engaged in special projects (such as mining old test logs), and even
>>>> the occasional person curious about whether the problem they are
>>>> seeing was encountered in _all_ runs of a test find the ability to
>>>> locate logs from any run _very_ useful. If we lost that ability, we
>>>> would literally have no way to locate any logs other than the 'final'
>>>> logs of a run, and those only through the comment left in Gerrit, due
>>>> to the issue mentioned above.
>>>>
>>>> We can discuss doing that, but it would be a huge change from our
>>>> current practice.
>>> Yep, I'm convinced that the logs need to be accessible.
>>
>> Okay, let me try to summarize current thinking:
>>
>> * We want to try to avoid writing a tool that receives logs because
>> swift provides most/all of the needed functionality.
>> * The swift tempurl middleware will allow us to have the client
>> directly PUT files in swift using a HMAC signed token.
>> * This means any pre-processing of logs would need to happen in the
>> log-uploading client or via some unspecified event trigger.
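
As an aside, minting a tempurl token is cheap enough that zuul could do
it per-job and hand the result to the worker. A minimal sketch (host,
account and key are made up; the signing scheme is swift's documented
method\nexpires\npath format):

    import hmac
    import time
    from hashlib import sha1

    key = 'secret-tempurl-key'         # set via X-Account-Meta-Temp-URL-Key
    method = 'PUT'
    expires = int(time.time() + 3600)  # token good for one hour
    path = '/v1/AUTH_account/logs/95/50795/4/check/job/3c17e3c/console.html'

    sig = hmac.new(key, '%s\n%s\n%s' % (method, expires, path),
                   sha1).hexdigest()
    put_url = 'https://swift.example.com%s?temp_url_sig=%s&temp_url_expires=%s' % (
        path, sig, expires)
    # the worker can now PUT the file to put_url with no other credentials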
>>
>> * We may or may not want a log-serving app.
>> * We're doing neat things like filtering on level and html-ifying logs
>> as we serve them with our current log-serving app.
>> * We could probably do that processing pre-upload (including embedding
>> javascript in html pages to do the visual filtering) and then we
>> could serve static pages instead.
>> * A log serving app may be required to provide some kinds of indexes.
>>
>> So to decide on the log-serving app, we need to figure out:
>>
>> 1) What do we want out of indexes?
>>
>> Let's take a current example log path:
>>
>> http://logs.openstack.org/95/50795/4/check/check-grenade-devstack-vm/3c17e3c/console.html
>>
>> Ignoring the change[-2:] prefix at the beginning, since it's an
>> implementation artifact, that's basically:
>>
>> /change/patchset/pipeline/job/run[random]/
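
Spelled out, building that path from the job parameters looks roughly
like this (a sketch; the helper name is mine, not anything that exists
today):

    # the prefix is the last two digits of the change number,
    # e.g. 50795 -> '95'
    def log_path(change, patchset, pipeline, job, run):
        return '%s/%s/%s/%s/%s/%s/' % (
            str(change)[-2:], change, patchset, pipeline, job, run)

    # -> '95/50795/4/check/check-grenade-devstack-vm/3c17e3c/'
    log_path(50795, 4, 'check', 'check-grenade-devstack-vm', '3c17e3c')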
>>
>> The upload script can easily handle creating index pages below that
>> point. But since it runs in the context of a job run, it can't create
>> index pages above that (besides the technical difficulty, we don't want
>> to give it permission outside of its run anyway). So I believe that
>> without a log-receiving app, our only options are:
>>
>> a) Use the static web swift middleware to provide indexes. Due to the
>> intersection of this feature, CDN, and container sizes with our
>> current providers, this is complicated and we end up at a dead end
>> every time we talk through it.
>>
>> b) Use a log-serving application to generate index pages where we need
>> them. We could do this by querying swift. If we eliminate the
>> ability to list ridiculously large indexes (like all changes, etc) and
>> restrict it down to the level of, say, a single change, then this
>> might be manageable. However, swift may still have to perform a large
>> query to get us down to that level.
>>
>> c) Reduce the discoverability of test runs. We could actually just
>> collapse the whole path into a random string and leave that as a
>> comment in Gerrit. Users would effectively never be able to discover
>> any runs other than the final ones that are reported in Gerrit, and
>> even comparing runs for different patchsets would involve looking up
>> the URL for each in the respective Gerrit comments. Openstack-infra
>> tools, such as elastic-recheck, could still discover other runs by
>> watching for ZMQ or Gearman events.
>>
>> This would make little difference to most end-users or to existing
>> project tooling, but it would make it a little harder to develop new
>> tooling without access to that event stream.
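
To put option (b) in concrete terms: an index app could answer "all
runs for change 50795" with a delimiter listing, something like the
following (untested, and the single 'logs' container layout is an
assumption):

    import swiftclient

    conn = swiftclient.client.Connection(
        authurl='https://swift.example.com/auth/v1.0',
        user='user', key='key')

    # delimiter='/' makes swift return one 'subdir' entry per patchset
    # instead of every object under the change
    headers, listing = conn.get_container(
        'logs', prefix='95/50795/', delimiter='/')
    for entry in listing:
        print entry.get('subdir') or entry['name']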
>>
>> Honestly, option C is growing on me, but I'd like some more feedback on
>> that.
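
For what it's worth, option (c) could be as simple as zuul minting an
opaque prefix per run and only ever publishing it in the Gerrit comment
and the event payload, e.g.:

    import uuid

    # opaque per-run prefix; nothing about the change is recoverable
    # from it by design
    run_key = uuid.uuid4().hex
    log_url = 'http://logs.openstack.org/%s/' % run_key
    # zuul would leave log_url in the Gerrit comment and include it in
    # the ZMQ/Gearman event payload; there would be no other way to
    # find it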
>>
>> 2) What do we want out of processing?
>>
>> Currently we HTMLify and filter logs by log level at run-time when
>> serving them. I think our choices are:
>>
>> a) Continue doing this -- this requires a log-serving app that will
>> fetch logs from swift, process them, and serve them.
>>
>> b) Pre-process logs before uploading them. HTMLify and add
>> client-side javascript line-level filtering. The logstash script may
>> need to do its own filtering since it won't be running a javascript
>> interpreter, but it could probably still do so based on metadata
>> encoded into the HTML by the pre-processor. Old logs won't benefit
>> from new features in the pre-processor though (unless we really feel
>> like batch-reprocessing).
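
A rough cut of what the 2b pre-processor might do, mostly to show that
the level metadata survives for consumers like logstash (the
level-parsing regex is a guess at our log format, not our actual
filter):

    import cgi
    import re

    # oslo-style log lines: '2013-10-10 18:26:53.123 12345 DEBUG nova ...'
    LEVEL_RE = re.compile(r' (DEBUG|INFO|AUDIT|WARNING|ERROR|TRACE) ')

    def htmlify(line):
        m = LEVEL_RE.search(line)
        level = m.group(1) if m else 'NONE'
        # the class attribute is what client-side javascript (and
        # logstash, if it wants) can filter on
        return '<span class="%s">%s</span><br/>\n' % (
            level, cgi.escape(line.rstrip()))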
>>
>> I think the choices of 1c and 2b get us out of the business of running
>> log servers altogether and move all the logic and processing to the
>> edges. I'm leaning toward them for that reason.
>>
> I agree about 2b; we can push specialized filtering into the places
> that need it, if necessary, rather than having a catch-all centralized
> system. I am not quite sold on 1c though. If there is ever a need to
> go back in time and find files from a particular time range and
> project, we would have to rely on Gerrit comments as an index, which
> seems less than ideal. Or we would have to do something with the tools
> swift provides. Swift does allow attaching arbitrary metadata to
> objects, but it doesn't appear to support an easy way to use that
> info as an index (or filter). The more I think about a pure swift
> solution the more I like it (someone else can deal with the hard
> problems), but I do think we need to consider recording some index
> that isn't Gerrit.
2b++
1c - I think I'm pretty sold on that as well. However, regarding what
Clark is saying: if we put the metadata for a run that _would_ be used
for an index into a swift object, then we could, if we ever wanted one,
write a log-index-serving app. It would be full of expensive
operations, but I think our need for it would be low. We could also
write it later as needed.
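
Concretely, the uploader could drop something like this next to the
logs; a future index app would only ever read these small objects (the
object name, fields and headers here are invented for illustration):

    import json
    import swiftclient

    conn = swiftclient.client.Connection(
        authurl='https://swift.example.com/auth/v1.0',
        user='user', key='key')

    run_meta = {
        'change': 50795, 'patchset': 4, 'pipeline': 'check',
        'job': 'check-grenade-devstack-vm', 'run': '3c17e3c',
        'finished': '2013-10-10T18:26:53Z',
    }
    # one small json object per run; anything we might want to filter
    # on can also go in X-Object-Meta-* headers per clark's point
    conn.put_object(
        'logs',
        '95/50795/4/check/check-grenade-devstack-vm/3c17e3c/meta.json',
        json.dumps(run_meta),
        headers={'X-Object-Meta-Change': '50795',
                 'X-Object-Meta-Pipeline': 'check'})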
> If a pure swift solution isn't doable, what about a simple transaction
> log that is recorded on disk or in a DB? We wouldn't need to expose
> this to everyone, but having a record that maps build info to log
> objects would be handy, especially if parsing it doesn't require
> access to Gerrit comments or the Gerrit DB. (Though this may be of
> minimal value, as Gerrit does provide a simple map for us.)
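
That transaction log would not need to be fancy either; a single table
mapping build info to the swift prefix would do (a sketch, sqlite just
for illustration):

    import sqlite3

    db = sqlite3.connect('log-index.db')
    db.execute('''CREATE TABLE IF NOT EXISTS builds (
        change_num INTEGER, patchset INTEGER, pipeline TEXT,
        job TEXT, run TEXT, swift_prefix TEXT, finished TEXT)''')
    db.execute('INSERT INTO builds VALUES (?, ?, ?, ?, ?, ?, ?)',
               (50795, 4, 'check', 'check-grenade-devstack-vm', '3c17e3c',
                '95/50795/4/check/check-grenade-devstack-vm/3c17e3c/',
                '2013-10-10T18:26:53Z'))
    db.commit()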
>
> Clark
>