[OpenStack-Infra] Log storage/serving

Joshua Hesketh joshua.hesketh at rackspace.com
Thu Sep 12 06:14:43 UTC 2013


Hey,

Great overview and plan James, thanks for that :-).

So it seems to me that we're duplicating the job of swift a little by 
writing a program to accept an object over HTTP and store it on disk. 
If our end game is logs stored in swift, then why not have Jenkins (and 
other workers) push the logs straight to swift?
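
For instance, a worker could push its logs with python-swiftclient 
directly. A minimal sketch, where the auth endpoint, credentials, 
container, and object paths are all placeholders:

    import os

    from swiftclient.client import Connection

    # Placeholder credentials; a real worker would be handed these
    # (or a scoped token) by the infrastructure.
    conn = Connection(authurl='https://identity.example.com/v2.0',
                      user='worker', key='secret', auth_version='2')

    # Upload every file in the local logs directory under a
    # per-job prefix.
    for name in os.listdir('logs'):
        with open(os.path.join('logs', name), 'rb') as f:
            conn.put_object('logs', 'gate-tempest/1234/%s' % name,
                            contents=f, content_type='text/plain')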

This not only saves writing #2 and #6, but it also reduces the workload 
of managing log-receiving servers. We would still likely need some kind 
of log-serving point, though*.

So long as a worker can determine the endpoint of a log, it can report 
that back to zuul as its URL. For example, if the worker knows that 
the serving application is at http://logs.openstack.org, it might report 
back http://logs.openstack.org/obj/abc123.

We could then use either pseudo folders[0] or have the worker generate 
an index. For example, why not create an index object with links to the 
other objects (with the known serving application URL prepended)? In 
fact, the reporter can choose whether to generate an index file or just 
send the pseudo folder to be served up.
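
Generating that index could be as simple as listing the job's objects 
and uploading an index.html next to them. A sketch (same placeholder 
credentials as above, and the serving URL is assumed):

    from swiftclient.client import Connection

    conn = Connection(authurl='https://identity.example.com/v2.0',
                      user='worker', key='secret', auth_version='2')

    prefix = 'gate-tempest/1234/'
    # List everything this job uploaded and build simple HTML links,
    # prepending the known serving application URL.
    _, objects = conn.get_container('logs', prefix=prefix)
    links = ['<a href="http://logs.openstack.org/obj/%s">%s</a><br/>'
             % (obj['name'], obj['name'][len(prefix):])
             for obj in objects]
    conn.put_object('logs', prefix + 'index.html',
                    contents='\n'.join(links),
                    content_type='text/html')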

I like the idea of zuul handing out swift keys for the destination to 
workers, but to me this seems optional. A worker can still upload its 
logs elsewhere (swift or otherwise) and send back a link, whether it 
uses the swift serving application or not. So long as the link provides 
some kind of index or a complete log, it shouldn't matter. This also 
solves the matter of post-processing, as it becomes the worker's decision.

*For example, depending on the size of the logs (and therefore the 
job/worker), we could actually use javascript to serve up the swift 
object with CORS, further reducing the infrastructure requirements and 
utilising the powerful CPUs and javascript engines we all have. I 
have already started doing that as a crude way to serve and add 
formatting to my logs[1]. Basically, the javascript grabs a file and 
runs some regexes for highlighting. So my worker only reports 
http://laughing-spice/logviewer/?q=http://worker/job/index.html, where 
the index.html contains links to logs like 
http://laughing-spice/logviewer/?q=http://worker/job/mysqllog.txt etc.

In terms of the cutover/downtime, I'm not sure where that would come in. 
If we are still just reporting URLs back to zuul, we can change the 
different workers over one at a time. Eventually all new reports will 
have a different URL, and then all we have to do is worry about archiving 
the old logs. For those, I don't imagine it would be difficult to place 
them in swift and set up some kind of application to redirect with 301s.
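
A minimal WSGI sketch of such a redirect application (the swift base 
URL below is a placeholder):

    # Map old logs.openstack.org paths onto their new swift locations.
    SWIFT_BASE = 'https://swift.example.com/v1/AUTH_logs/logs'

    def application(environ, start_response):
        target = SWIFT_BASE + environ.get('PATH_INFO', '/')
        start_response('301 Moved Permanently',
                       [('Location', target),
                        ('Content-Type', 'text/plain')])
        return [('Moved to %s\n' % target).encode('utf-8')]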

Cheers,
Josh

[0] 
http://docs.openstack.org/trunk/openstack-object-storage/developer/content/pseudo-hierarchical-folders-directories.html
[1] https://github.com/rcbau/laughing-spice

--
Rackspace Australia

On 9/11/13 7:54 AM, jeblair at openstack.org wrote:
> Hi,
>
> We've had a few conversations recently in various fora about log
> storage, so I thought it'd be a good idea to write down some ideas.
>
> The current state is that Jenkins uses SCP to copy files to
> static.openstack.org, which has an Apache vhost for logs.openstack.org.
> There's a really big filesystem, and we use Apache mod_autoindex to
> automatically serve directory indexes.  The destination log paths are
> calculated in advance by Zuul (actually in a custom parameter function
> defined by our configuration -- Zuul itself knows nothing about this),
> they are passed to Jenkins as a parameter, and the same paths are used
> to build the URL left in the review text in Gerrit.
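>
> A simplified sketch of such a parameter function (the hook signature
> and change fields here are assumptions, not Zuul's actual interface):
>
>     # Hypothetical hook: compute the destination log path and hand it
>     # to Jenkins as a build parameter.
>     def set_log_path(item, job, params):
>         change = item.change
>         params['LOG_PATH'] = '%s/%s/%s' % (
>             change.number, change.patchset, job.name)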
>
> This causes us to need to maintain a very large filesystem (we use
> Cinder volumes with LVM, so it's not so bad), but it's still not very
> cloudy, and does require occasional manual work.  Swift is an obvious
> candidate for storing this sort of thing.
>
> The reason it was built this way instead of using swift is simply time:
> SCP and mod_autoindex already existed.  Swift (at least the two
> implementations we have access to) is not great at calculating and
> serving indexes of stored information -- so _something_ needs to be written in
> order to use Swift (either index pages for log files we generate, or an
> application that stores logs in swift and retrieves them and serves them
> over the web).
>
> I like the approach of having an application store and retrieve log
> data.  It would accomplish a number of goals:
>
> * By using something other than SCP, we can reduce the access needed by
>    the worker.  Currently Jenkins can write to anywhere in the log
>    filesystem, and we just count on the integrity of the Jenkins master
>    to prevent abuse of that privilege.
>
> * A log-receiving mechanism with tighter access controls means that we
>    could use a different kind of worker (something without the
>    master/slave separation that Jenkins has) so that the job itself could
>    upload its own logs.
>
> * A log-receiver could pre-process logs (compression, highlighting,
>    shipping to logstash, etc).
>
> * The log-receiving and log-serving application(s) would be horizontally
>    scalable (static.o.o has been and could again be a bottleneck).
>
> * The log-serving application could also do any processing before
>    serving.
>
> * Finally, all of this is actually fairly generalizable to artifact
>    processing, such as tarballs, so we should probably switch to calling
>    it artifact storage and retrieval.
>
> Sean Dague recently wrote a mod_python script that turns some OpenStack
> log files into HTML with syntax highlighting and links:
>
>    http://git.openstack.org/cgit/openstack-infra/config/tree/modules/openstack_project/files/logs/htmlify-screen-log.py
>
> This seems like it could be a good starting point, as it actually
> addresses one of the points in the above list.
>
> Here's how I think we could get from where we are to where we want to be:
>
> 1) Have Zuul generate a token (suggestion: an HMAC signature using a
> shared secret) that can later be used to determine what kind of
> artifacts a job should be permitted to store, and where they can be
> stored.  E.g., a token might say that this run of gate-tempest can
> store artifacts to the logs container at '.../gate-tempest/1234' for
> the next 6 hours.  Another job might get a token (or multiple tokens)
> saying it can store logs as well as a tarball.
>
> This way even a completely untrusted worker can store artifacts because
> the token (which is effectively public) is scoped to only what the job
> needs.  This could be done entirely with a custom parameter function
> (just as the log paths are currently calculated) without any changes to
> Zuul itself, or we could extend Zuul to natively support this concept.
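>
> A minimal sketch of the HMAC idea (the claim fields, encoding, and
> secret are all placeholder choices):
>
>     import base64, hashlib, hmac, json, time
>
>     SECRET = b'shared-between-zuul-and-receiver'  # placeholder
>
>     def make_token(job, path, ttl=6 * 3600):
>         # The claim states what may be stored, where, and until when.
>         claim = json.dumps({'job': job, 'path': path,
>                             'expires': int(time.time()) + ttl})
>         sig = hmac.new(SECRET, claim.encode('utf-8'),
>                        hashlib.sha256).hexdigest()
>         return '%s.%s' % (
>             base64.urlsafe_b64encode(
>                 claim.encode('utf-8')).decode('utf-8'),
>             sig)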
>
> 2) Write a program (or extend the mod_python script) that accepts
> artifacts over HTTP with a token.  It would then write them to the
> filesystem as we do now.  It can offline-validate the token with the
> shared secret (between it and Zuul).  It could also invalidate the token
> after its use.
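>
> Offline validation is then just recomputing the signature with the
> shared secret (continuing the sketch above):
>
>     def validate_token(token):
>         encoded, sig = token.rsplit('.', 1)
>         claim = base64.urlsafe_b64decode(encoded.encode('utf-8'))
>         expected = hmac.new(SECRET, claim, hashlib.sha256).hexdigest()
>         if not hmac.compare_digest(expected, sig):
>             return None  # bad signature
>         data = json.loads(claim)
>         if data['expires'] < time.time():
>             return None  # token expired
>         return data  # the permitted job and path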
>
> 3) Write a script that we can invoke from within our Jenkins jobs to use
> the token to upload artifacts.  Other non-Jenkins workers can use the
> same protocol to upload their artifacts.
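>
> That script could be very small (the header name and URL layout here
> are assumptions):
>
>     import requests
>
>     def upload(base_url, token, path, filename):
>         # PUT each artifact to the receiver, presenting the token.
>         with open(filename, 'rb') as f:
>             resp = requests.put('%s/%s' % (base_url, path), data=f,
>                                 headers={'X-Artifact-Token': token})
>         resp.raise_for_status()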
>
> 4) Write a program (or extend the mod_python script) that accepts
> requests (using the same URL format) and reads the files from the
> filesystem and serves them.
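>
> A bare-bones WSGI version of that (the log root is a placeholder):
>
>     import mimetypes
>     import os
>
>     LOG_ROOT = '/srv/static/logs'  # placeholder
>
>     def application(environ, start_response):
>         rel = environ.get('PATH_INFO', '/').lstrip('/')
>         full = os.path.normpath(os.path.join(LOG_ROOT, rel))
>         # Refuse paths that escape the log root, and missing files.
>         if (not full.startswith(LOG_ROOT + os.sep)
>                 or not os.path.isfile(full)):
>             start_response('404 Not Found',
>                            [('Content-Type', 'text/plain')])
>             return [b'Not found\n']
>         ctype = (mimetypes.guess_type(full)[0]
>                  or 'application/octet-stream')
>         start_response('200 OK', [('Content-Type', ctype)])
>         with open(full, 'rb') as f:
>             return [f.read()]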
>
> 5) Extend the artifact-serving program in #4 so that it first checks a
> MySQL database (we can use trove to provide the DB) for each request; if
> it finds the item, then it serves it from swift.  If the request is for
> a directory instead of a file, it uses the database to calculate the
> index, generates an index page, and serves it.  If the item is not found
> in the DB, it fetches it from the disk.  If it's a directory that isn't
> in the DB, it generates an index based on the filesystem directory
> contents.
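>
> The lookup order, in code form (the db helpers and render_index are
> hypothetical names, just to show the flow):
>
>     import os
>
>     LOG_ROOT = '/srv/static/logs'  # placeholder
>
>     def fetch(path, db, swift_conn):
>         row = db.lookup(path)  # hypothetical DB helper
>         if row is not None:
>             if row.is_directory:
>                 # Build the index page from the DB listing.
>                 return render_index(db.list_children(path))
>             # Object is known to the DB: serve it from swift.
>             return swift_conn.get_object('logs', path)[1]
>         # Not in the DB: fall back to the filesystem.
>         local = os.path.join(LOG_ROOT, path)
>         if os.path.isdir(local):
>             return render_index(os.listdir(local))
>         with open(local, 'rb') as f:
>             return f.read()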
>
> 6) Extend the artifact storing program in #2 to optionally store the
> artifacts in swift instead of the filesystem.
>
> I think that approach gives us a reasonably secure system, and the
> stepwise nature means that we can test each component in turn, and
> provide a smooth transition.
>
> Some variants to consider:
>
>    * The token system doesn't have to be HMAC-based; there's lots of
>      stuff out there.  We could do online validation with Zuul instead of
>      a shared secret, for instance.
>
>    * Not trying to do the phased implementation, and just doing a cutover
>      with downtime (and bulk-importing the old data).
>
>    * Also, it would be nice to make pre- and post-processing easily
>      pluggable and configurable early on; there's no telling what we may
>      want to do in the future.  (One possible plugin shape is sketched
>      below.)
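>
> One possible plugin shape, sketched (entirely hypothetical):
>
>     import gzip
>     import io
>
>     class Processor(object):
>         """Run over each artifact before storage or before serving."""
>         def process(self, name, data):
>             return name, data  # identity; subclasses transform
>
>     class GzipProcessor(Processor):
>         def process(self, name, data):
>             buf = io.BytesIO()
>             with gzip.GzipFile(fileobj=buf, mode='wb') as f:
>                 f.write(data)
>             return name + '.gz', buf.getvalue()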
>
> I think that about encompasses the ideas and conversations I've had
> around the subject.  Any thoughts?
>
> -Jim