[OpenStack-Infra] Log storage/serving

James E. Blair jeblair at openstack.org
Thu Sep 12 16:49:35 UTC 2013


Joshua Hesketh <joshua.hesketh at rackspace.com> writes:

> We could then use either pseudo folders[0] or have the worker generate
> an index. For example, why not create an index object with links to
> the other objects (using the known serving application URL prepended)?
> In fact, the reporter can choose whether to generate an index file or
> just send the pseudo folder to be served up.

This is one of the main reasons we don't use swift today.  Consider this
directory:

http://logs.openstack.org/34/45334/

It contains all of the runs of all of the jobs for all of the patchsets
for change 45334.  That's very useful for discoverability; the
alternative is to read the comments in gerrit and examine the links
one-by-one.  A full-depth example:

http://logs.openstack.org/34/45334/7/check/gate-zuul-python27/7c48ee3/

(That's change / patchset / pipeline / job / run.)
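To make the layout concrete, here is a minimal Python sketch that splits such
a URL into its named levels and lists the directory levels above the run, each
of which would need its own index to stay browsable. The names LogPath,
parse_log_url and ancestor_dirs are hypothetical, and treating the leading
"34" as a sharding prefix rather than one of the five named levels is an
assumption based on the example URL, not something our tooling defines.

```python
from typing import List, NamedTuple
from urllib.parse import urlparse


class LogPath(NamedTuple):
    """The five named levels of a full-depth log path (hypothetical type)."""
    change: str
    patchset: str
    pipeline: str
    job: str
    run: str


def _components(url: str) -> List[str]:
    """Return the non-empty path components of a log URL."""
    return [p for p in urlparse(url).path.split("/") if p]


def parse_log_url(url: str) -> LogPath:
    """Split a full-depth log URL into change/patchset/pipeline/job/run."""
    parts = _components(url)
    # Assumption: parts[0] ("34" in the example) is only a sharding prefix.
    if len(parts) != 6:
        raise ValueError(f"expected 6 path components, got {len(parts)}")
    return LogPath(*parts[1:])


def ancestor_dirs(url: str) -> List[str]:
    """Return every directory level above the run, shallowest first."""
    parts = _components(url)
    return ["/".join(parts[:i]) + "/" for i in range(2, len(parts))]
```

For the full-depth example, parse_log_url yields change "45334", patchset "7",
pipeline "check", job "gate-zuul-python27" and run "7c48ee3".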

Each individual job is concerned with only the last component of that
hierarchy, and has no knowledge of what other related jobs may have run
before or will run after, so creating an index under those circumstances
is difficult.  Moreover, if you consider that in the new system, we
won't be able to trust a job with access to any pseudo-directory level
higher than its individual run, there is actually no way for it to
create any of the higher-level indexes.

If we want to maintain that level of discoverability, then I think we
need something outside of the job to create indexes (in my earlier
email, the artifact-serving component does this).  If we are okay losing
that, then yes, we can just sort of shove everything related to a run
into a certain arbitrary location whose path won't be important anymore.
Within the area written to by a single run, however, we may still have
subdirectories.  Whether and how to create swift directory markers for
those is still an issue (see my other email).  But perhaps they are not
necessary, and yes, certainly _within the directory for a run_, we could
create index files as needed.
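
As a rough illustration of what an index generator outside the job might do,
the Python sketch below derives the immediate children of a pseudo-directory
from a flat list of object names, then renders a bare-bones index page. It
assumes the object names have already been listed from the container by some
other means; list_children and render_index are hypothetical names, and this
is not a description of the artifact-serving component itself.

```python
from html import escape
from typing import Iterable, List


def list_children(object_names: Iterable[str], prefix: str) -> List[str]:
    """Return the immediate children of `prefix`, like a delimiter listing.

    `prefix` is expected to be "" or to end with "/". Pseudo-directories
    are returned once each, with a trailing "/".
    """
    children = set()
    for name in object_names:
        if not name.startswith(prefix) or name == prefix:
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition("/")
        # sep is "/" when the child is a pseudo-directory, "" for an object.
        children.add(head + sep)
    return sorted(children)


def render_index(prefix: str, children: List[str]) -> str:
    """Render a minimal HTML index page with relative links."""
    items = "\n".join(
        f'<li><a href="{escape(c, quote=True)}">{escape(c)}</a></li>'
        for c in children
    )
    return (
        f"<html><body><h1>Index of /{escape(prefix)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )
```

Because the children are computed from object names alone, this approach does
not depend on directory marker objects being present.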

Note the following implementation quirks we have observed:

 * Rackspace does not perform autoindex-like functionality for directory
   markers unless you are using the CDN (which has its own complications
   related to cache timeouts, DNS hostnames, etc.).

 * HPCloud does not recognize directory markers when generating index
   pages for the public view of containers.

We may want to use the staticweb feature along with the CDN, and we may
well be able to, but there's enough complication here that we'll need to
get fairly detailed in the design and validate our assumptions.
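
For reference, a minimal Python sketch of the container metadata that Swift's
staticweb middleware reads. The helper name staticweb_container_headers is
hypothetical; it only builds the header dictionary, which would then be sent
in a POST to the container. Whether a given provider honors these headers the
way we expect is exactly the kind of assumption that needs validating.

```python
from typing import Dict


def staticweb_container_headers(index_name: str = "index.html",
                                listings: bool = True) -> Dict[str, str]:
    """Build container headers for Swift's staticweb middleware."""
    headers = {
        # Allow anonymous reads and listings of the container.
        "X-Container-Read": ".r:*,.rlistings",
        # Object to serve when a pseudo-directory URL is requested.
        "X-Container-Meta-Web-Index": index_name,
    }
    if listings:
        # Ask staticweb to generate a listing where no index object exists.
        headers["X-Container-Meta-Web-Listings"] = "true"
    return headers
```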

-Jim


