<div dir="ltr"><div>Hey all,</div><div><br></div><div>I like this plan as a next step for OpenStack-Infra. I have some thoughts on how Zuul might further improve its logging story but will post those on the other thread.</div><div><br></div><div>I do, however, share both of Clark's concerns.</div><div><br></div><div>At the moment zuul_swift_uploads makes a separate upload request for each individual file. I do believe we can group them up to a limit, but that limit is still small and complicated by things such as the total size of the data (which is probably why the script does them individually, but I don't recall). This is just to say that we need to test how uploading a large number of files will go and the time it may take.</div><div><br></div><div>I know the CDN was complicated with the cloud provider we were using at the time. However, I'm unsure what the CDN options are these days. Will there be an API we can use to turn the CDN on per container and get the public URL, for example?</div><div><br></div><div>If the above two items turn out to be sub-optimal, I'm personally not opposed to continuing to run our own middleware. We don't necessarily need that to be in os_loganalyze, as the returned URL could point at a new middleware. The middleware could then handle the ARA and possibly even act as our own CDN, choosing the correct container as needed (if we can't get CDN details otherwise).</div><div><br></div><div>Cheers,<br></div><div>Josh<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Jul 17, 2018 at 8:27 AM, James E. Blair <span dir="ltr"><<a href="mailto:corvus@inaugust.com" target="_blank">corvus@inaugust.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi,<br>
<br>
As you may know, all of the logs from Zuul builds are currently uploaded<br>
to a single static fileserver with about 14TB of storage available in<br>
one large filesystem. This was easy to set up, but scales poorly, and<br>
we live in constant fear of filesystem corruption necessitating a<br>
lengthy outage for repair or loss of data (an event which happens, on<br>
average, once or twice a year and takes several days to resolve).<br>
<br>
Our most promising approaches to solving this involve moving log storage<br>
to swift. We (mostly Joshua) have done considerable work in the past<br>
but kept hitting blockers. I think the situation has changed enough<br>
that the issues we hit before won't be a problem now. I believe we can<br>
use this work as a foundation to, relatively quickly, move our log<br>
storage into swift. Once there, there are a number of possibilities to<br>
improve the experience around logs and artifacts in Zuul in general.<br>
<br>
This email is going to focus mostly on how OpenStack Infra can move our<br>
current log storage and hosting to swift. I will follow it up with an<br>
email to the zuul-discuss list about further work that we can do that's<br>
more generally applicable to all Zuul users.<br>
<br>
This email is the result of a number of previous discussions, especially<br>
with Monty, and many of the ideas here are his. It also draws very<br>
heavily on Joshua's previous work. Here's the general idea:<br>
<br>
Pre-generate any content for which we currently rely on middleware<br>
running on <a href="http://logs.openstack.org" rel="noreferrer" target="_blank">logs.openstack.org</a>. Then upload all of that to swift.<br>
Return a direct link to swift for serving the content.<br>
<br>
In more detail:<br>
<br>
In addition to using swift as the storage backend, we would also like to<br>
avoid running a server as an intermediary. This is one of the obstacles<br>
we hit last time. We started to make os-loganalyze (OSLA) a smart proxy<br>
which could serve files from disk and swift. It threatened to become<br>
very complicated and tax the patience of OSLA's reviewers. OSLA's<br>
primary author and reviewer isn't really around anymore, so I suspect<br>
the appetite for major changes to OSLA is even less than it may have<br>
been in the past (we have merged 2 changes this year so far).<br>
<br>
There are three kinds of automatically generated content on logs.o.o:<br>
<br>
* Directory indexes<br>
* OSLA HTMLification of logs<br>
* ARA<br>
<br>
If we pre-generate all of those, we don't need any of the live services<br>
on logs.o.o. Joshua's zuul_swift_upload script already generates<br>
indexes for us. OSLA can already be used to HTMLify files statically.<br>
And ARA has a mode to pre-generate its output as well (which we used<br>
previously until we ran out of inodes). So today, we basically have<br>
what we need to pre-generate this data and store it in swift.<br>
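To give a feel for how little is involved, index pre-generation is essentially a walk over the log tree writing a static index.html into each directory. A stdlib-only sketch (the function name and HTML layout are illustrative, not the actual zuul_swift_upload code):

```python
import html
import os


def write_indexes(root):
    """Write a static index.html into every directory under root,
    listing that directory's entries with relative links."""
    for dirpath, dirnames, filenames in os.walk(root):
        entries = sorted(dirnames) + sorted(filenames)
        rows = "\n".join(
            '<a href="{0}">{0}</a><br>'.format(html.escape(name))
            for name in entries
        )
        with open(os.path.join(dirpath, "index.html"), "w") as f:
            f.write("<html><body>\n%s\n</body></html>\n" % rows)
```

Once the tree carries its own indexes, swift can serve it with no directory-listing middleware at all.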
<br>
Another issue we ran into previously was the transition from filesystem<br>
storage to swift. This was because in Zuul v2, we could not dynamically<br>
change the log reporting URL. However, in Zuul v3, since the job itself<br>
reports the final log URL, we can handle the transition by creating new<br>
roles to perform the swift upload and return the swift URL. We can<br>
begin by using these roles in a new base job so that we can verify<br>
correct operation. Then, when we're ready, we can switch the default<br>
base job. All jobs which upload logs to swift will report the new swift<br>
URL; the existing logs.o.o URLs will continue to function until they age<br>
out.<br>
<br>
The Zuul dashboard makes finding the location of logs for jobs<br>
(especially post jobs) simpler. So we no longer need logs.o.o to find<br>
the storage location (files or swift) for post jobs -- a user can just<br>
follow the link from the build history in the dashboard.<br>
<br>
Finally, the apache config (and to some degree, OSLA middleware) handles<br>
compression. Ultimately, we don't actually care if the files are<br>
compressed in storage. That's an implementation detail (which we care<br>
about now because we operate the storage). But it's not a user<br>
requirement. In fact, what we want is for jobs to produce logs in<br>
whatever format they want (plain text, journal, etc). We want to store<br>
those. And we want to present them to the user in the original format.<br>
Right now we compress many of them before we upload them to the log<br>
server because, lacking a dedicated upload handler on the log server,<br>
there's no other way to cause them to be stored compressed.<br>
<br>
If we're relieved of that burden, then the only thing we really care<br>
about is transfer efficiency. We should be able to upload files to<br>
swift with Content-Encoding: gzip, and, likewise, users should be able<br>
to download files with Accept-Encoding: gzip. We should be able to have<br>
efficient transfer without having to explicitly compress and rename<br>
files. Our first usability win.<br>
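Mechanically, that means compressing once at upload time and letting HTTP content negotiation handle the rest. A hedged sketch (the header names are standard HTTP; the helper itself is illustrative, not part of any existing role):

```python
import gzip


def prepare_gzip_upload(data, content_type="text/plain"):
    """Compress a log's bytes once and build headers for a swift PUT,
    so the object is stored compressed but keeps its original name
    and content type."""
    body = gzip.compress(data)
    headers = {
        "Content-Type": content_type,
        # Tells swift (and, later, the browser) that the payload is
        # gzipped; clients sending Accept-Encoding: gzip receive it
        # as-is, with no renaming to .gz needed.
        "Content-Encoding": "gzip",
    }
    return body, headers
```

The object keeps its plain name (e.g. console.log), so links never change whether or not compression is in play.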
<br>
The latest version of the zuul_swift_upload script uses the swift<br>
tempurl functionality to upload logs. This is because it was designed<br>
to run on untrusted nodes. A closer analog to our current Zuul v3 log<br>
upload system would be to run the uploader on the executor, giving it a<br>
real swift credential. It can then upload logs to swift in the normal<br>
manner, rather than via tempurl. It can also create containers as<br>
needed -- another consideration from our earlier work. By default, it<br>
could avoid creating containers, but we could configure it to create,<br>
say, a container for each first-level directory of our sharding scheme.<br>
This could be a general feature of the role that allows for per-site<br>
customization.<br>
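If log paths keep their current first-level sharding prefix, mapping a path to its per-shard container is a one-liner. A sketch (the container naming convention here is an assumption, not a settled scheme):

```python
def container_for(log_path, prefix="logs_"):
    """Map a sharded log path like 'a1/b2c3.../job/...' to the
    container holding that first-level shard, e.g. 'logs_a1'."""
    shard = log_path.lstrip("/").split("/", 1)[0]
    return prefix + shard
```

The upload role would then create `container_for(path)` on first use and PUT objects under the remainder of the path.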
<br>
I think that's the approach we should start with, because it will be the<br>
easiest transition from our current scheme. However, in the future, we<br>
can move to having the uploads occur from the test nodes themselves<br>
(rather than, or in addition to, the executor), by having a two-part<br>
system. The first part runs on the executor in a trusted context and<br>
creates any containers needed, then generates a tempurl, and uses that<br>
to have the worker nodes upload to the container directly. I only<br>
mention this to show that we're not backing ourselves permanently into<br>
executor-only uploads. But we shouldn't consider this part of the first<br>
phase.<br>
<br>
We have also discussed using multiple swifts. It may be easiest to<br>
start with one, but in a future where we have executor affinity in Zuul,<br>
we may want to upload to the nearest swift. In that case, we can modify<br>
the role to, rather than be configured with a single swift, support<br>
multiple swifts, and use the executor affinity information to determine<br>
if there is a swift colocated in the executor's cloud, and if not, use a<br>
fallback. This way we can use multiple swifts as they are available,<br>
but not require them.<br>
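The selection logic itself would be trivial once the affinity information exists. A sketch, assuming the role is configured with a simple mapping from cloud/region names to swift endpoints (all names here are illustrative):

```python
def pick_swift(executor_region, swifts, fallback):
    """Choose the swift colocated with the executor's cloud region,
    falling back to a designated default when no local swift exists.

    swifts: dict mapping region names to swift endpoint configs.
    """
    return swifts.get(executor_region, swifts[fallback])
```

New swifts then become a one-line config addition rather than a change to the role.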
<br>
To summarize: static generation combined with a new role to upload to<br>
swift using openstacksdk should allow us to migrate to swift fairly<br>
quickly. Once there, we can work on a number of enhancements which I<br>
will describe in a followup post to zuul-discuss.<br>
<br>
-Jim<br>
<br>
______________________________<wbr>_________________<br>
OpenStack-Infra mailing list<br>
<a href="mailto:OpenStack-Infra@lists.openstack.org">OpenStack-Infra@lists.<wbr>openstack.org</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-infra" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack-infra</a></blockquote></div><br></div>