[openstack-dev] [docs] sitemap automation suggestions

Petr Kovar pkovar at redhat.com
Mon Oct 16 17:20:34 UTC 2017


As for the scope of logging 301s, 302s, and 404s, I don't think we are
interested in checking EOL content for those.
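
To make the scoping concrete, here is a minimal sketch of a filter that
records only 301, 302, and 404 responses and skips anything under an EOL
release. It is not the existing sitemap tooling: the release names in
EOL_RELEASES, the logger name, and the helpers is_eol_url() and
record_status() are all hypothetical, and it assumes that an EOL release
can be recognized by its name appearing as a path segment of the URL.

```python
import logging
from urllib.parse import urlparse

# Assumption: illustrative EOL release names, not taken from this thread.
EOL_RELEASES = {"icehouse", "juno", "kilo"}

# The status codes the proposal wants to keep track of.
TRACKED_STATUSES = {301, 302, 404}

# Hypothetical logger name.
logger = logging.getLogger("sitemap.status")


def is_eol_url(url, eol_releases=EOL_RELEASES):
    """Return True if any path segment of the URL names an EOL release."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return any(segment in eol_releases for segment in segments)


def record_status(url, status, eol_releases=EOL_RELEASES):
    """Log a tracked status for a non-EOL URL; return True if it was logged."""
    if status not in TRACKED_STATUSES:
        return False
    if is_eol_url(url, eol_releases):
        # EOL content is out of scope: broken links there won't be fixed.
        return False
    logger.warning("%d %s", status, url)
    return True
```

A crawler callback could call record_status(response_url, response_status)
for every response; how the EOL list is kept up to date is left open here.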

As we are about to approve https://review.openstack.org/#/c/507629/, we
also want everybody to understand that broken links found in EOL content
won't be fixed, since EOL content will receive no further updates.

Cheers,
pk


On Thu, 5 Oct 2017 22:51:31 -0400 (EDT)
"megan at openstack.org" <megan at openstack.org> wrote:

> Hello all!
> 
> As you may be aware, sitemap generation for docs.openstack.org is currently done via a manually triggered Scrapy process. It also scrapes the entirety of docs.openstack.org, which makes processing slow. To improve the efficiency of this process, I would like to propose the following updates to the sitemap generation toolkit:
>     * tracking (in logs) of 301s, 302s, and 404s,
>     * automatic pull of supported releases,
>     * cron-managed automatic updates,
>     * setup of Google Webmaster tools (https://www.google.com/webmasters/), and
>     * a few style cleanups.
>     
> Beyond this, implementing more targeted crawling would massively improve both processing speed and scope. This is, however, a somewhat complicated matter, as it requires us to decide what exactly defines scope relevance in order to limit the crawl domain.
> 
> These are, of course, only our preliminary findings, and we would love to hear feedback about alternative methods and possible tricky aspects of the suggested changes. What do you think? Let us know!
> 
> 


