[openstack-dev] [docs] sitemap automation suggestions

megan at openstack.org megan at openstack.org
Fri Oct 6 02:51:31 UTC 2017


Hello all!

As you may be aware, sitemap generation for docs.openstack.org is currently done via a manually triggered scrapy process. It also scrapes the entirety of docs.openstack.org, which makes processing slow. To improve the efficiency of this process, I would like to propose the following updates to the sitemap generation toolkit:
    * tracking (in logs) of 301s, 302s, and 404s,
    * automatic pull of supported releases,
    * cron-managed automatic updates,
    * setup of Google Webmaster tools (https://www.google.com/webmasters/), and
    * a few style cleanups.
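
To make the first item concrete, here is a minimal sketch of what the status tracking could look like. It is not taken from the existing toolkit: the function name, the logger name, and the (url, status) input shape are all assumptions made for illustration, and how the crawler would feed it is left open.

```python
import logging
from collections import Counter

# Hypothetical logger name; the real toolkit may use a different one.
logger = logging.getLogger("sitemap.status")

# The status codes named in the proposal.
TRACKED_STATUSES = (301, 302, 404)


def record_statuses(responses):
    """Log each tracked response and return a count per status code.

    `responses` is an iterable of (url, status) pairs. This input shape
    is an assumption; the crawler would need to supply it.
    """
    counts = Counter()
    for url, status in responses:
        if status in TRACKED_STATUSES:
            # One log line per redirect or missing page.
            logger.warning("%d %s", status, url)
            counts[status] += 1
    return counts
```

The returned counts could be written out at the end of each run, so that a cron-managed job leaves a short summary next to the detailed log.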
    
Beyond this, implementing more targeted crawling would massively improve both processing speed and scope. This is, however, a somewhat complicated matter, as it requires us to decide what exactly defines scope relevance in order to limit the crawl domain.
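
One possible way to frame the scope question, purely as a sketch: treat a URL as in scope if it is on docs.openstack.org and one of its path segments is a supported release name. Both that rule and the `in_scope` helper are assumptions for discussion, not a description of the current crawler, and the release names in the usage note are only illustrative.

```python
from urllib.parse import urlparse

DOCS_HOST = "docs.openstack.org"


def in_scope(url, releases):
    """Return True if `url` is on the docs host and mentions a release.

    Assumption: a page is relevant when any path segment equals one of
    the supported release names in `releases`.
    """
    parts = urlparse(url)
    if parts.hostname != DOCS_HOST:
        return False
    # Split the path into non-empty segments, e.g. ["nova", "pike"].
    segments = [s for s in parts.path.split("/") if s]
    return any(s in releases for s in segments)
```

With `releases = {"ocata", "pike"}`, a URL such as https://docs.openstack.org/pike/install/ would be kept, while pages for other releases or other hosts would be skipped. Unversioned pages would also be skipped under this rule, which is exactly the kind of decision that needs discussion.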

These are, of course, only our preliminary findings, and we would love to hear feedback about alternative methods and possible tricky aspects of the suggested changes. What do you think? Let us know!



