[simplification] Making ask.openstack.org read-only

Pierre Riteau pierre at stackhpc.com
Thu Jan 28 14:58:08 UTC 2021


On Thu, 28 Jan 2021 at 12:44, Thierry Carrez <thierry at openstack.org> wrote:
> At that point ask.openstack.org content will be lost, unless we somehow
> make a static copy somewhere. The Internet archive has copies of
> ask.openstack.org but they do not seem to run very deep. If anyone has
> ideas on how we could preserve that content without spending too many
> cycles on it, please share :)

It might be possible without scraping the site. It appears you can
iterate over all the pages that index questions with:

https://ask.openstack.org/en/questions/scope:all/sort:activity-desc/page:<NUMBER_FROM_1_TO_910>/

Each question can be retrieved directly by its ID without having to
know the last part of the URL. For example,
https://ask.openstack.org/en/question/24115/ redirects to
https://ask.openstack.org/en/question/24115/sample-data-of-objectringgz-containerringgz-and-accountringgz/

So, if an administrator could extract all question IDs from the
database, you could feed the question URLs and the index pages to the
Wayback Machine Save Page Now service, for example via this library:
https://github.com/pastpages/savepagenow
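
To make the idea concrete, here is a rough, untested sketch of how the
URL list could be built and handed to a capture function. The function
names, the delay value and the error handling are my own assumptions;
the only things taken from above are the two URL patterns and the 910
page count. The capture callable is passed in so the sketch does not
depend on any particular archiving client.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Tuple

BASE = "https://ask.openstack.org/en"
LAST_INDEX_PAGE = 910  # upper bound of the index pages mentioned above


def index_page_urls(last_page: int = LAST_INDEX_PAGE) -> Iterator[str]:
    """Yield the URL of every question index page, from 1 to last_page."""
    for page in range(1, last_page + 1):
        yield f"{BASE}/questions/scope:all/sort:activity-desc/page:{page}/"


def question_urls(question_ids: Iterable[int]) -> Iterator[str]:
    """Yield the short (ID-only) URL of each question.

    The IDs are assumed to come from a database export done by an
    administrator; the site redirects these URLs to the full ones.
    """
    for question_id in question_ids:
        yield f"{BASE}/question/{question_id}/"


def archive_all(
    urls: Iterable[str],
    capture: Callable[[str], str],
    delay: float = 5.0,  # assumed pause between requests, to be polite
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Call capture(url) for each URL and collect the results.

    Returns (saved, failed): saved maps each URL to whatever capture
    returned, failed maps each URL to the error message it raised.
    """
    saved: Dict[str, str] = {}
    failed: Dict[str, str] = {}
    for url in urls:
        try:
            saved[url] = capture(url)
        except Exception as exc:  # keep going; failed URLs can be retried
            failed[url] = str(exc)
        time.sleep(delay)
    return saved, failed
```

With the savepagenow library installed, its capture() function, which
takes a URL and returns the archived URL, could presumably be passed
directly, e.g. archive_all(index_page_urls(), savepagenow.capture).
The Save Page Now service rate-limits requests, so a full run would
likely take a long time and need retries for the failed entries.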

Without a working search engine, though, the usefulness of such an
archive would be limited.


