On Thu, 28 Jan 2021 at 12:44, Thierry Carrez <thierry@openstack.org> wrote:
At that point ask.openstack.org content will be lost, unless we somehow make a static copy somewhere. The Internet Archive has copies of ask.openstack.org, but they do not seem to run very deep. If anyone has ideas on how we could preserve that content without spending too many cycles on it, please share :)
It might be possible without scraping the site. It appears you can iterate over all the question index pages with:

https://ask.openstack.org/en/questions/scope:all/sort:activity-desc/page:<NUMBER_FROM_1_TO_910>/

Each question can be retrieved directly by its ID, without having to know the last part of the URL. For example,

https://ask.openstack.org/en/question/24115/

redirects to

https://ask.openstack.org/en/question/24115/sample-data-of-objectringgz-cont...

So, if an administrator could extract all question IDs from the database, you could feed the question URLs and the index pages to the Wayback Machine Save Page Now service, for example via this library:

https://github.com/pastpages/savepagenow

Without a working search engine, though, the usefulness of the archive would be limited.
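
As a rough sketch of the idea above, something like the following could build the URL lists and hand them to Save Page Now. The function names, the 5-second delay and the way question IDs are supplied are all my own assumptions for illustration. The only library call used is `savepagenow.capture(url)`, and I have not checked how Save Page Now handles the redirect from the short question URL or what rate limits apply.

```python
import time
from typing import Callable, Iterable, Iterator, List

BASE = "https://ask.openstack.org/en"


def index_page_urls(last_page: int = 910) -> Iterator[str]:
    """Yield the question index pages, numbered from 1 to last_page."""
    for page in range(1, last_page + 1):
        yield f"{BASE}/questions/scope:all/sort:activity-desc/page:{page}/"


def question_urls(question_ids: Iterable[int]) -> Iterator[str]:
    """Yield the short per-question URLs (the site redirects to the full one)."""
    for qid in question_ids:
        yield f"{BASE}/question/{qid}/"


def archive_all(
    urls: Iterable[str],
    capture: Callable[[str], object],
    delay_seconds: float = 5.0,  # assumed pause between requests, not a documented limit
) -> List[str]:
    """Call capture(url) for every URL and return the URLs that failed."""
    failed = []
    for url in urls:
        try:
            capture(url)
        except Exception:
            # Keep going; the failed URLs can be retried in a later run.
            failed.append(url)
        time.sleep(delay_seconds)
    return failed


def main(question_ids: Iterable[int]) -> List[str]:
    """Archive all index pages plus the given questions via Save Page Now."""
    import savepagenow  # third-party: pip install savepagenow

    urls = [*index_page_urls(), *question_urls(question_ids)]
    return archive_all(urls, savepagenow.capture)
```

Calling `main(ids)` with the IDs exported from the database would return the list of URLs that could not be captured, so they can be fed back in for a second pass. Passing the capture function in as a parameter also makes it easy to dry-run the URL generation without touching the Wayback Machine.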