[docs] Encourage search engines to show newest version of OpenStack docs first
Hi all,
I am not sure whether others run into this, but often when I search for something about OpenStack, the first few results are for older versions. For example, my last search was "keystone config service_token_roles_required", which returns a Pike[1] URL as the first result.
For older versions, the documentation does have a header along the lines of "This release is no longer supported by the community. The current supported release is 2023.2." It links to the new documentation[2], but drops me at the docs root instead of the new version of this page, which is usually not useful to me. (I understand it is not possible to land on the same page, as documentation moves around.) I often try changing the version in the URL to 'latest', which works sometimes.
I wonder if we can help fix this? Looking at other people with this issue[3], a few strategies seem to have helped. (Disclaimer: I am not an SEO expert.)
1. Prevent Google from indexing the older versions by using `noindex`[4]
2. Remove old documentation so it is flushed from Google
Regards,
Jake
[1] https://docs.openstack.org/keystone/pike/admin/identity-auth-token-middlewar...
[2] https://docs.openstack.org/2023.2/
[3] https://github.com/crossplane/docs/issues/107
[4] https://developers.google.com/search/docs/crawling-indexing/block-indexing
--
Jake Yip
Technical Lead, Nectar Research Cloud
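To make option 1 concrete, here is a minimal Python sketch of how a docs build might decide which pages get a `noindex` robots meta tag. It is only an illustration under stated assumptions: the set `INDEXED_SERIES` and the helper names `series_of` and `robots_meta` are hypothetical, and the sketch assumes the series name is the second path segment, as in the Pike URL above.

```python
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical set of series that should stay indexed; a real list would
# come from release data rather than being hard-coded here.
INDEXED_SERIES = {"latest", "2023.2"}


def series_of(url: str) -> Optional[str]:
    """Return the series segment of a docs URL, e.g. 'pike', or None.

    Assumes the layout /<project>/<series>/..., as in
    https://docs.openstack.org/keystone/pike/admin/.
    """
    parts = [p for p in urlsplit(url).path.split("/") if p]
    return parts[1] if len(parts) >= 2 else None


def robots_meta(url: str) -> str:
    """Return a robots meta tag for the page, or '' if it should stay indexed."""
    series = series_of(url)
    if series is None or series in INDEXED_SERIES:
        return ""
    return '<meta name="robots" content="noindex">'
```

The tag would go into the `<head>` of each page for an older series. The pages stay online for people who still need them and only drop out of search results once the crawler has revisited them.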
This has been discussed before, though not for a while. The last time I recall was the first Denver PTG.
In the past we did remove docs as branches were EOLed, which resulted in lots of 404s: a search engine would index a page and return it, but we had since removed it, so the user got a 404.
There was also a lengthy discussion about not removing docs that people are using. I haven't looked at the user survey results but I'm sure you've seen the long tail of people still using very old results.
We could potentially remove the older releases from the index by doing something similar to what the Crossplane project did. That has the potential to solve most of the problems we've all seen, but doesn't remove the docs for older releases in general.
On Mon, Nov 6, 2023, at 3:01 PM, Tony Breeds wrote:
This has been discussed before, not for a while though. The last time I recall was the first Denver PTG.
In the past we did remove docs as branches were EOLd which resulted in lots of 404s as a search engine would index a page and return it but we had removed it so the user got a 404.
There was also a lengthy discussion about not removing docs that people are using. I haven't looked at the user survey results but I'm sure you've seen the long tail of people still using very old results.
We could potentially remove the older releases from the index by doing something similar to what the Crossplane project did. That has the potential to solve most of the problems we've all seen, but doesn't remove the docs for older releases in general.
Zuul set `<link rel=canonical>` on its latest docs [5] to get the search engines to prefer the latest docs while still hosting the older docs for people looking for that information. Perhaps hints like this could be used in the OpenStack documentation as well. Not sure what crossplane did, but maybe this is similar?
[5] https://review.opendev.org/c/zuul/zuul/+/825535
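As a rough illustration of the `<link rel=canonical>` hint, the following Python sketch rewrites the series segment of a docs URL to `latest` and emits the corresponding tag. This is not the Zuul change itself. The function name `canonical_link` is hypothetical, and the sketch assumes the /<project>/<series>/... layout and that the same page exists under /latest, which, as noted earlier in the thread, is not always true.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(url: str, target_series: str = "latest") -> str:
    """Build a canonical <link> tag pointing at the target series of a page.

    Assumes the layout /<project>/<series>/...; URLs without a series
    segment (such as the docs root) are returned unchanged.
    """
    parts = urlsplit(url)
    # '/octavia/pike/admin/' -> ['', 'octavia', 'pike', 'admin', '']
    segments = parts.path.split("/")
    if len(segments) >= 3 and segments[2]:
        segments[2] = target_series
    canonical = urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))
    return '<link rel="canonical" href="{}">'.format(escape(canonical, quote=True))
```

For example, `canonical_link("https://docs.openstack.org/octavia/pike/admin/")` produces a tag whose `href` is the /octavia/latest/admin/ page. A real build would probably want to check that the target page exists before emitting the tag.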
Hi Tony,
Thanks for answering!
On 7/11/2023 10:01 am, Tony Breeds wrote:
This has been discussed before, not for a while though. The last time I recall was the first Denver PTG.
In the past we did remove docs as branches were EOLd which resulted in lots of 404s as a search engine would index a page and return it but we had removed it so the user got a 404.
I think this is similar to what happened to Crossplane and they had to explicitly get Google to reindex those[1].
There was also a lengthy discussion about not removing docs that people are using. I haven't looked at the user survey results but I'm sure you've seen the long tail of people still using very old results.
In that case, removing those docs may be a step too far.
We could potentially remove the older releases from the index by doing something similar to what the Crossplane project did. That has the potential to solve most of the problems we've all seen, but doesn't remove the docs for older releases in general.
I attempted to look into it a bit, but it's not my specialty. Also, eventually, it'll require someone to re-trigger indexing from Google via Search Console[2], which needs domain ownership verification (via TXT record, etc).
[1] https://github.com/crossplane/docs/issues/107#issuecomment-990338800
[2] https://search.google.com/search-console/
We should also be able to prioritize the /latest paths using the sitemap and submit the updated sitemap to Google and friends.
Looking at the sitemap (openstack/openstack-manuals/www/static/sitemap.xml), it appears the priorities are incorrect.
Octavia latest admin, for example:
    <url>
      <loc>https://docs.openstack.org/octavia/latest/admin/</loc>
      <priority>0.5</priority>
      <changefreq>daily</changefreq>
      <lastmod>2020-06-21T14:55:29+0000</lastmod>
    </url>
Octavia Pike admin:
    <url>
      <loc>https://docs.openstack.org/octavia/pike/admin/</loc>
      <priority>1.0</priority>
      <changefreq>weekly</changefreq>
      <lastmod>2019-10-05T14:32:32+0000</lastmod>
    </url>
So, given the lastmod date, we haven't updated the sitemap since 2020 and we probably submitted it with stable branch docs having a higher priority (relative to other pages on our site) than the /latest paths.
Michael
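The sitemap observation above could be checked across the whole file with a short script. The Python sketch below flags every URL whose priority is higher than that of its /latest counterpart. The names `read_priorities` and `outranking_latest` are hypothetical, `SAMPLE` is a trimmed copy of the two Octavia entries quoted above, and the sketch assumes the /<project>/<series>/... layout. Note that some search engines may not use the priority value at all, so this is a consistency check and does not guarantee any ranking change.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

# Trimmed copy of the two Octavia entries quoted in the thread.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.openstack.org/octavia/latest/admin/</loc>
    <priority>0.5</priority>
  </url>
  <url>
    <loc>https://docs.openstack.org/octavia/pike/admin/</loc>
    <priority>1.0</priority>
  </url>
</urlset>"""


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}url' -> 'url'."""
    return tag.rsplit("}", 1)[-1]


def read_priorities(xml_text):
    """Map each <loc> URL in a sitemap to its <priority> as a float."""
    priorities = {}
    for element in ET.fromstring(xml_text).iter():
        if local_name(element.tag) != "url":
            continue
        fields = {local_name(c.tag): (c.text or "").strip() for c in element}
        if "loc" in fields:
            # The sitemap protocol defines 0.5 as the default priority.
            priorities[fields["loc"]] = float(fields.get("priority") or 0.5)
    return priorities


def outranking_latest(priorities):
    """Return URLs with a higher priority than their /latest/ counterpart."""
    flagged = []
    for url, priority in priorities.items():
        parts = urlsplit(url)
        segments = parts.path.split("/")
        if len(segments) < 3 or segments[2] in ("", "latest"):
            continue
        segments[2] = "latest"
        twin = parts._replace(path="/".join(segments)).geturl()
        if twin in priorities and priority > priorities[twin]:
            flagged.append(url)
    return sorted(flagged)
```

Running `outranking_latest(read_priorities(SAMPLE))` flags the Pike admin URL, matching the mismatch described above. The same two functions could be pointed at the full sitemap.xml text.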
participants (4)
- Clark Boylan
- Jake Yip
- Michael Johnson
- Tony Breeds