Structured Metadata for Search & SEO #5208

Open
davidfischer opened this issue Jan 31, 2019 · 10 comments
Labels
Accepted Accepted issue on our roadmap Feature New feature Needed: design decision A core team decision is required

Comments

@davidfischer
Contributor

davidfischer commented Jan 31, 2019

We could improve the SEO of Read the Docs by using structured metadata. Here's Google's documentation on the subject. Basically, this involves adding special tags (or JSON) to parts of our site that give search engines a deeper understanding of its content.

For example, we could add the following to the output of the documentation for the Read the Docs Sphinx theme or to its project page:

<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "SoftwareApplication",
  "name": "Read the Docs Sphinx Theme",
  "description": "The sphinx_rtd_theme is a sphinx theme designed to look modern and be mobile-friendly.",
  "keywords": "sphinx, python, readthedocs",
  "softwareVersion": "0.4.2",
  "softwareHelp": "https://sphinx-rtd-theme.readthedocs.io/en/latest/",
  "operatingSystem": "Windows, Mac, Linux",
  "applicationCategory": "DeveloperApplication",
  "inLanguage": "en",
  "license": "https://opensource.org/licenses/MIT",
  "datePublished": "2018-12-31",
  "url": "https://github.com/rtfd/sphinx_rtd_theme"
}
</script>

See the schema.org docs for "Software Application" for all possible attributes.
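As a rough sketch of how a tag like the one above could be generated rather than written by hand, the following Python helper serializes a metadata dict into a JSON-LD script tag. The function name `render_json_ld` is hypothetical and not part of any Read the Docs code; the attribute names come from the example above.

```python
import json


def render_json_ld(metadata: dict) -> str:
    """Serialize project metadata as a JSON-LD <script> tag (illustrative helper)."""
    payload = {"@context": "http://schema.org", "@type": "SoftwareApplication"}
    payload.update(metadata)
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    # "\/" is a valid JSON escape for "/", so the payload still parses unchanged.
    body = json.dumps(payload, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(render_json_ld({
        "name": "Read the Docs Sphinx Theme",
        "softwareVersion": "0.4.2",
        "url": "https://github.com/rtfd/sphinx_rtd_theme",
    }))
```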

GitHub itself uses these tags. For example, if you view source on the readthedocs.org page, you'll notice references to schema.org. These are structured metadata.

You can test this metadata in Google's tooling.

@davidfischer davidfischer added this to the Search improvements milestone Jan 31, 2019
@davidfischer
Contributor Author

Google also has docs specifically for marking up software apps

@stsewd
Member

stsewd commented Jan 31, 2019

A while ago I was able to extract some similar information from projects; we could reuse the same code for this: #1758 (comment). Would all of this fit better in the Sphinx extension?

@agjohnson agjohnson added Needed: design decision A core team decision is required Accepted Accepted issue on our roadmap Feature New feature labels Feb 1, 2019
@agjohnson
Contributor

This would be a great addition. I think we'd have to output context data from RTD and pick that up in our Sphinx extension. However, we might already have all the metadata and context data we need available in Sphinx. I think the bulk of the work will be in the Sphinx extension, injecting this into the HTML output regardless of theme.

@agjohnson
Contributor

I came across this issue again today. A user had a question on how to accomplish setting the canonical version for SEO purposes, which is a great question. I also realized this applies to translations as well.

Google's guidance on translations is here:
https://developers.google.com/search/docs/specialty/international/localized-versions

This does feel like it should be a core RTD feature, given our focus is enabling multiple versions and translations. Perhaps given recent conversations, this should be implemented outside Sphinx though.
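For translations, the guidance linked above describes `<link rel="alternate" hreflang="...">` tags, where each language version lists itself and every other version. A small Python sketch of generating those tags follows; the function name and the `docs.example.com` URLs are hypothetical, and using the default language as the `x-default` fallback is an assumption.

```python
from html import escape


def hreflang_links(translations: dict[str, str], default_lang: str = "en") -> list[str]:
    """Build one <link rel="alternate"> tag per translation of the same page."""
    links = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in sorted(translations.items())
    ]
    # Assumption: the default language doubles as the fallback for unmatched locales.
    if default_lang in translations:
        fallback = escape(translations[default_lang])
        links.append(f'<link rel="alternate" hreflang="x-default" href="{fallback}" />')
    return links


if __name__ == "__main__":
    pages = {
        "en": "https://docs.example.com/en/latest/",
        "es": "https://docs.example.com/es/latest/",
    }
    print("\n".join(hreflang_links(pages)))
```

The same set of tags would need to be emitted on every translated version of the page.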

@agjohnson
Contributor

agjohnson commented Nov 11, 2022

I took a quick stab at this for our own docs, but it wasn't actually clear how to relate multiple versions of the same page together. As far as I can tell, this is not part of the SoftwareApplication schema type. There is a way to define translation relationships, but not for versions.

That's at least what I gather from schemaorg/schemaorg#1476

From schemaorg/schemaorg#975 (comment), it seems isPartOf could be used?
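To make the isPartOf idea concrete, here is a speculative Python sketch in which each versioned page is described as a `TechArticle` that carries a `version` value and points back at the project through `isPartOf`. Whether search engines interpret this as a version relationship is not established in this thread; the function name and the choice of `TechArticle` are assumptions.

```python
import json


def page_node(project_url: str, page_url: str, version: str) -> dict:
    """Describe one versioned documentation page as part of a project."""
    return {
        "@context": "http://schema.org",
        "@type": "TechArticle",
        "@id": page_url,
        "url": page_url,
        "version": version,
        # Every version of the page references the same project node by @id.
        "isPartOf": {"@type": "SoftwareApplication", "@id": project_url},
    }


if __name__ == "__main__":
    node = page_node(
        "https://github.com/rtfd/sphinx_rtd_theme",
        "https://sphinx-rtd-theme.readthedocs.io/en/latest/",
        "latest",
    )
    print(json.dumps(node, indent=2))
```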

@humitos
Member

humitos commented Nov 14, 2022

I do see different tasks here:

  • JSON metadata on web application (.org/.com): this can be done by statically adding this data in the base.html Django template
  • JSON metadata on documentation pages (.io): this could be done in a Sphinx extension that users can decide whether or not to install (similar to what we did with sphinx-notfound-page)
  • Canonical version: looks like an application feature similar to the canonical URL but including the "canonical version" on it as well instead of pointing to the root of the domain
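The canonical version item could be sketched as follows, assuming documentation URLs shaped like `/<language>/<version>/<page>` and a canonical version named `stable`. Both are assumptions for illustration, as is the function name; a real project may use a different URL layout.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link(page_url: str, canonical_version: str = "stable") -> str:
    """Return a <link rel="canonical"> tag pointing at the same page in the canonical version."""
    parts = urlsplit(page_url)
    segments = parts.path.split("/")
    # Expected layout: ["", "<language>", "<version>", ...rest of the page path]
    if len(segments) < 3 or not segments[2]:
        raise ValueError(f"URL has no version segment: {page_url!r}")
    segments[2] = canonical_version
    url = urlunsplit((parts.scheme, parts.netloc, "/".join(segments), "", ""))
    return f'<link rel="canonical" href="{escape(url)}" />'


if __name__ == "__main__":
    print(canonical_link("https://sphinx-rtd-theme.readthedocs.io/en/latest/installing.html"))
```

Unlike a canonical URL that points to the root of the domain, this keeps the page path and only swaps the version segment.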

@agjohnson
Contributor

> this could be done in a Sphinx extension that users can decide whether or not to install

This feels like more of a core feature, not something that should be optional or only supported in Sphinx. With the work we're describing around generalizing all of the Sphinx extensions we've authored, I'm not sure I'd start with a Sphinx extension for new feature tests when we have the option of making it an agnostic post-processing step instead.

> JSON metadata on web application (.org/.com)

I wasn't considering this, what exactly is the use case you see here?

> Canonical version

I think what we're describing is addressing documentation versioning SEO with schema metadata, not a separate feature. Google, in theory, uses this metadata for SEO purposes, though they don't say specifically what they do with multiple versions of the same documentation.

@humitos
Member

humitos commented Nov 15, 2022

@agjohnson

> This feels like more of a core feature, not something that should be optional or only supported in Sphinx. With the work we're describing around generalizing all of the Sphinx extensions we've authored, I'm not sure I'd start with a Sphinx extension for new feature tests when we have the option of making it an agnostic post-processing step instead.

We don't know all the information required to construct the JSON that David described. How are you considering gathering all this information?

@agjohnson
Contributor

In that example, not all of the attributes are required. What I'm mostly interested in is building up the graph of documentation projects/versions/translations linking to each other. Right now, versions and translations might be considered duplicate content to Google, and this could be negatively affecting SEO for projects.

The (big?) hang-up is that the current schema does not offer an explicit way to define the version relationships between pages/projects. This is where the isPartOf attribute might be needed. The schema does have a mechanism for linking project translations together, however, and that could be a good place to start.
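As a starting point for the translation side of that graph, the sketch below links translated pages with the schema.org `workTranslation` and `translationOfWork` properties. It assumes one source language that all other languages are translated from; the function name and the example URLs are hypothetical.

```python
def translation_nodes(source_lang: str, pages: dict[str, str]) -> list[dict]:
    """Build one JSON-LD node per language, linked through translation properties."""
    source_url = pages[source_lang]
    nodes = []
    for lang, url in sorted(pages.items()):
        node = {
            "@context": "http://schema.org",
            "@type": "CreativeWork",
            "@id": url,
            "inLanguage": lang,
        }
        if lang == source_lang:
            # The source page lists every translation of itself.
            node["workTranslation"] = [
                {"@id": other_url}
                for other_lang, other_url in sorted(pages.items())
                if other_lang != source_lang
            ]
        else:
            # Each translation points back at the source page.
            node["translationOfWork"] = {"@id": source_url}
        nodes.append(node)
    return nodes


if __name__ == "__main__":
    import json

    pages = {
        "en": "https://docs.example.com/en/latest/",
        "es": "https://docs.example.com/es/latest/",
    }
    print(json.dumps(translation_nodes("en", pages), indent=2))
```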

@humitos
Member

humitos commented Nov 7, 2023

Do we know all this data when serving the page? If so, we can implement this feature in a simple and generic way via a CF worker and inject this HTML tag at the CDN.
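The injection step itself is small. The following is a Python stand-in for the logic such a worker (or any tool-agnostic post-processing step) would run; it is not worker code, and the function name is hypothetical. It inserts a prepared tag just before the closing `</head>` and leaves pages without a head untouched.

```python
import re

# Matches the closing head tag regardless of case or stray whitespace.
_HEAD_CLOSE = re.compile(r"</head\s*>", re.IGNORECASE)


def inject_into_head(html: str, tag: str) -> str:
    """Insert `tag` immediately before the first closing </head>, if there is one."""
    match = _HEAD_CLOSE.search(html)
    if match is None:
        # No <head> to inject into: serve the page unchanged.
        return html
    return html[: match.start()] + tag + "\n" + html[match.start():]


if __name__ == "__main__":
    page = "<html><head><title>Docs</title></head><body></body></html>"
    print(inject_into_head(page, '<script type="application/ld+json">{}</script>'))
```

Because this only needs the served HTML and the metadata, it works the same way for any documentation tool, which is the appeal of doing it at the CDN.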


4 participants