Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor #8883

jnikhila · 2023-04-12T11:03:37Z

Have you read the Contributing Guidelines on issues?

I have read the Contributing Guidelines on issues.

Description

The purpose of #docusaurus_skipToContent_fallback in Docusaurus search results is not currently documented on the Docusaurus website, which can lead to confusion for some users. While it is assumed that the behavior is intended to improve accessibility for users who rely on assistive technology, there is no clear explanation of this feature's functionality.

To address this issue, I recommend adding a mention of #docusaurus_skipToContent_fallback to the Docusaurus documentation.

Self-service

I'd be willing to address this documentation request myself.

The text was updated successfully, but these errors were encountered:

Josh-Cena · 2023-04-12T15:27:50Z

👋 Anything not documented is considered implementation detail, which is the case here. Why would you specifically want documentation for this?

jnikhila · 2023-04-13T09:29:44Z

@Josh-Cena - Thank you for your prompt response. While I understand that undocumented features are considered implementation details, having a non-functional anchor in the documentation can be confusing for users, especially when they use the search bar to find relevant information.

Our internal teams also rely on the search functionality to share relevant documentation with end users, and having inaccurate or non-functional anchors can hinder this process. We also have a request raised by our internal team to remove the anchor. However, I wanted to ensure that this would not have any negative repercussions before proceeding. As I'm unable to find any relevant information on Docusaurus documentation, I thought it would be best to consult with you. I would greatly appreciate your help in this matter.

slorber · 2023-04-13T10:43:15Z

@jnikhila I don't understand what you mean by

having inaccurate or non-functional anchors can hinder this process.

What's the behavior that you experience exactly? Show me a very concrete example, record a video or something demonstrating the problem.

We also have a request raised by our internal team to remove the anchor. However, I wanted to ensure that this would not have any negative repercussions before proceeding.

You can do whatever you want, including implementing your own theme from scratch, but you are on your own then. You become responsible from your customizations that diverge from our official code, and we cannot support you doing so.

As I'm unable to find any relevant information on Docusaurus documentation, I thought it would be best to consult with you.

We tried to make it work automatically out of the box for official plugins but also third-party ones.

The skip to content button will navigate in priority to the first <main> tag, and otherwise will fallback to an element (the Docusaurus layout content wrapper).

/**
 * The id of the element that should become focused on a page
 * that does not have a <main> html tag.
 * Focusing the Docusaurus Layout children is a reasonable fallback.
 */
export const SkipToContentFallbackId = 'docusaurus_skipToContent_fallback';

/**
 * Returns the skip to content element to focus when the link is clicked.
 */
function getSkipToContentTarget(): HTMLElement | null {
  return (
    // Try to focus the <main> in priority
    // Note: this will only work if JS is enabled
    // See https://github.com/facebook/docusaurus/issues/6411#issuecomment-1284136069
    document.querySelector('main:first-of-type') ??
    // Then try to focus the fallback element (usually the Layout children)
    document.getElementById(SkipToContentFallbackId)
  );
}

Going to close because it's more a question about implementation details rather than something we should fix or document.

If you want to know more about implementation details and customize behaviors, you should read the source code.

jnikhila · 2023-04-13T17:27:53Z

@slorber - Thank you for your response. I want to clarify that we have not made any customizations to the default behavior of Docusaurus. We are using v 2.4.0, using Algolia DocSearch for search capabilities. Despite this, we have noticed that an anchor tag (#docusaurus_skipToContent_fallback) gets appended to the URL whenever a user selects a search result and navigates to that page.

sample URL - https://sample-docs.com/docs/getting-started/installation**#docusaurus_skipToContent_fallback**

I searched Docusaurus documentation and have not found any information about this behavior. That's when I raised a query request to understand if this was missing from the documentation and if there were any possible ways to remove the anchor tag getting appended to the URL.

Just to clarify, I understand that the ticket was closed because it's a question. However, I appreciate your response and the information provided. I do not want to reopen the ticket and would like to keep it closed. Thank you for your help.

slorber · 2023-04-14T08:51:35Z

@jnikhila sample-docs.com is not a real domain so I can't double check the behavior, and can't reproduce either on our own website at https://docusaurus.io/

Give me a real production URL, record a video of the problem, a repro, or anything else that is "concrete". Not just text, something that I can run.

Otherwise, it's not actionable from my side.

jnikhila · 2023-04-14T12:41:44Z

@slorber - Thank you for your response.

The issue can be seen on our website.

Not sure if this background detail could help you:
Before upgrading to Docusaurus v2.4.0, we did not encounter the issue. We're on version 2.1.0 and upgraded to use the queryString attribute for Tabs and that's when we started to see this happening for search results.

slorber · 2023-04-14T13:10:15Z

@jnikhila

I want to clarify that we have not made any customizations to the default behavior of Docusaurus.

Your site clearly shows customizations. Maybe revert these customizations until you find the reason. Docusaurus 2.4 does not have this bug by default as far as I know.

If you have swizzled components, this is likely the cause.

slorber · 2023-04-14T13:23:33Z

Thanks, as your site is open-source (https://github.com/appsmithorg/appsmith-docs/blob/main/website) I was able to inspect and wasn't able to find customizations that could cause such a problem.

It turns out that it is indeed a bug, that I can reproduce on our own website in some edge cases:

The Algola search hit already contains the id:

I am still unable to understand why this happens to certain search hits and not all of them. Among results not linking to a specific heading, we can find some hits with/without that unwanted anchor.

In our case, this seems to happen on Markdown pages, like this one: https://docusaurus.io/examples/markdownPageExample#docusaurus_skipToContent_fallback

We probably need to upgrade our Algolia index configuration to ensure the anchor is never added to such search hits.

@shortcuts any idea what is happening here?

slorber · 2023-04-14T13:31:33Z

@jnikhila it's worth posting your Algolia crawler config here too.

Maybe it's not up to date with the one we recommend? https://docusaurus.io/docs/search#algolia-index-configuration

I feel like the behavior is different on your website compared to the Docusaurus website.

jnikhila · 2023-04-18T09:21:32Z

@slorber - We are not using the Algolia Crawler for updating our index.

We are still using the DocSearch Scrapper for updating it. We created the configuration file using the configuration available at the Algolia Github repo for DocSearch. Here's a link to the configuration file on our side.

We had earlier amended the config to add additional info to be indexed for searches like error messages. I realized we had removed the "url_without_anchor" attribute from the "attributesToRetrieve" array. I did a quick fix and verified it. Unfortunately, that has not fixed the issue.

As you pointed out, the issue only occurs for some of the Urls related to markdown files at your end. It makes sense that it is happening for all our doc Urls, as we use markdown files for every doc.

shortcuts · 2023-04-18T12:55:57Z

Hey @slorber, @jnikhila!

The anchor attached to the hit is not related to the configuration but how we extract the data with the crawler (and scrapper) directly. Just to give you a bit of context, we try to get the closest anchor by finding a parent (or a sibling if no parent are found) with an anchor so we can leverage it in the search modal.

Here my guess is that we don't find any relevant anchor, and end up on the skipToContent one.

Should the crawler (and scraper) have found a more relevant anchor? If so, then it's a bug on our side, otherwise we can just tell our crawler to ignore them for Docusaurus websites if you want.

shortcuts · 2023-04-18T12:59:34Z

@slorber - We are not using the Algolia Crawler for updating our index.

as you may know, the DocSearch scraper have been archived, so it's normal that it differs from the Docusaurus results as they don't leverage the same crawler.

you can however stil create a fork and update the anchor guessing strategy, which is available here: https://github.com/algolia/docsearch-scraper/blob/master/scraper/src/strategies/anchor.py#L16

slorber · 2023-04-19T15:33:50Z

Thanks @shortcuts !

Should the crawler (and scraper) have found a more relevant anchor? If so, then it's a bug on our side, otherwise we can just tell our crawler to ignore them for Docusaurus websites if you want.

What I don't understand in our own website case is how the crawler decides or not to use an id.

No id (good):

Search: https://docusaurus.io/search?q=Happy-First-Birthday-Slash
First result (good): https://docusaurus.io/blog/2018/12/14/Happy-First-Birthday-Slash

Bad id

This is the only case I was able to find in our own website:

Search: https://docusaurus.io/search?q=do
First result (bad): https://docusaurus.io/examples/markdownPageExample#docusaurus_skipToContent_fallback

It's not clear to me why the last case Algolia uses an id and not in all the other cases, any idea?

Both pages have quite similar structures:

skip to content link
skip to content fallback anchor
<main> section, more or less nested

slorber · 2023-04-19T15:37:52Z

Does it make sense that we prefix this fallback anchor with __ ?

Looks like a convention you are already using in the older crawler, wonder where it comes from.

shortcuts · 2023-04-20T08:51:37Z

It's not clear to me why the last case Algolia uses an id and not in all the other cases, any idea?

Both pages have quite similar structures:

skip to content link
skip to content fallback anchor
section, more or less nested

We don't go all the way up in the tree, we stop at some point (I need to deep dive in it again to give you more details), but IIRC we stop searching after something 5 levels around the current element (arbitrary number of course) because otherwise we might end-up with non-relevant anchors

Does it make sense that we prefix this fallback anchor with __ ?

Looks like a convention you are already using in the older crawler, wonder where it comes from.

If you think there might be other anchors we don't want to index later on, it's prefix it indeed, otherwise we can just ignore it internally.

slorber · 2023-04-20T13:26:21Z

If you think there might be other anchors we don't want to index later on, it's prefix it indeed, otherwise we can just ignore it internally.

There are some other anchors but only this one is IMHO likely to cause troubles.

Will add the __prefix to all of them anyway just in case.

As far as I understand you probably choose to ignore anchors with that prefix for search because many React apps are using an id at the root (including us) such as __docusaurus, __next, ___gatsby already.

Let's keep that convention then. Other search crawlers can also respect it if it's already in use.

shortcuts · 2023-04-20T14:30:56Z

Just pushed a fix on our side, will deploy it tomorrow in prod!

slorber · 2023-04-20T14:43:08Z

Thanks, also added the __ prefix, it's deploying to prod now. If you run the crawler in less than 2 hours it should already fix prod.

We'll never know if your fix of tomorrow also works 😅

jnikhila · 2023-04-21T06:42:14Z

as you may know, the DocSearch scraper have been archived, so it's normal that it differs from the Docusaurus results as they don't leverage the same crawler.
you can however stil create a fork and update the anchor guessing strategy, which is available here: https://github.com/algolia/docsearch-scraper/blob/master/scraper/src/strategies/anchor.py#L16

@slorber and @shortcuts - Do we need to make changes to the scrapper at our end to fix this?

shortcuts · 2023-04-21T07:54:49Z

@slorber deployed to prod, it should now properly ignore anchors starting with __

@jnikhila once the next version of Docusaurus is released it should be good on your side

jnikhila · 2023-04-21T08:07:59Z

Thank you, @shortcuts and @slorber helping here.

slorber · 2023-04-21T09:26:50Z

Thanks @shortcuts

I confirm after running the crawler on our site that the problem is fixed:

jnikhila added documentation The issue is related to the documentation of Docusaurus status: needs triage This issue has not been triaged by maintainers labels Apr 12, 2023

Josh-Cena added closed: wontfix A fix will bring significant overhead, or is out of scope (for feature requests) and removed status: needs triage This issue has not been triaged by maintainers labels Apr 12, 2023

slorber closed this as not planned Won't fix, can't repro, duplicate, stale Apr 13, 2023

slorber reopened this Apr 14, 2023

slorber changed the title ~~Clarify the behavior of #docusaurus_skipToContent_fallback when clicked on search results~~ Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor Apr 14, 2023

slorber removed the documentation The issue is related to the documentation of Docusaurus label Apr 14, 2023

Josh-Cena added the domain: search Related to the search feature, usually Algolia label Apr 14, 2023

slorber mentioned this issue Apr 20, 2023

fix(theme): add __ prefix to technical anchors, search crawlers (Algolia) should ignore them #8909

Merged

slorber closed this as completed in #8909 Apr 20, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor #8883

Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor #8883

jnikhila commented Apr 12, 2023

Josh-Cena commented Apr 12, 2023

jnikhila commented Apr 13, 2023

slorber commented Apr 13, 2023 •

edited

Loading

jnikhila commented Apr 13, 2023

slorber commented Apr 14, 2023

jnikhila commented Apr 14, 2023

slorber commented Apr 14, 2023

slorber commented Apr 14, 2023

slorber commented Apr 14, 2023

jnikhila commented Apr 18, 2023

shortcuts commented Apr 18, 2023

shortcuts commented Apr 18, 2023 •

edited

Loading

slorber commented Apr 19, 2023 •

edited

Loading

slorber commented Apr 19, 2023

shortcuts commented Apr 20, 2023

slorber commented Apr 20, 2023 •

edited

Loading

shortcuts commented Apr 20, 2023

slorber commented Apr 20, 2023

jnikhila commented Apr 21, 2023

shortcuts commented Apr 21, 2023

jnikhila commented Apr 21, 2023

slorber commented Apr 21, 2023

Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor #8883

Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor #8883

Comments

jnikhila commented Apr 12, 2023

Have you read the Contributing Guidelines on issues?

Description

Self-service

Josh-Cena commented Apr 12, 2023

jnikhila commented Apr 13, 2023

slorber commented Apr 13, 2023 • edited Loading

jnikhila commented Apr 13, 2023

slorber commented Apr 14, 2023

jnikhila commented Apr 14, 2023

slorber commented Apr 14, 2023

slorber commented Apr 14, 2023

slorber commented Apr 14, 2023

jnikhila commented Apr 18, 2023

shortcuts commented Apr 18, 2023

shortcuts commented Apr 18, 2023 • edited Loading

slorber commented Apr 19, 2023 • edited Loading

No id (good):

Bad id

slorber commented Apr 19, 2023

shortcuts commented Apr 20, 2023

slorber commented Apr 20, 2023 • edited Loading

shortcuts commented Apr 20, 2023

slorber commented Apr 20, 2023

jnikhila commented Apr 21, 2023

shortcuts commented Apr 21, 2023

jnikhila commented Apr 21, 2023

slorber commented Apr 21, 2023

slorber commented Apr 13, 2023 •

edited

Loading

shortcuts commented Apr 18, 2023 •

edited

Loading

slorber commented Apr 19, 2023 •

edited

Loading

slorber commented Apr 20, 2023 •

edited

Loading