Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor #8883

Closed
1 of 2 tasks
jnikhila opened this issue Apr 12, 2023 · 22 comments · Fixed by #8909
Closed
1 of 2 tasks

Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor #8883

jnikhila opened this issue Apr 12, 2023 · 22 comments · Fixed by #8909
Labels
bug An error in the Docusaurus core causing instability or issues with its execution domain: search Related to the search feature, usually Algolia

Comments

@jnikhila
Copy link

Have you read the Contributing Guidelines on issues?

Description

The purpose of #docusaurus_skipToContent_fallback in Docusaurus search results is not currently documented on the Docusaurus website, which can lead to confusion for some users. While it is assumed that the behavior is intended to improve accessibility for users who rely on assistive technology, there is no clear explanation of this feature's functionality.

To address this issue, I recommend adding a mention of #docusaurus_skipToContent_fallback to the Docusaurus documentation.

Self-service

  • I'd be willing to address this documentation request myself.
@jnikhila jnikhila added documentation The issue is related to the documentation of Docusaurus status: needs triage This issue has not been triaged by maintainers labels Apr 12, 2023
@Josh-Cena
Copy link
Collaborator

👋 Anything not documented is considered implementation detail, which is the case here. Why would you specifically want documentation for this?

@Josh-Cena Josh-Cena added closed: wontfix A fix will bring significant overhead, or is out of scope (for feature requests) and removed status: needs triage This issue has not been triaged by maintainers labels Apr 12, 2023
@jnikhila
Copy link
Author

@Josh-Cena - Thank you for your prompt response. While I understand that undocumented features are considered implementation details, having a non-functional anchor in the documentation can be confusing for users, especially when they use the search bar to find relevant information.

Our internal teams also rely on the search functionality to share relevant documentation with end users, and having inaccurate or non-functional anchors can hinder this process. We also have a request raised by our internal team to remove the anchor. However, I wanted to ensure that this would not have any negative repercussions before proceeding. As I'm unable to find any relevant information on Docusaurus documentation, I thought it would be best to consult with you. I would greatly appreciate your help in this matter.

@slorber
Copy link
Collaborator

slorber commented Apr 13, 2023

@jnikhila I don't understand what you mean by

having inaccurate or non-functional anchors can hinder this process.

What's the behavior that you experience exactly? Show me a very concrete example, record a video or something demonstrating the problem.

We also have a request raised by our internal team to remove the anchor. However, I wanted to ensure that this would not have any negative repercussions before proceeding.

You can do whatever you want, including implementing your own theme from scratch, but you are on your own then. You become responsible from your customizations that diverge from our official code, and we cannot support you doing so.


As I'm unable to find any relevant information on Docusaurus documentation, I thought it would be best to consult with you.

We tried to make it work automatically out of the box for official plugins but also third-party ones.

The skip to content button will navigate in priority to the first <main> tag, and otherwise will fallback to an element (the Docusaurus layout content wrapper).

/**
 * The id of the element that should become focused on a page
 * that does not have a <main> html tag.
 * Focusing the Docusaurus Layout children is a reasonable fallback.
 */
export const SkipToContentFallbackId = 'docusaurus_skipToContent_fallback';

/**
 * Returns the skip to content element to focus when the link is clicked.
 */
function getSkipToContentTarget(): HTMLElement | null {
  return (
    // Try to focus the <main> in priority
    // Note: this will only work if JS is enabled
    // See https://github.com/facebook/docusaurus/issues/6411#issuecomment-1284136069
    document.querySelector('main:first-of-type') ??
    // Then try to focus the fallback element (usually the Layout children)
    document.getElementById(SkipToContentFallbackId)
  );
}

Going to close because it's more a question about implementation details rather than something we should fix or document.

If you want to know more about implementation details and customize behaviors, you should read the source code.

@slorber slorber closed this as not planned Won't fix, can't repro, duplicate, stale Apr 13, 2023
@jnikhila
Copy link
Author

@slorber - Thank you for your response. I want to clarify that we have not made any customizations to the default behavior of Docusaurus. We are using v 2.4.0, using Algolia DocSearch for search capabilities. Despite this, we have noticed that an anchor tag (#docusaurus_skipToContent_fallback) gets appended to the URL whenever a user selects a search result and navigates to that page.

sample URL - https://sample-docs.com/docs/getting-started/installation**#docusaurus_skipToContent_fallback**

I searched Docusaurus documentation and have not found any information about this behavior. That's when I raised a query request to understand if this was missing from the documentation and if there were any possible ways to remove the anchor tag getting appended to the URL.

Just to clarify, I understand that the ticket was closed because it's a question. However, I appreciate your response and the information provided. I do not want to reopen the ticket and would like to keep it closed. Thank you for your help.

@slorber
Copy link
Collaborator

slorber commented Apr 14, 2023

@jnikhila sample-docs.com is not a real domain so I can't double check the behavior, and can't reproduce either on our own website at https://docusaurus.io/

Give me a real production URL, record a video of the problem, a repro, or anything else that is "concrete". Not just text, something that I can run.

Otherwise, it's not actionable from my side.

@jnikhila
Copy link
Author

@slorber - Thank you for your response.

The issue can be seen on our website.

Not sure if this background detail could help you:
Before upgrading to Docusaurus v2.4.0, we did not encounter the issue. We're on version 2.1.0 and upgraded to use the queryString attribute for Tabs and that's when we started to see this happening for search results.

@slorber
Copy link
Collaborator

slorber commented Apr 14, 2023

@jnikhila

I want to clarify that we have not made any customizations to the default behavior of Docusaurus.

Your site clearly shows customizations. Maybe revert these customizations until you find the reason. Docusaurus 2.4 does not have this bug by default as far as I know.

If you have swizzled components, this is likely the cause.

@slorber
Copy link
Collaborator

slorber commented Apr 14, 2023

Thanks, as your site is open-source (https://github.com/appsmithorg/appsmith-docs/blob/main/website) I was able to inspect and wasn't able to find customizations that could cause such a problem.

It turns out that it is indeed a bug, that I can reproduce on our own website in some edge cases:

CleanShot 2023-04-14 at 15 14 48@2x

The Algola search hit already contains the id:

CleanShot 2023-04-14 at 15 16 53@2x

I am still unable to understand why this happens to certain search hits and not all of them. Among results not linking to a specific heading, we can find some hits with/without that unwanted anchor.

In our case, this seems to happen on Markdown pages, like this one: https://docusaurus.io/examples/markdownPageExample#docusaurus_skipToContent_fallback

We probably need to upgrade our Algolia index configuration to ensure the anchor is never added to such search hits.

@shortcuts any idea what is happening here?

@slorber slorber reopened this Apr 14, 2023
@slorber slorber added bug An error in the Docusaurus core causing instability or issues with its execution documentation The issue is related to the documentation of Docusaurus and removed closed: wontfix A fix will bring significant overhead, or is out of scope (for feature requests) documentation The issue is related to the documentation of Docusaurus labels Apr 14, 2023
@slorber slorber changed the title Clarify the behavior of #docusaurus_skipToContent_fallback when clicked on search results Algolia search hits contain unwanted #docusaurus_skipToContent_fallback anchor Apr 14, 2023
@slorber
Copy link
Collaborator

slorber commented Apr 14, 2023

@jnikhila it's worth posting your Algolia crawler config here too.

Maybe it's not up to date with the one we recommend? https://docusaurus.io/docs/search#algolia-index-configuration

I feel like the behavior is different on your website compared to the Docusaurus website.

@slorber slorber removed the documentation The issue is related to the documentation of Docusaurus label Apr 14, 2023
@Josh-Cena Josh-Cena added the domain: search Related to the search feature, usually Algolia label Apr 14, 2023
@jnikhila
Copy link
Author

@slorber - We are not using the Algolia Crawler for updating our index.

We are still using the DocSearch Scrapper for updating it. We created the configuration file using the configuration available at the Algolia Github repo for DocSearch. Here's a link to the configuration file on our side.

We had earlier amended the config to add additional info to be indexed for searches like error messages. I realized we had removed the "url_without_anchor" attribute from the "attributesToRetrieve" array. I did a quick fix and verified it. Unfortunately, that has not fixed the issue.

As you pointed out, the issue only occurs for some of the Urls related to markdown files at your end. It makes sense that it is happening for all our doc Urls, as we use markdown files for every doc.

@shortcuts
Copy link
Contributor

Hey @slorber, @jnikhila!

The anchor attached to the hit is not related to the configuration but how we extract the data with the crawler (and scrapper) directly. Just to give you a bit of context, we try to get the closest anchor by finding a parent (or a sibling if no parent are found) with an anchor so we can leverage it in the search modal.


Here my guess is that we don't find any relevant anchor, and end up on the skipToContent one.

Should the crawler (and scraper) have found a more relevant anchor? If so, then it's a bug on our side, otherwise we can just tell our crawler to ignore them for Docusaurus websites if you want.

@shortcuts
Copy link
Contributor

shortcuts commented Apr 18, 2023

@slorber - We are not using the Algolia Crawler for updating our index.

as you may know, the DocSearch scraper have been archived, so it's normal that it differs from the Docusaurus results as they don't leverage the same crawler.

you can however stil create a fork and update the anchor guessing strategy, which is available here: https://github.com/algolia/docsearch-scraper/blob/master/scraper/src/strategies/anchor.py#L16

@slorber
Copy link
Collaborator

slorber commented Apr 19, 2023

Thanks @shortcuts !

Should the crawler (and scraper) have found a more relevant anchor? If so, then it's a bug on our side, otherwise we can just tell our crawler to ignore them for Docusaurus websites if you want.

What I don't understand in our own website case is how the crawler decides or not to use an id.

No id (good):

CleanShot 2023-04-19 at 17 13 04@2x

Bad id

This is the only case I was able to find in our own website:

CleanShot 2023-04-19 at 17 32 22@2x


It's not clear to me why the last case Algolia uses an id and not in all the other cases, any idea?

Both pages have quite similar structures:

  • skip to content link
  • skip to content fallback anchor
  • <main> section, more or less nested

@slorber
Copy link
Collaborator

slorber commented Apr 19, 2023

Does it make sense that we prefix this fallback anchor with __ ?

Looks like a convention you are already using in the older crawler, wonder where it comes from.

@shortcuts
Copy link
Contributor

It's not clear to me why the last case Algolia uses an id and not in all the other cases, any idea?

Both pages have quite similar structures:

skip to content link
skip to content fallback anchor

section, more or less nested

We don't go all the way up in the tree, we stop at some point (I need to deep dive in it again to give you more details), but IIRC we stop searching after something 5 levels around the current element (arbitrary number of course) because otherwise we might end-up with non-relevant anchors

Does it make sense that we prefix this fallback anchor with __ ?

Looks like a convention you are already using in the older crawler, wonder where it comes from.

If you think there might be other anchors we don't want to index later on, it's prefix it indeed, otherwise we can just ignore it internally.

@slorber
Copy link
Collaborator

slorber commented Apr 20, 2023

If you think there might be other anchors we don't want to index later on, it's prefix it indeed, otherwise we can just ignore it internally.

There are some other anchors but only this one is IMHO likely to cause troubles.


Will add the __prefix to all of them anyway just in case.

As far as I understand you probably choose to ignore anchors with that prefix for search because many React apps are using an id at the root (including us) such as __docusaurus, __next, ___gatsby already.

Let's keep that convention then. Other search crawlers can also respect it if it's already in use.

@shortcuts
Copy link
Contributor

Just pushed a fix on our side, will deploy it tomorrow in prod!

@slorber
Copy link
Collaborator

slorber commented Apr 20, 2023

Thanks, also added the __ prefix, it's deploying to prod now. If you run the crawler in less than 2 hours it should already fix prod.

We'll never know if your fix of tomorrow also works 😅

@jnikhila
Copy link
Author

as you may know, the DocSearch scraper have been archived, so it's normal that it differs from the Docusaurus results as they don't leverage the same crawler.
you can however stil create a fork and update the anchor guessing strategy, which is available here: https://github.com/algolia/docsearch-scraper/blob/master/scraper/src/strategies/anchor.py#L16

@slorber and @shortcuts - Do we need to make changes to the scrapper at our end to fix this?

@shortcuts
Copy link
Contributor

@slorber deployed to prod, it should now properly ignore anchors starting with __

@jnikhila once the next version of Docusaurus is released it should be good on your side

@jnikhila
Copy link
Author

Thank you, @shortcuts and @slorber helping here.

@slorber
Copy link
Collaborator

slorber commented Apr 21, 2023

Thanks @shortcuts

I confirm after running the crawler on our site that the problem is fixed:

CleanShot 2023-04-21 at 11 26 19@2x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug An error in the Docusaurus core causing instability or issues with its execution domain: search Related to the search feature, usually Algolia
Projects
None yet
4 participants