Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically warn or convert unformatted URLs in the documentation site #4208

Closed
AetherUnbound opened this issue Apr 25, 2024 · 8 comments · Fixed by #4262
Closed

Automatically warn or convert unformatted URLs in the documentation site #4208

AetherUnbound opened this issue Apr 25, 2024 · 8 comments · Fixed by #4262
Assignees
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users help wanted Open to participation from the community 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: documentation Related to Sphinx documentation 🐍 tech: python Involves Python

Comments

@AetherUnbound
Copy link
Contributor

Description

Context: #4173 and #4195

We issued a one-off change to update all links in the DAGs documentation page, but it's likely that we will encounter the issue of unclickable links in documentation in the future.

We should build out either a Sphinx plugin or a Vale linting rule to handle these cases, and either warn when a link is unformatted (e.g. http://{link} instead of the correct <http://{link}>) or automatically convert those links into the formatted version.

@sarayourfriend may be able to provide context on the Vale approach.

For the Sphinx approach, we could use something like our existing extensions which automatically link GitHub usernames or GitHub issues. We would have a similar extension which would rewrite unformatted links to the formatted version in the produced HTML files.

Alternatives

Leave this without a check and occasionally audit the project to ensure all links in the documentation are clickable.

@AetherUnbound AetherUnbound added help wanted Open to participation from the community 🐍 tech: python Involves Python 🟩 priority: low Low priority and doesn't need to be rushed 🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users 🧱 stack: documentation Related to Sphinx documentation labels Apr 25, 2024
@sarayourfriend
Copy link
Contributor

Okay, I wrote a summary of how to accomplish this with Vale, but it occurred to me that it was weird MyST didn't already behave this way to begin with, and sure enough it has a syntax extension that does this for you: https://myst-parser.readthedocs.io/en/latest/syntax/optional.html#linkify

We just need to enable that extension 🙂

Here's what I originally wrote, just in case it is relevant for some reason, but unless the MyST extension doesn't do what we need, this bit shouldn't matter.

The Sphinx approach is definitely doable, but will incur an cost in our docs build time every time, so I'd rather try other approaches. I won't cover how I'd approach that because (a) it's almost certainly complicated and (b) for me to describe it would require sitting down and probably just doing it a.k.a., exploratory coding. If we do it with Sphinx, I don't think this is a good issue for someone who hasn't already worked on a Sphinx plugin before. Vale is a strong preference for me, it should be much simpler to do it there, if it is possible to do.

For Vale, first check out the documentation on writing Vale styles and the different possibilities: https://vale.sh/docs/topics/styles/

It should be possible to use a regex to find instances of raw links that are not wrapped in angle brackets, something like: \shttps?://.[^>]?(\s|\.). The idea being: match the scheme with whitespace in front and terminating in either whitespace or a period, and allowing anything before the period/space other than > to avoid >. and > matching; almost certainly doesn't work exactly, but it is where I would start.

I'm guessing that the "conditional" or "existence" style forms are the mostly likely to work for this.

One more thing to keep in mind, is that Vale has different "scopes" for rules, and we'll probably need to choose a more specific scope than the defaults to avoid false positives (specifically inside of code blocks): https://vale.sh/docs/topics/scoping/. IIRC this is why the Openverse/TermCasing style targets the text scope.

@madewithkode
Copy link
Collaborator

I would love to take on this. Currently trying my hands on #3847

@sarayourfriend
Copy link
Contributor

Great, thanks @madewithkode, no rush at all on this one 👍

@madewithkode
Copy link
Collaborator

madewithkode commented May 2, 2024

Hi @sarayourfriend @AetherUnbound resolving this issue is likely to render the changes made in #4195 redundant. What would you advise in respect to that?

  1. Revert those changes in this MR
  2. Leave as is
  3. Revert the changes in a separate dedicated issue

Meanwhile thanks @sarayourfriend, the MyST approach is super straightforward and works like a charm.

@AetherUnbound
Copy link
Contributor Author

As long as the current syntax works in both cases (the docs site and in Airflow's markdown renderer), I think leaving as-is is fine!

@madewithkode
Copy link
Collaborator

@AetherUnbound I have confirmed it works for the doc site, how do I check for(where do I find) Airflow's markdown renderer?

@dhruvkb
Copy link
Member

dhruvkb commented May 3, 2024

I looked into the Airflow instance and confirmed that the links are rendered properly. You can verify the same on your local Airflow instance, which can be started by following the catalog quickstart guide.

image

@madewithkode
Copy link
Collaborator

Thanks @dhruvkb. I have been able to verify this as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🤖 aspect: dx Concerns developers' experience with the codebase 🧰 goal: internal improvement Improvement that benefits maintainers, not users help wanted Open to participation from the community 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: documentation Related to Sphinx documentation 🐍 tech: python Involves Python
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

4 participants