Added Pipeline for scheduled link rot checker #3649

Open: krishnaduttPanchagnula wants to merge 5 commits into master

@krishnaduttPanchagnula commented Jun 21, 2025

Closes #3635

@alexandear requested a review from Copilot on Jun 21, 2025, 17:33

@Copilot (Copilot AI) left a comment

Pull Request Overview

This pull request adds a scheduled link rot checker that scans markdown files for links and validates them by making HEAD requests.

  • Added a Python script that extracts and processes URLs from markdown files.
  • Introduced a GitHub Actions workflow to schedule the execution of the link rot checker.
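As a rough illustration of the URL-extraction step described in the overview, the sketch below pulls absolute http(s) targets out of Markdown inline links and autolinks. The name `extract_links` and the regular expression are assumptions made for illustration; the script in this PR is not shown in full and may work differently.

```python
import re

# Matches [text](https://...) inline links or <https://...> autolinks.
# Assumption: only absolute http(s) URLs are of interest; URLs containing
# parentheses would be cut short by this simple pattern.
_LINK = re.compile(r"\[[^\]]*\]\((https?://[^)\s]+)|<(https?://[^>\s]+)>")


def extract_links(markdown_text):
    """Return the unique http(s) URLs in markdown_text, in document order."""
    urls = []
    for match in _LINK.finditer(markdown_text):
        url = match.group(1) or match.group(2)
        if url not in urls:
            urls.append(url)
    return urls
```

Relative targets such as `../governance` are deliberately ignored here, since they cannot be validated with an HTTP request.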

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| hack/lin-rot-checker.py | New script to extract URLs from markdown files and verify links. |
| .github/workflows/lin-rot-checker.yml | Workflow configuration to run the link rot checker on a schedule. |


    for link in links:
        try:
            if requests.head(link).status_code==200:

Copilot AI Jun 21, 2025

Consider adding a timeout to the requests.head call (e.g., requests.head(link, timeout=5)) to prevent the script from hanging on unresponsive links.

Suggested change
- if requests.head(link).status_code==200:
+ if requests.head(link, timeout=5).status_code==200:
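To show the suggested timeout in context, here is a minimal sketch of a single link check. It swaps `requests` for the standard library's `urllib.request` so that it runs without third-party packages; `check_link` and its `opener` parameter are hypothetical names, not part of the PR.

```python
import urllib.error
import urllib.request


def check_link(url, timeout=5.0, opener=urllib.request.urlopen):
    """Send a HEAD request with a timeout and return (ok, detail)."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            status = response.status
            return 200 <= status < 400, f"HTTP {status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404.
        return False, f"HTTP {exc.code}"
    except OSError as exc:
        # Covers URLError, connection failures, and timeouts.
        return False, f"error: {exc}"
```

Without the timeout, a single unresponsive host could stall the whole run. Some servers reject HEAD requests, so a real checker may need to fall back to GET.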


@jandubois (Member)

I think we don't care about the languages used for external tools, as long as they are easy to install both locally and in CI.

But for tools that should become part of our repo, unless there is a really good reason, they should be written in Go or bash[1].

Footnotes

  1. I realize that is somewhat ironic, given that I wrote test-port-forwarding.pl in Perl, but that was ages ago, and I would now argue that it should be written in Go instead, if it didn't already exist.

@krishnaduttPanchagnula (Author)

@jandubois Can you please review and confirm whether this cron time works?

  repository_dispatch:
  workflow_dispatch:
  schedule:
    - cron: "00 18 * * *"

Member

Can we add a comment about when this workflow will be triggered?

According to the cron schedule "Every day at 18:00 UTC"

Fixed.
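For readers who want to double-check the "every day at 18:00 UTC" reading, the sketch below computes the next run for a daily expression of the form `M H * * *`. It is not a general cron parser, and `next_daily_run` is a hypothetical helper. GitHub Actions evaluates `schedule` expressions in UTC.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(cron, now):
    """Return the next UTC run time for a daily cron expression 'M H * * *'.

    `now` must be a timezone-aware datetime. Only the minute and hour
    fields are interpreted.
    """
    minute, hour, day, month, weekday = cron.split()
    if (day, month, weekday) != ("*", "*", "*"):
        raise ValueError("only daily schedules of the form 'M H * * *' are supported")
    candidate = now.astimezone(timezone.utc).replace(
        hour=int(hour), minute=int(minute), second=0, microsecond=0
    )
    # If today's slot has already passed (or is exactly now), use tomorrow's.
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate
```

For example, calling it with `"00 18 * * *"` and a timestamp of 17:33 UTC returns 18:00 UTC on the same day.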

@@ -0,0 +1,25 @@
name: Link-rot-Checker

Member

Can we rename this to avoid the "-"?

Something like:

name: Link Checker

name: Broken Link Reporter

name: Automated Link Health Check

name: Scheduled Link Health Check

Done.

    permissions:
      issues: write
    steps:
      - uses: actions/checkout@v4

Member

Please use a hash instead of v4, see other workflows

- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2

Fixed.
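The pinning convention requested above can be checked mechanically. The following sketch scans workflow text for `uses:` references whose ref is not a full 40-character commit SHA. `unpinned_actions` is a hypothetical helper, and the line-based matching is an assumption that holds for simple workflows like this one, not a full YAML parser.

```python
import re

# Captures "owner/repo[/path]" and the ref after "@" on a `uses:` line.
_USES = re.compile(r"^\s*(?:-\s*)?uses:\s*(\S+)@(\S+)")
_FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def unpinned_actions(workflow_text):
    """Return `uses:` references whose ref is not a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = _USES.match(line)
        if match and not _FULL_SHA.match(match.group(2)):
            unpinned.append(f"{match.group(1)}@{match.group(2)}")
    return unpinned
```

Run against this workflow as first submitted, it would flag the `@v4`, `@v2`, and `@v5` references that the review comments call out.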

        id: lychee
        uses: lycheeverse/lychee-action@v2
        with:
          fail: false

Member

Let's add an explicit output:

Suggested change
-   fail: false
+   fail: false
+   output: ./lychee/out.md

Done.

@@ -0,0 +1,25 @@
name: Link-rot-Checker
on:
  repository_dispatch:

Member

It seems this trigger is not needed.

Fixed.

@alexandear (Member) commented Jun 26, 2025

Please fix a typo in the PR's title: "pipline" -> "pipeline". Also, update the title to match the implementation (we don't have a script anymore).

@krishnaduttPanchagnula changed the title from "added script and pipline for scheduled link rot checker" to "added Pipeline for scheduled link rot checker" on Jun 27, 2025
@krishnaduttPanchagnula changed the title from "added Pipeline for scheduled link rot checker" to "Added Pipeline for scheduled link rot checker" on Jun 27, 2025

          output: ./lychee/out.md
      - name: Create Issue From File
        if: steps.lychee.outputs.exit_code != 0
        uses: peter-evans/create-issue-from-file@v5

Member

Missing hash

Done

      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 #v4.2.2
      - name: Link Checker
        id: lychee
        uses: lycheeverse/lychee-action@82202e5e9c2f4ef1a55a3d02563e1cb6041e5332

Member

Missing version

Done

Member

Whitespace seems inconsistent with the other YAML files.

Member

The file name could be just "links.yml".

Done

Member

Please squash the commits

on:
  workflow_dispatch:
  schedule:
    - cron: "00 18 * * *" #Runs the cron at 1800 hrs UTC Everyday

Member

Do you have a log of a test run?

[image]

Member

Did you test the YAML file?

      issues: write
    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 #v4.2.2
      - name: Link Checker

Member

How does this work with Hugo?

Since the Hugo content is also written in Markdown, this would work.

Member

I don't think that lychee is clever enough to resolve links like {{< ref "/docs/config/multi-arch" >}}?

Shouldn't it be executed against the rendered HTML files, not against the markdown sources?

Since the assets referenced in the HTML are also present in the repo, I have taken this approach. It will work for checking broken links that are present on the website.

For relative links in the repo such as ../governance or /dev/testing, which point to another file in the repo, this approach might not work.
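To make the Hugo concern raised in this thread concrete, the sketch below lists `ref`/`relref` shortcode targets in a Markdown source. These targets only become real URLs once Hugo renders the site, which is why running the checker against rendered HTML was suggested. `hugo_ref_targets` and its pattern are illustrative assumptions, not part of the PR or of lychee.

```python
import re

# Matches Hugo ref/relref shortcodes such as {{< ref "/docs/config/multi-arch" >}}
_HUGO_REF = re.compile(r'\{\{[<%]\s*(?:rel)?ref\s+"([^"]+)"\s*[>%]\}\}')


def hugo_ref_targets(markdown_text):
    """Return the targets of Hugo ref/relref shortcodes in markdown_text."""
    return _HUGO_REF.findall(markdown_text)
```

A count of such shortcodes in the docs tree would give a rough idea of how many internal links a Markdown-only check cannot validate.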

Signed-off-by: krishnaduttPanchagnula <krishnadutt123@gmail.com>
Successfully merging this pull request may close these issues.

Add scheduled CI job to check for link rot in repo and website
4 participants