-
Notifications
You must be signed in to change notification settings - Fork 673
Added Pipeline for scheduled link rot checker #3649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Added Pipeline for scheduled link rot checker #3649
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This pull request adds a scheduled link rot checker that scans markdown files for links and validates them by making HEAD requests.
- Added a Python script that extracts and processes URLs from markdown files.
- Introduced a GitHub Actions workflow to schedule the execution of the link rot checker.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
hack/lin-rot-checker.py | New script to extract URLs from markdown files and verify links. |
.github/workflows/lin-rot-checker.yml | Workflow configuration to run the link rot checker on a schedule. |
hack/lin-rot-checker.py
Outdated
|
||
for link in links: | ||
try: | ||
if requests.head(link).status_code==200: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a timeout to the requests.head call (e.g., requests.head(link, timeout=5)) to prevent the script from hanging on unresponsive links.
if requests.head(link).status_code==200: | |
if requests.head(link, timeout=5).status_code==200: |
Copilot uses AI. Check for mistakes.
I think we don't care about the languages used for external tools, as long as they are easy to install both locally and in CI. But for tools that should become part of our repo, unless there is a really good reason, they should be written in Go or bash1. Footnotes
|
7f97a80
to
c209142
Compare
@jandubois Can you please review and suggest if this cron time works. |
repository_dispatch: | ||
workflow_dispatch: | ||
schedule: | ||
- cron: "00 18 * * *" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a comment about when this workflow will be triggered?
According to the cron schedule "Every day at 18:00 UTC"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
@@ -0,0 +1,25 @@ | |||
name: Link-rot-Checker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename to avoid -
?
Something like:
name: Link Checker
name: Broken Link Reporter
name: Automated Link Health Check
name: Scheduled Link Health Check
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
permissions: | ||
issues: write | ||
steps: | ||
- uses: actions/checkout@v4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a hash instead of v4
, see other workflows
lima/.github/workflows/test.yml
Line 29 in 1abadcd
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
id: lychee | ||
uses: lycheeverse/lychee-action@v2 | ||
with: | ||
fail: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's add an explicit output:
fail: false | |
fail: false | |
output: ./lychee/out.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -0,0 +1,25 @@ | |||
name: Link-rot-Checker | |||
on: | |||
repository_dispatch: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems this trigger is not needed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
Please fix a typo in the PR's title: "pipline" -> "pipeline". Also, update the title to match implementation (we don't have script anymore). |
2873041
to
39f33a9
Compare
output: ./lychee/out.md | ||
- name: Create Issue From File | ||
if: steps.lychee.outputs.exit_code != 0 | ||
uses: peter-evans/create-issue-from-file@v5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing hash
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 #v4.2.2 | ||
- name: Link Checker | ||
id: lychee | ||
uses: lycheeverse/lychee-action@82202e5e9c2f4ef1a55a3d02563e1cb6041e5332 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing version
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whitespaces seem inconsistent with the other YAML files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file name could be just like "links.yml"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please squash the commits
on: | ||
workflow_dispatch: | ||
schedule: | ||
- cron: "00 18 * * *" #Runs the cron at 1800 hrs UTC Everyday |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have a log of a test run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you test the YAML file?
issues: write | ||
steps: | ||
- uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 #v4.2.2 | ||
- name: Link Checker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work with hugo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the content for Hugo is also managed by markdown, this would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that lychee is clever enough to resolve links like {{< ref "/docs/config/multi-arch" >}}
?
Shouldn't it be executed against the rendered HTML files, not against the markdown sources?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the assets are also present in the repo which are getting referenced in the html, i have taken this approach. This approach will work for checking the broken links that are present in the website.
For the links present in the repo like, ../governance, /dev/testing which would redirect to another file in the repo, this approach might not work.
Signed-off-by: krishnaduttPanchagnula <krishnadutt123@gmail.com>
Signed-off-by: krishnaduttPanchagnula <krishnadutt123@gmail.com>
Signed-off-by: krishnaduttPanchagnula <krishnadutt123@gmail.com>
Signed-off-by: krishnaduttPanchagnula <krishnadutt123@gmail.com>
Signed-off-by: krishnaduttPanchagnula <krishnadutt123@gmail.com>
a4490de
to
905f713
Compare
Closes #3635