Skip to content

Add Vale for content linting to the repo #541

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
Jun 6, 2025
Merged

Add Vale for content linting to the repo #541

merged 34 commits into from
Jun 6, 2025

Conversation

paddyroddy
Copy link
Member

@paddyroddy paddyroddy commented Jun 2, 2025

In this PR I add vale similarly to paddyroddy/talks#80 paddyroddy/talks#82. See this blog post for an explanation of what it is https://www.datadoghq.com/blog/engineering/how-we-use-vale-to-improve-our-documentation-editing-process. I used this extensively during my PhD https://github.com/paddyroddy/phd-thesis/blob/main/.vale.ini. I'm going to talk about this in the Collaboration Hour, so this PR is in prep for that.

The talk: https://paddyroddy.github.io/talks/linting-prose-with-vale.

@paddyroddy paddyroddy self-assigned this Jun 2, 2025
@paddyroddy paddyroddy added the enhancement New feature or request label Jun 2, 2025
@paddyroddy paddyroddy requested a review from Copilot June 2, 2025 23:21
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This pull request adds Vale content linting to the repository by introducing Vale configuration, custom style rules, and a new CI job to run Vale. Key changes include:

  • Adding a new Vale configuration file (.vale.ini) and custom styles under .github/styles.
  • Updating license files and issue templates to improve consistency and adhere to new styling choices.
  • Adding a new GitHub Actions workflow job to run Vale and enforce content linting.

Reviewed Changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/data/test_package_generation/LICENSE.md Updated copyright text formatting.
tests/data/test_package_generation/.github/ISSUE_TEMPLATE/feature_request.yml Updated placeholder punctuation using a Unicode ellipsis.
tests/data/test_package_generation/.github/ISSUE_TEMPLATE/bug_report.yml Changed ellipsis formatting in code snippets for consistency.
docs/pages/virtual.md Modified venv description to remove redundant wording.
docs/pages/libraries/parallel-async.md Adjusted punctuation in the dask description.
docs/pages/docs.md Simplified language in documentation description.
docs/pages/ci.md Revised the Coveralls service description for clarity.
docs/LICENSE Updated license formatting.
README.md Inserted Vale annotation comments to bypass unwanted style checks.
LICENSE.md Updated license formatting.
.vale.ini New Vale configuration file added.
.github/workflows/linting.yml Added a new CI job to run Vale.
.github/styles/custom/spaces.yml New custom style rule enforcing single spacing.
.github/styles/custom/quotes.yml New custom style rule enforcing straight quotes.
.github/styles/custom/oxford-comma.yml New custom style rule encouraging the use of the Oxford comma.
.github/styles/custom/no-contractions.yml New custom style rule to discourage contractions.
.github/styles/custom/endash.yml New custom style rule to avoid en dashes and suggest em dashes in parentheses.
.github/styles/custom/dashes.yml New custom style rule to remove spaces around dashes.
.github/styles/custom/british-spellings.yml New rule set for enforcing British spellings.
.github/ISSUE_TEMPLATE/feature_request.yml (in .github folder) Updated placeholder punctuation using a Unicode ellipsis.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@paddyroddy paddyroddy marked this pull request as ready for review June 2, 2025 23:22
@paddyroddy paddyroddy requested a review from samcunliffe June 2, 2025 23:22
@paddyroddy paddyroddy requested a review from Copilot June 2, 2025 23:27
Copilot

This comment was marked as outdated.

Copy link
Member

@samcunliffe samcunliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I really like the language simplifications. I think we want this 🫐 .

This was just a quick pass before seeing your talk. Curious about uncluttering the custom config. Could we make our own ARC vale language style erm... theme? (Not sure what's the correct word)

LICENSE.md Outdated
@@ -1,6 +1,6 @@
# MIT License

Copyright (c) 2023-2025 UCL Centre for Advanced Research Computing
Copyright © 2023-2025 UCL Centre for Advanced Research Computing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. It's from vale. Probs not this one, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the practice involving licences? Once made can never change? Why do we use the (c) here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the GPL specifically says "thou shalt not change this file".

MIT doesn't have any such clause. But to avoid potential problems, I would not reword license files.

(c)© is probably innocent enough. I would leave as is, but ¯\_(ツ)_/¯

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wait I realise you mean why the (c) at all: because it's not in the MIT template. That's a good point. Did it come from GitHub?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know where it came from, it is a bit weird though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know enough about licences, but it does seem weird that I can't reword my own licence. Guess it makes sense for GPL. Like what if you made a typo?

Copy link
Member

@samcunliffe samcunliffe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. Realised we need to ignore the license files in both the repo and the template.

Copy link
Collaborator

@matt-graham matt-graham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of having a prose linter but I would say we probably want some discussion of what rules to include (possibly this should be needs-2-reviewers?). I personally don't think the ellipsis replacement should be included as I would say the marginal gain in readability of the Unicode character in cases where it renders nicely (when not using a fixed-width font) are outweighed by the cases where it decreases readability (when showing in a fixed-width font for example) or might change the meaning of Python code snippets embedded in non-Python files. I also think the -ize/-ization/-ized/-izable to -ise/-isation/-ised/-isable replacements as currently implemented might end up with spurious matches, and I'd personally vote for not standardizing (😛) on -ise endings to begin with (but agree with at least having a consistent convention). I also find the removal of all very instances a bit odd.

@@ -21,7 +21,7 @@ body:
value: |-
import cookiecutter_test

...
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say we want to disable the ellipsis replacement because of places like this - this is Python code in a YAML file and I suspect we'll have other cases like this where we have for example Python code in a Markdown file that we don't want it touch. While it doesn't really matter here as this is meant more to indicate user to fill in rest of reproducing code, while ... is valid Python syntax raises a syntax error ('Invalid character').

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a fair enough reason. @K-Meech raised the point of screen readers for prose though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file shouldn't have changed as not a Markdown file. This is me copy-and-pasting

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paddyroddy - I'm not sure what you mean by 'screen readers for prose' here? As far as I'm aware screen readers will handle both ellipsis styles similarly

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I thought that was something said the other day. I have no idea what they do with ellipsis.

@samcunliffe
Copy link
Member

I would say we probably want some discussion of what rules to include (possibly this should be needs-2-reviewers?)

🗳️ This guy wants: readability and accessibility suggestions inline in my PRs that have a lot of prose. I'm actually mostly interested in the accessibility of what I write.

... Should we run the Readability rules and see what it suggests?

I'm not super bothered by consistency of contractions or em-dashes (though I understand the "-ize" sentiment, I would support consistent British spelling and fewer hedging and weasel words).

@samcunliffe samcunliffe added the needs-2-reviewers Could be considered "controversial" so worth a second pair of eyes label Jun 3, 2025
@paddyroddy
Copy link
Member Author

paddyroddy commented Jun 3, 2025

I like the idea of having a prose linter but I would say we probably want some discussion of what rules to include

This was very much for demo purposes. Obviously, we'd need to decide on what rules we want.

@paddyroddy
Copy link
Member Author

paddyroddy commented Jun 5, 2025

@samcunliffe @matt-graham I have reverted the changes, excluded licence files, made CI not error. To see the output you need to click on
image
we can then iterate on rules to turn on/off

Can be seen more clearly here
image

@paddyroddy paddyroddy requested a review from Copilot June 6, 2025 08:06
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR integrates Vale for linting markdown content by adding a configuration file and CI workflow.

  • Introduces .vale.ini to define Vale styles, packages, and rule overrides.
  • Adds a new lint-prose GitHub Actions job to run Vale checks on the repository.

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
.vale.ini Defines Vale settings, style paths, enabled packages, and rule overrides for .md files.
.github/workflows/linting.yml Adds a lint-prose job using the errata-ai/vale-action to run Vale in CI.

.vale.ini Outdated
write-good.TooWordy = NO

# Ignore lines starting with "> [!NOTE]"
BlockIgnores = (?s)> \[!NOTE\].*?(\n|$)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This replaces the inline comment before @matt-graham

Copy link
Member

@samcunliffe samcunliffe Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eeeeeh what about TIP IMPORTANT WARNING CAUTION? Do they get flagged.
I could imagine > [!TIP] is quite useful.

Suggested change
BlockIgnores = (?s)> \[!NOTE\].*?(\n|$)
BlockIgnores = (?s)> \[!(CAUTION|IMPORTANT|NOTE|TIP|WARNING)\].*?(\n|$)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah they would be

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah so it was the > ![NOTE] alert syntax specifically that was triggering not the content of the alert? I guess we could also include the other alert types here (!TIP, !IMPORTANT, !WARNING, !CAUTION). I don't entirely understand the regex here - it looks like Value uses regexp2 which is based on Go regexp, but after reading the docs I'm still entirely sure what (?s) bit is doing here (appears to be setting a flag to let . match \n but not sure why this is needed and whether it applies just to the group?). I think something like

^ *> \[!(NOTE|IMPORTANT|WARNING|CAUTION|TIP)\] *$

would avoid unintentionally capturing other characters with . while still allowing for white space before or after alert syntax (could also use \w in place of but not sure we actually want to allow for other white space types).

Copy link
Collaborator

@matt-graham matt-graham Jun 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just spotted that you've already covered all altert types in last commit 73fd5c1 - I still don't entirely understand the regex but it works so marking as resolved! (EDIT: I hadn't refreshed before commenting so hadn't seem Sam's comment either).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I assume you meant \s in place of ? Hadn't occurred to me that \s was any type of whitespace. This is just in this flavour of regex? I've always taken it to mean strictly .

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah bad news, yours doesn't work

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So in the example linked to in the docs there it looks like it's excluding the whole block {< file } delimited block. Are we trying to similarly ignore the context of callouts or just the syntax which starts an alert? From https://regex101.com/r/rvRsMy/1 it looks like we just match the initial bit not the content. If so I think the regex I suggested will do so as well but is tighter / less chance for false matches

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with (?s)> \[!(CAUTION|IMPORTANT|NOTE|TIP|WARNING)\](\n|$) and tested on regex101

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we want to just ignore the top line itself, not the contents within it. Your regex doesn't work, feel free to try it yourself locally (run vale sync; vale .)

Co-authored-by: Sam Cunliffe <samcunliffe@users.noreply.github.com>
paddyroddy and others added 2 commits June 6, 2025 11:48
Co-authored-by: Matt Graham <matthew.m.graham@gmail.com>
Copy link
Collaborator

@matt-graham matt-graham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than potentially excluding code of conduct looks good to me, thanks @paddyroddy!

@paddyroddy paddyroddy merged commit cdc5faa into main Jun 6, 2025
15 checks passed
@paddyroddy paddyroddy deleted the vale branch June 6, 2025 14:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request needs-2-reviewers Could be considered "controversial" so worth a second pair of eyes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants