Skip to content

Adding vale for spell and style checking #519

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 19 commits into from
May 22, 2023
Merged

Adding vale for spell and style checking #519

merged 19 commits into from
May 22, 2023

Conversation

spier
Copy link
Member

@spier spier commented Jan 13, 2023

Adds a spellchecker for our patterns.
For now, only the English version of our structured patterns located in /patterns/2-structured/.

I opted to ignore the patterns of lower maturity for now, i.e. /patterns/1-initial/, as it would have been a lot more work to get the spellchecker to pass on those. It is not clear when those pattern will be leveled up to be published in our book, therefore it isn't as important that their spelling is 100% correct right now.

For more info about the spellchecking approach see .vale.ini as well as the isc-styles that I created for this purposes.

Key parts of the implementation

  • added spellchecker (and GHA) using vale
  • added documentation about the spellchecking approach to the Contributor Handbook
  • cleaned up the root of our repo (by moving some config files to the /.github folder)

Observations

My goal was to keep the custom dictionary extensions (wordlists) to a minimum.
i.e. for the most part I tried to fix the wording in the patterns. If that wasn't possible, I added custom words to a wordlist, so that the spellchecker will accept those as well.

Many of those custom words are org and author names that are mentioned in our patterns.
The rest are words and acronyms that are used frequently in software development but are apparently not proper words.

References

@spier spier added the ⚙️ Type - Meta Improving how we collaborate in this repo is the main focus of this issue / PR label Jan 13, 2023
@spier spier marked this pull request as ready for review January 13, 2023 17:12
@spier spier marked this pull request as draft January 15, 2023 23:31
@spier
Copy link
Member Author

spier commented Jan 15, 2023

In the last commit I am trying out Vale, which can check both spelling and style apparently.
If the experiment goes well, I might replace pyspelling with Vale.

@spier
Copy link
Member Author

spier commented Jan 18, 2023

@spier
Copy link
Member Author

spier commented Jan 31, 2023

I have not been able to check for American English (en_US) using vale yet.

I was able to do that with Aspell - config in https://github.com/InnerSourceCommons/InnerSourcePatterns/blob/pyspelling/.pyspelling.yml#L5

@rrrutledge rrrutledge self-requested a review February 1, 2023 13:53
@spier
Copy link
Member Author

spier commented Feb 9, 2023

From here:

By default, spelling includes a custom, open-source dictionary for American English.

So that means that vale will check for American English by default.

However it turns out that the dictionary mentioned above contains words that we considered to be strictly British English, like organisation and such.

Therefore we will check if we can find a en_US dictionary that fits our needs.

As vale supports "Hunspell-compatible dictionaries", I am starting my search at https://github.com/hunspell/hunspell

@spier
Copy link
Member Author

spier commented Feb 9, 2023

I got vale to work mostly the way I wanted now.
As dictionary I used the en_US.* files available here: https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/

Some open questions:

  • should I configure the GHA to download those dicitonary files as part of the workflow? That way we don't need to keep them in this repo. They are not very large, so it might be fast enough.
  • how to configure minimal customizations for the way that we spell things in the ISC?
  • eventually we could move the vale files into a central package, that can be adopted by other ISC project too. we are not at that point yet though.
  • with pyspelling/Aspell, I had to config to accept all org names and author names as correct. do I also need to do that for vale? => it does look like it, when you review this output of the Vale Linting

Todos:

  • get this PR into a merge-ready state i.e. remove any previous experiments that I decided not to use in the end

References:

@rrrutledge
Copy link
Contributor

This is great work! We're very excited to see how this goes and see what can also apply to the Learning Path!

@spier
Copy link
Member Author

spier commented Feb 21, 2023

This is great work! We're very excited to see how this goes and see what can also apply to the Learning Path!

Thanks @rrrutledge. Now I just need to find time to get it to a state where it provides value (and can be merged).
At least it looks pretty likely right now that I will go for vale instead of Aspell, which is great as vale has a nicer integration into GitHub.

@rrrutledge
Copy link
Contributor

That makes sense and sounds like a good approach.

@spier spier changed the title Adding pyspelling (Aspell) spellchecker Adding vale for spell and style checking May 22, 2023
@spier
Copy link
Member Author

spier commented May 22, 2023

Updated the main description of this PR to reflect the status quo (switching implementation from pyspelling to vale).

@spier spier marked this pull request as ready for review May 22, 2023 21:06
@spier
Copy link
Member Author

spier commented May 22, 2023

This is a long standing branch, and the actual spelling fixes introduced here are mostly trivial (mostly switching BE spelling to AE spelling + some other glitches).

Therefore I am opting to merge this now, so that we can get more working experience with this on branches of other contributors (i.e. how is the style/spell checking experience for them).

For further details about the approach with vale also see https://github.com/InnerSourceCommons/isc-styles

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⚙️ Type - Meta Improving how we collaborate in this repo is the main focus of this issue / PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants