modules/spellfiles: init #3143
Conversation
Force-pushed from 659e056 to 2c168d6
I really like this. I had been playing around with spellfiles at some point, and was surprised we didn't have a better baked-in solution for it. Thanks!
Wow, this new module looks really nice! Regarding your questions:
I think …

It could even be useful to have …

Well, I don't know... At this point, users that want to customize both … Also, I am curious about why a user would want an encoding different than `utf-8`.
Yeah that makes sense!
I think that mostly happens when dealing with legacy systems or when opening old files (e.g., LaTeX files from like ten years ago were often saved as latin1 due to encoding issues with utf-8).
This looks great! I've left some feedback below.
My only major concern is the way we're pinning URLs and hashes; I think this would too easily get out of sync and result in "old" configs being un-buildable.
update-scripts/fetch-spellfiles.nix
Outdated
# Extract only .spl and .sug file links
filenames=$(echo "$html" | grep -oP '(?<=href=")[^"]+\.(spl|sug)' | sort -u)
I'd rather do this with a syntax-aware tool like pandoc, but this is fine for now if it works reliably.
I think a pandoc impl would look very similar to our current `docs/fix-links` util.
hash=$(nix hash convert --to sri --hash-algo sha256 "$sha256")
# Ugly hardcoding of the `%40` -> `@` substitution |
Is there not a URI decode utility we could use to do this properly? Even if we have to add another runtime input?
There are a few different approaches here: https://stackoverflow.com/questions/6250698/how-to-decode-url-encoded-string-in-shell

One answer suggests using python's `urllib`:
python3 -c "import sys; from urllib.parse import unquote; print(unquote(sys.stdin.read()));"
Although, if we're depending on Python, it's probably tempting to write the whole script in Python instead of bash?
Some bash-based approaches replace all instances of `%` with `\x`. IDK if that correctly handles `%%`, though.

This one with `printf` is nice too.
I tried many bashisms, and none of them worked. Also, it was very hard to get the escaping right, as this bash script is defined in a `.nix` file...

Again, I would question the relevance of over-engineering a script whose input changes once every decade.
> This one with `printf` is nice too.
Well, I would very much avoid adding this much complexity. I prefer the script completely failing in 2 years rather than this.
Yeah, I was just listing various options. I'd suggest using python's urllib as the simplest and most robust approach, without too much over-engineering.
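A minimal sketch of how that could look, assuming the update script is (or could be) defined with `pkgs.writeShellApplication`; the names below are illustrative, not the actual script:

```nix
# Illustrative sketch: add python3 as a runtime input so the script can
# URI-decode filenames without fragile bash escaping.
{ pkgs }:
pkgs.writeShellApplication {
  name = "fetch-spellfiles";
  runtimeInputs = [ pkgs.python3 ];
  text = ''
    urldecode() {
      python3 -c 'import sys; from urllib.parse import unquote; sys.stdout.write(unquote(sys.stdin.read()))'
    }

    # e.g. "en%40example.spl" -> "en@example.spl"
    encoded="en%40example.spl" # placeholder input
    decoded=$(printf '%s' "$encoded" | urldecode)
    echo "$decoded"
  '';
}
```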
modules/spellfiles.nix
Outdated
# Remove region from language id ("en_us" -> "en")
(map (lang: lib.head (builtins.match "^([^_]+)" lang)))
];
languages = if cfg.languages == "auto" then languagesFromSpelllang else cfg.languages;
Fancy idea: could we move this to the `apply` function of `cfg.languages`?
That way users can read the value to figure out what languages their "auto" definition actually ended up producing.
Format normalisation could also be done there.
Also makes me wonder if the option should be a list type? 🤔 And maybe auto should be a separate bool option?
> Fancy idea: could we move this to the `apply` function of `cfg.languages`? That way users can read the value to figure out what languages their "auto" definition actually ended up producing.
Good idea! Done.
> Format normalisation could also be done there.
What do you mean? I think this is what I have done.
> Also makes me wonder if the option should be a list type? 🤔 And maybe auto should be a separate bool option?
I don't really see a benefit there...
> I don't really see a benefit there...
The benefit of a list-type is definition merging, and just generally having a more structured config. If the option is a list-type, then I can define `[ "en" ]` in one module, define `[ "fr" ]` in another module, and it will be merged as `[ "en" "fr" ]`. With a str-type option, merging is not supported.
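A tiny sketch of that merging behaviour (the option path is illustrative, not the module's actual declaration):

```nix
# Illustrative: two modules each define spellfiles.languages; with a
# list-type option the module system merges them into [ "en" "fr" ].
{ lib, ... }:
{
  options.spellfiles.languages = lib.mkOption {
    type = lib.types.listOf lib.types.str;
    default = [ ];
  };

  imports = [
    { spellfiles.languages = [ "en" ]; }
    { spellfiles.languages = [ "fr" ]; }
  ];
}
```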
Or maybe you meant the separate auto-option? I think the benefit of that separation is simplifying the types and the impl, making things more obvious to end-users. If you had a separate auto-toggle, you wouldn't need any `if value == "auto"`; in fact, you wouldn't need an `apply` at all: you could just have an extra definition that sets `config.spellfiles.languages = lib.pipe ...`.
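Something along these lines (just a sketch; the option names and the `opts.spelllang` handling are assumptions, not the PR's actual code):

```nix
# Sketch of the suggested shape: a bool toggle plus a plain list option,
# with the "auto" behaviour expressed as an extra conditional definition.
{ lib, config, ... }:
{
  options.spellfiles = {
    auto = lib.mkOption {
      type = lib.types.bool;
      default = true;
      description = "Whether to also derive languages from `opts.spelllang`.";
    };
    languages = lib.mkOption {
      type = lib.types.listOf lib.types.str;
      default = [ ];
    };
  };

  config.spellfiles.languages = lib.mkIf config.spellfiles.auto (
    lib.pipe config.opts.spelllang [
      # Assumption: spelllang is a comma-separated string like "en_us,fr"
      (lib.splitString ",")
      # Strip the region suffix: "en_us" -> "en"
      (map (lang: lib.head (lib.splitString "_" lang)))
      lib.unique
    ]
  );
}
```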
Maps "en.utf-8.spl" to: | ||
{ | ||
name = "spell/en.utf-8.spl"; | ||
|
||
value.source = pkgs.fetchurl { | ||
url = "https://ftp.nluug.nl/pub/vim/runtime/spell/en.utf-8.spl"; | ||
hash = "sha256-/sq9yUm2o50ywImfolReqyXmPy7QozxK0VEUJjhNMHA="; | ||
}; | ||
} |
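For context, a mapping like this could be generated roughly as follows. This is a hypothetical sketch: `spellfileHashes` stands in for the vendored hash table and is not an actual name from this PR.

```nix
# Hypothetical sketch: build extraFiles entries from a name -> hash table
# (e.g. the data vendored in generated/spellfiles.nix).
{ lib, pkgs, spellfileHashes }:
let
  mirror = "https://ftp.nluug.nl/pub/vim/runtime/spell";

  toExtraFile =
    file:
    lib.nameValuePair "spell/${file}" {
      source = pkgs.fetchurl {
        url = "${mirror}/${file}";
        hash = spellfileHashes.${file};
      };
    };
in
lib.listToAttrs (map toExtraFile [ "en.utf-8.spl" ])
```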
I'm a little concerned about our pinning technique.
Firstly, I assume that spell files are occasionally modified, and don't offer a versioned/pinned mirror?
This leads to my issue with the assumption that we can generate a table of file hashes without running into issues.
E.g. a user rolls back to an old nixvim revision: now the hash is outdated
Or: nixvim is slow to update: same outcome...
Or: someone is bisecting their config
Or: we are bisecting nixvim (which has a test for spell files)
I think we either need to download the actual spell files in our generate script, or find a way to have versioned URLs.
See #3143 (comment)
Also, I thought about downloading the spellfiles, but they are binary files and some of them weigh several dozen GB. EDIT: MB* of course ^^
Force-pushed from 4f994e0 to 4d03d27
Thanks for the review @MattSturgeon! Of course, I have thought about this issue of "outdated hashes" (see the PR description). Indeed, the spellfiles haven't been modified in the past 6+ years. This doesn't mean that they won't get modified again at some point, but I would consider them to be very stable.
Oops, I missed that.
That is reassuring, if nothing else 😀
For me the issue is that when they inevitably do change, all configs that had been using the spellfile module will suddenly break without having been changed. This means the module is technically non-deterministic. It's the same reason why you would specify a package src using a commit or a tag, rather than a branch ref, in nixpkgs. We want old revisions of nixpkgs to continue building indefinitely.
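For illustration, this is the usual nixpkgs pinning pattern (a generic example, unrelated to this module's code; the hash is a placeholder):

```nix
# Pinning by commit/tag keeps old revisions buildable forever; pointing at
# a branch means the fetch breaks as soon as upstream pushes.
src = pkgs.fetchFromGitHub {
  owner = "vim";
  repo = "vim";
  rev = "v9.1.0000";   # pinned tag: reproducible
  # rev = "master";    # moving branch: hash mismatch on the next push
  hash = "sha256-..."; # placeholder
};
```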
I think the key difference is that all those other things are handled by our …

All that said: I don't see a viable alternative, short of finding or hosting our own mirror. Which I don't want to do, either.

It's great that the files seem to have been stable for 6 years, but it isn't great that they seemingly don't offer a stable/versioned URL we can use.

I did see their README has instructions for building spell files from source. That may be a better alternative, especially if it can be done in such a way that the nixpkgs and/or nix-community binary cache has a cached build of the spellfile packages.
It (the ftp.nluug.nl mirror) is the official one used by upstream to download spellfiles: https://github.com/vim/vim/blob/2a6be835122d906c6ec10682c2a771b25c87c611/runtime/autoload/spellfile.vim#L9.
I agree...

Alternatives
Vim spellfiles are apparently built from LibreOffice dictionaries, which I assume are already packaged. If we can reproduce the from-source instructions as a derivation, it should be possible to package the vim spell files as derivations with the nixpkgs LibreOffice dicts as `nativeBuildInputs`.

Build steps

This involves downloading the files from the LibreOffice GitHub and applying a patch. From the upstream README:

> To do this all in one go use the Aap program. It's simple to install, it only requires Python.
>
> Before generating spell files, verify your system has the required locale. You can also do it manually:
>
> LOCALE
> For proper spell file generation the required locale must be installed. Otherwise Vim doesn't know what are letters and upper-lower case differences. Modern systems use UTF-8, but we also generate spell files for 8-bit locales …
>
> On Ubuntu the default is to only support locales for your own language. To add the others:
>
> sudo vim /var/lib/locales/supported.d/local
> # Add needed lines from /usr/share/i18n/SUPPORTED
> sudo dpkg-reconfigure locales
>
> When using the …

I guess we'd have to vendor the patch files within nixpkgs, unless there's a safe URL from which they can be fetched (are they in a git repo?). The thing I'm most unsure of is enabling all required locales within the nix builder sandbox. I guess we'd start by running …
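A very rough sketch of what such a derivation could look like (untested; the `hunspellDicts` attribute name, the dictionary paths, and the exact `:mkspell` invocation are all assumptions):

```nix
# Untested sketch: build en.utf-8.spl from a nixpkgs hunspell dictionary
# using Vim's :mkspell, with a locale available inside the sandbox.
{ pkgs }:
pkgs.stdenv.mkDerivation {
  pname = "vim-spellfile-en";
  version = "0-unstable";
  dontUnpack = true;

  nativeBuildInputs = [ pkgs.vim pkgs.glibcLocales ];

  buildPhase = ''
    export LOCALE_ARCHIVE=${pkgs.glibcLocales}/lib/locale/locale-archive
    export LC_ALL=en_US.UTF-8

    # Assumption: the .aff/.dic pair from nixpkgs matches what upstream's
    # build instructions patch and feed to Vim.
    cp ${pkgs.hunspellDicts.en_US}/share/hunspell/en_US.{aff,dic} .

    # ":mkspell! en en_US" reads en_US.aff/en_US.dic and writes
    # en.<encoding>.spl next to them.
    vim -u NONE -i NONE -e -s -c 'mkspell! en en_US' -c 'qa!'
  '';

  installPhase = ''
    install -Dm644 en.utf-8.spl $out/spell/en.utf-8.spl
  '';
}
```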
Motivation
Nix(vim) is all about offering the best out-of-the-box experience.
Hence, why not provide the spellfiles on demand?
This PR implements the new `spellfiles` top-level option, which allows users to decide which spellfiles should be automatically fetched from the vim FTP server.

Fixes #1919

Under the hood, this option populates `extraFiles."spell/en.utf-8.spl"` (for example) with the files fetched from https://ftp.nluug.nl/pub/vim/runtime/spell/ with `pkgs.fetchurl`.
Nixvim's vendored spellfile hashes

I added an extra script to the `update-scripts` folder which pre-fetches the hashes of all spellfiles hosted on the official FTP server.

Hence, while the spellfiles are downloaded at build time when users deploy their configurations, they won't need to manually provide the expected hash. This information will already be available in the nixvim repo (in `generated/spellfiles.nix`).

Note

One could think that vendoring the hashes in Nixvim and hoping they will still be valid when a user builds their config is a bad idea. Indeed, if they change, one could get a hash-mismatch error without any fallback to overwrite it.

My answer to this is that those files are apparently very stable (last updated in 2019), so it is not a huge risk.
Module options (`cfg = config.spellfiles`)

This behavior is opt-in (`cfg.enable` defaults to `false`).

By default, the module automatically infers the list of spellfiles `LANG.ENCODING.[spl,sug]` to install:

- `LANG`: obtained from the following sources, in decreasing priority:
  - `cfg.languages`
  - `opts.spelllang`
  - `["en"]` (default)
- `ENCODING`: obtained from the following sources, in decreasing priority:
  - `opts.encoding`
  - `"utf-8"` (default)
- `.sug` files: actual spellfiles are `.spl` files and are always fetched. However, some `LANG.ENCODING` combos have a "suggestion" file `.sug` available. They are fetched if `cfg.includeSuggestions` is `true` (defaults to `false`).

Besides, users can also explicitly specify additional spellfiles to fetch, for example:
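(Illustrative values only; the exact shape accepted by `cfg.spellfiles` is defined by the option's type.)

```nix
{
  spellfiles = {
    enable = true;
    # Languages to fetch spellfiles for, in addition to anything inferred.
    languages = [ "en" "fr" ];
    # Also fetch the optional ".sug" suggestion files where available.
    includeSuggestions = true;
    # Illustrative: explicitly request extra spellfiles on top of the
    # inferred ones.
    spellfiles = [ "de.utf-8.spl" "de.utf-8.sug" ];
  };
}
```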
Note: At first, I made `cfg.spellfiles` and `cfg.languages` exclusive, but actually, it is better to allow both of them to be set concurrently.

Design questions

- `cfg.languages` or `cfg.spelllangs`?
- A `cfg.encoding` option?

Resources
Credits
The idea comes from a very smart suggestion by @mirkolenz.