Description
Hi,
Would you be open to supporting a public, stable Python API for codespell? Ideally, it would be one where I, as a consumer, can feed codespell the words I want spellchecked and get back either "valid" or "invalid, here are the corrections available; if you auto-correct, use this choice" (or "not safe for auto-correcting", if that is the data available).
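To make the request concrete, here is a rough sketch of the shape I have in mind. Everything in it is hypothetical: `WordResult`, `check_word`, and the toy mapping are names I made up for illustration and are not part of codespell. In particular, the rule "a single candidate is safe to auto-correct" is only my assumption about how the data could be interpreted.

```python
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, Tuple


@dataclass(frozen=True)
class WordResult:
    """Hypothetical result type; not part of codespell today."""

    word: str
    valid: bool
    corrections: Tuple[str, ...] = ()
    # The correction to apply when auto-correcting, or None when
    # auto-correcting is not safe (or there is nothing to correct).
    auto_correction: Optional[str] = None


def check_word(word: str, known_typos: Mapping[str, Sequence[str]]) -> WordResult:
    """Look up a single word in a typo -> corrections mapping."""
    candidates = tuple(known_typos.get(word.lower(), ()))
    if not candidates:
        return WordResult(word, valid=True)
    # Assumption: one candidate is safe to auto-apply, several are not.
    auto = candidates[0] if len(candidates) == 1 else None
    return WordResult(word, False, candidates, auto)


# Toy data standing in for codespell's real dictionaries.
TOY_TYPOS = {"teh": ["the"], "acn": ["can", "acne"]}
```

With something like this, my language server could call `check_word(token, ...)` for each token it has already decided is worth checking, and skip the rest.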
My use case is that I am working on a Language Server (LSP, https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification) and I want to provide spell checking. So far, I have relied on hunspell because it has Python bindings, but its accuracy on technical documents with the dictionary I found leaves a bit to be desired.
In my use case, being able to identify the exact range of the problem is an absolute necessity, as my code needs to provide the text range to the editor so that it can show the user exactly where in the open document the problem is. If word-by-word checking is not supported, then the API could instead accept lines of text to be spellchecked, provided the result identifies exactly where in the line the problem is (that is, I need the start and end index).
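For the line-based variant, a minimal sketch of the result I would need could look like the following. Again, `Finding`, `check_line`, and the word regex are hypothetical names and choices of mine, not how codespell tokenizes text; the point is only that every finding carries a start index and an exclusive end index into the line that was passed in.

```python
import re
from typing import Iterator, Mapping, NamedTuple, Sequence, Tuple


class Finding(NamedTuple):
    """Hypothetical per-line result with the exact character range."""

    start: int  # index of the first character of the word
    end: int  # index one past the last character (exclusive)
    word: str
    corrections: Tuple[str, ...]


# Assumption: a "word" is a run of ASCII letters and apostrophes.
_WORD_RE = re.compile(r"[A-Za-z']+")


def check_line(
    line: str, known_typos: Mapping[str, Sequence[str]]
) -> Iterator[Finding]:
    """Yield one Finding per known typo in the line, with its range."""
    for match in _WORD_RE.finditer(line):
        corrections = known_typos.get(match.group().lower())
        if corrections:
            yield Finding(
                match.start(), match.end(), match.group(), tuple(corrections)
            )
```

The `start` and `end` values are what I would then translate into the range the editor expects, so they must refer to the line exactly as I passed it in.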
In this case, I would probably also need the API docs to say a bit about why line-by-line text is important, because I might need to extract the underlying text from formatting to create a synthetic line to be spellchecked. As an example, my current code tries to identify and hide common code/file references like usr/share/foo. In a word-by-word check, I just skip the spellcheck call for that word. But if I need to pass a line of text to codespell, I would need to remove the ignored parts, and here it is relevant to know how to do that replacement so that the user does not get a false positive because I attempted to avoid another false positive.
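As an illustration of the replacement I am unsure about, this is one way I could build the synthetic line: blank out each ignored token with the same number of spaces, so that the indices of everything else stay the same. The `mask_ignored` helper and the "anything containing a slash" rule are hypothetical simplifications of my own code. Whether spaces are the right filler depends on how codespell splits a line into words, which is exactly the part I would like to see documented.

```python
import re

# Assumption: any whitespace-delimited token containing "/" is a
# path-like reference that should not be spellchecked.
_PATH_RE = re.compile(r"\S*/\S*")


def mask_ignored(line: str) -> str:
    """Replace ignored tokens with spaces, preserving all offsets."""
    return _PATH_RE.sub(lambda m: " " * len(m.group()), line)


original = "See usr/share/foo for teh docs"
masked = mask_ignored(original)
```

Because the masked line has the same length as the original, a range reported for the masked line can be used directly against the original document.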
Alternatively, a parser for the dictionaries plus the underlying dictionaries themselves might also be an option as a "lightweight API", assuming they are easier to keep stable.
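For reference, this is roughly the parser I would otherwise end up writing myself. It is based on my reading of the dictionary files, so please treat the format rules as assumptions: one `typo->correction` entry per line, several corrections separated by commas, and any comma in the data (including a trailing one) meaning "do not auto-correct", with the text after the last comma being an optional reason. `Entry` and `parse_dictionary` are names I made up.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class Entry(NamedTuple):
    """Hypothetical parsed dictionary entry."""

    corrections: Tuple[str, ...]
    auto_fix: bool
    reason: str


def parse_dictionary(lines: Iterable[str]) -> Dict[str, Entry]:
    """Parse 'typo->fix1, fix2, reason' lines into a mapping."""
    entries: Dict[str, Entry] = {}
    for raw in lines:
        line = raw.strip()
        if not line or "->" not in line:
            continue  # skip blank or malformed lines
        typo, data = line.split("->", 1)
        data = data.strip()
        if "," in data:
            # Assumed rule: a comma disables auto-fix, and whatever
            # follows the last comma is a (possibly empty) reason.
            data, reason = data.rsplit(",", 1)
            auto_fix = False
        else:
            reason, auto_fix = "", True
        corrections = tuple(c.strip() for c in data.split(",") if c.strip())
        entries[typo.strip()] = Entry(corrections, auto_fix, reason.strip())
    return entries
```

If the format were documented as stable, a small helper like this plus access to the dictionary files would already cover most of what I need.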
I have noted that codespell can do spell checking from stdin to stdout. However, that is a bit too heavy-handed for me to easily replace my hunspell integration.
That is my wishlist. :) Would this be something that you would be open to supporting?
Note: Regarding the stable API, I assumed nothing is currently stable, since __all__ = ["_script_main", "main", "__version__"] does not look like it contains reusable APIs.