LanguageTool's integration for grammar spell checking #141

@jeertmans

Description

Hello, this is quite in line with #7, but I would like to take it a step further and suggest integrating LanguageTool (LT) using LanguageTool-Rust. I am the author of this crate, and I have long wanted to integrate LT with LaTeX or an equivalent tool.

Context

One very annoying thing with LaTeX is how complex it is to write external tools for it (linters, grammar checkers, formatters, and so on), mainly because it is hard to extract the text content from a given TeX file. Because Typst is designed in a more modern way, I guess that external tool integration should be easier, especially for tools that are also written in Rust.

How

LanguageTool has a very nice feature which is checking markup text (from http-api):

[Image: screenshot from the LanguageTool HTTP API documentation]

In LanguageTool-Rust, the implementation looks as follows:

#[derive(Clone, Debug, Deserialize, PartialEq, Eq, Serialize)]
#[non_exhaustive]
#[serde(rename_all = "camelCase")]
/// A portion of text to be checked.
pub struct DataAnnotation {
    /// If set, the markup will be interpreted as this.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub interpret_as: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    /// Text that should be treated as markup.
    pub markup: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    /// Text that should be treated as normal text.
    pub text: Option<String>,
}

https://github.com/jeertmans/languagetool-rust/blob/04eea4cdb9c7cde70e553b262cc51dd558fc6aa6/src/lib/check.rs#L124-L138

The idea would then be to provide some function, e.g., extract_markup, that reads a Typst file and returns an appropriate data structure in which every non-text part of the source is placed in the markup field, so that LT only checks the actual text.
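To make the idea more concrete, here is a minimal sketch of what such a function could return. Everything in it is an assumption for illustration: the struct is a simplified stand-in for DataAnnotation (serde derives left out so that it compiles with the standard library alone), the constructors from_text and from_markup are hypothetical helpers, and the "parser" only treats `*` and `_` as markup. A real extract_markup would have to walk the syntax tree produced by Typst's own parser instead.

```rust
/// Simplified stand-in for LanguageTool-Rust's `DataAnnotation`
/// (assumption: serde derives omitted to keep the sketch std-only).
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct DataAnnotation {
    pub interpret_as: Option<String>,
    pub markup: Option<String>,
    pub text: Option<String>,
}

impl DataAnnotation {
    /// Hypothetical helper: a part that LT should check.
    pub fn from_text(s: String) -> Self {
        Self { interpret_as: None, markup: None, text: Some(s) }
    }

    /// Hypothetical helper: a part that LT should skip.
    pub fn from_markup(s: String) -> Self {
        Self { interpret_as: None, markup: Some(s), text: None }
    }
}

/// Toy extractor, NOT a Typst parser: only `*` and `_` count as markup,
/// every other character is accumulated as plain text.
pub fn extract_markup(source: &str) -> Vec<DataAnnotation> {
    let mut parts = Vec::new();
    let mut buf = String::new();
    for ch in source.chars() {
        if ch == '*' || ch == '_' {
            // Flush the pending text before emitting the markup part.
            if !buf.is_empty() {
                parts.push(DataAnnotation::from_text(std::mem::take(&mut buf)));
            }
            parts.push(DataAnnotation::from_markup(ch.to_string()));
        } else {
            buf.push(ch);
        }
    }
    if !buf.is_empty() {
        parts.push(DataAnnotation::from_text(buf));
    }
    parts
}
```

With this sketch, extract_markup("A *bold* claim.") yields five parts: the text "A ", the markup "*", the text "bold", the markup "*", and the text " claim.". Because the markup parts are still sent along, the offsets reported by LT should keep referring to positions in the original source.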

Note about LanguageTool-Rust

Even though the LanguageTool-Rust crate is in fairly good shape and tested against a variety of cases, it may not be perfect, and I am open to making modifications if that helps integrate LT into Typst.

Integration with CLI

If possible, it would be nice to provide CLI commands, such as typst check grammar, that return the text annotated with the grammar errors found by LT.
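As a rough idea of what "annotated text" could mean, here is a small sketch that underlines a flagged span and prints the message next to it. The GrammarMatch struct and the annotate function are hypothetical names; they only mirror the offset, length, and message that LT reports for each match. For simplicity the sketch assumes a single line of text and counts the offset in characters.

```rust
/// Hypothetical, simplified view of one match reported by LT.
pub struct GrammarMatch {
    /// Start of the flagged span (assumption: counted in characters).
    pub offset: usize,
    /// Length of the flagged span, in the same unit.
    pub length: usize,
    /// Human-readable description of the problem.
    pub message: String,
}

/// Returns the line followed by a caret underline and the message.
/// Assumes `line` contains no newline and the span lies within it.
pub fn annotate(line: &str, m: &GrammarMatch) -> String {
    let underline = format!(
        "{}{}",
        " ".repeat(m.offset),
        // Always draw at least one caret, even for an empty span.
        "^".repeat(m.length.max(1))
    );
    format!("{}\n{} {}", line, underline, m.message)
}
```

A real command would of course need to handle multiple lines, multiple matches per line, and the mapping from LT offsets back to positions in the Typst source.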

Summary

I think that spell checking tools are getting so good that it is worth providing easy integration for them. LT is open source and widely used, so I think it is a good candidate.

The implementation details I proposed above are just ideas, and I am open to criticism or other suggestions :-)

Use Case

Actually, this feature would not be limited to the Web App, but I don't know if this should be integrated directly into the compiler.

The basic use case is that humans make mistakes, so why not use tools to help reduce them? LT also provides rewrite suggestions, which is nice.
