Benchmark against LanguageTool #6

Open

@bminixhofer

There is now a benchmark in bench/__init__.py. It computes suggestions from LanguageTool (via language-tool-python) and from NLPRule on 10k sentences from Tatoeba and compares the times.
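For anyone who wants to reproduce this, below is a minimal sketch of the benchmark loop, not the actual script: it assumes the `Tokenizer.load`/`Rules.load` constructors and `rules.suggest` from the nlprule README plus the `language_tool_python` check API, and the sentence file path is hypothetical. The real bench/__init__.py differs in the details (for example, it also disables LanguageTool's spellchecking, which is omitted here).

```python
import time

import language_tool_python  # pip install language-tool-python
from nlprule import Tokenizer, Rules  # pip install nlprule

# Assumption: sentences.txt holds one Tatoeba sentence per line (path is hypothetical).
with open("sentences.txt", encoding="utf-8") as f:
    sentences = [line.strip() for line in f][:10_000]

lt = language_tool_python.LanguageTool("en-US")
tokenizer = Tokenizer.load("en")
rules = Rules.load("en", tokenizer)

lt_time = nlprule_time = 0.0
for sentence in sentences:
    # Time LanguageTool's suggestions for this sentence.
    start = time.perf_counter()
    lt_matches = lt.check(sentence)
    lt_time += time.perf_counter() - start

    # Time NLPRule's suggestions for the same sentence.
    start = time.perf_counter()
    nlprule_suggestions = rules.suggest(sentence)
    nlprule_time += time.perf_counter() - start

print(f"LanguageTool time: {lt_time:.3f}s")
print(f"NLPRule time: {nlprule_time:.3f}s")
lt.close()
```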

Here's the output for German:

```
(base) bminixhofer@pop-os:~/Documents/Projects/nlprule/bench$ python __init__.py --lang=de
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [01:24<00:00, 118.30it/s]
LanguageTool time: 63.019s
NLPRule time: 21.348s

n LanguageTool suggestions: 368
n NLPRule suggestions: 314
n same suggestions: 304
```

and for English:

```
(base) bminixhofer@pop-os:~/Documents/Projects/nlprule/bench$ python __init__.py --lang=en
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 10000/10000 [05:57<00:00, 27.98it/s]
LanguageTool time: 305.641s
NLPRule time: 51.267s

n LanguageTool suggestions: 282
n NLPRule suggestions: 247
n same suggestions: 235
```

I disabled spellchecking in LanguageTool.
LT gives more suggestions because NLPRule does not support all of LT's rules.
Not all NLPRule suggestions are the same as LT's, likely because of differences in rule priority, but I'll look a bit closer into that.
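For context, here is one hedged way the "same suggestions" count could be computed, by comparing spans and replacement lists; this is an assumption about the methodology, not necessarily what bench/__init__.py does.

```python
def suggestion_key(start, end, replacements):
    """Reduce a suggestion to a comparable key: its span plus sorted replacements."""
    return (start, end, tuple(sorted(replacements)))

# Assumption: lt_matches and nlprule_suggestions come from the loop sketched above.
# language_tool_python matches expose offset/errorLength/replacements;
# nlprule suggestions expose start/end/replacements.
lt_keys = {
    suggestion_key(m.offset, m.offset + m.errorLength, m.replacements)
    for m in lt_matches
}
nlprule_keys = {
    suggestion_key(s.start, s.end, s.replacements)
    for s in nlprule_suggestions
}
print("n same suggestions:", len(lt_keys & nlprule_keys))
```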

Correcting for the Java-only rules in LT and for the fact that NLPRule supports only 85-90% of LT's rules, by dividing the NLPRule time by 0.8 and then normalizing, gives the following table (the arithmetic is sketched after it):

|         | NLPRule time | LanguageTool time |
|---------|--------------|-------------------|
| English | 1            | 4.77              |
| German  | 1            | 2.36              |
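The correction itself is just arithmetic on the measured times above; for example:

```python
# Wall-clock times in seconds, taken from the runs above.
times = {
    "English": {"lt": 305.641, "nlprule": 51.267},
    "German": {"lt": 63.019, "nlprule": 21.348},
}

for lang, t in times.items():
    # Divide by 0.8 to account for NLPRule supporting only ~85-90% of LT's rules.
    adjusted = t["nlprule"] / 0.8
    # Normalize so that the NLPRule time is 1.
    print(f"{lang}: NLPRule 1, LanguageTool {t['lt'] / adjusted:.2f}")

# Output:
# English: NLPRule 1, LanguageTool 4.77
# German: NLPRule 1, LanguageTool 2.36
```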

These numbers are of course not 100% accurate, but they should give at least a ballpark estimate of relative performance.
I'll keep this issue open for discussion / improving the benchmark.
