This repository contains experiments comparing the accuracy of open source Finnish part-of-speech taggers and lemmatization algorithms. The evaluated tools are:
- Experimental Finnish model for spaCy
- FinnPos
- Simplemma
- Stanza
- Trankit
- Turku neural parser pipeline
- UDPipe (through spacy-udpipe; see the usage sketch after this list)
- UralicNLP
- Voikko
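As an example of how one of these tools is invoked, here is a minimal sketch using spacy-udpipe (the example sentence is arbitrary and error handling is omitted; this is not the repository's prediction code):

```python
import spacy_udpipe

spacy_udpipe.download("fi")    # fetch the Finnish UD model on first use
nlp = spacy_udpipe.load("fi")

doc = nlp("Minä rakastan sinua.")
for token in doc:
    print(token.text, token.lemma_, token.pos_)
```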
The models are evaluated on the following datasets:
- FinnTreeBank 1: a randomly sampled subset of about 1000 sentences
- FinnTreeBank 2: the news, Sofie and Wikipedia subsets
- Turku Dependency Treebank: the test set
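For reference, a hypothetical sketch of reading gold lemmas and POS tags from a CoNLL-U file, the format used by e.g. the Turku Dependency Treebank (this is illustrative, not the repository's preprocess_data.py):

```python
def read_conllu(path):
    """Return sentences as lists of {form, lemma, upos} dicts."""
    sentences, tokens = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                if tokens:
                    sentences.append(tokens)
                    tokens = []
            elif not line.startswith("#"):
                cols = line.split("\t")
                if cols[0].isdigit():  # skip multiword-token ranges like "1-2"
                    tokens.append({"form": cols[1], "lemma": cols[2], "upos": cols[3]})
    if tokens:
        sentences.append(tokens)
    return sentences
```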
Install dependencies:
- Python 3.9
- libvoikko with Finnish morphology data files
- clang (or another C++ compiler)
- Dependencies needed to compile FinnPos and cg3
Set up the git submodules, create a Python virtual environment, and download the test data and models by running the following commands:
git submodule init
git submodule update
python3.9 -m venv venv
source venv/bin/activate
pip install wheel
pip install -r requirements.txt
./download_data.sh
./download_models.sh
python preprocess_data.py
export PATH=$(pwd)/models/cg3/src:$PATH
# Predict lemmas and POS tags using all models.
# Writes results under results/predictions/*/
python predict.py
# Evaluate by comparing the predictions with the gold standard data.
# Writes results to results/evaluation.csv
python evaluate.py
# Plot the evaluations.
# Saves the plots under results/images/
python plot_results.py
The numerical results are saved in results/evaluation.csv, the POS and lemma errors made by each model in results/errorcases, and the plots in results/images.
Lemmatization error rates (proportion of tokens where the predicted lemma differs from the ground truth lemma) for the tested algorithms on the test datasets.
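As a concrete definition of the metric, here is a minimal sketch (the function name and inputs are illustrative, not the repository's evaluate.py):

```python
def lemma_error_rate(predicted_lemmas, gold_lemmas):
    """Proportion of tokens whose predicted lemma differs from the gold lemma."""
    assert len(predicted_lemmas) == len(gold_lemmas)
    errors = sum(p != g for p, g in zip(predicted_lemmas, gold_lemmas))
    return errors / len(gold_lemmas)

# One mismatch out of four tokens -> error rate 0.25
print(lemma_error_rate(["minä", "olla", "iloinen", "."],
                       ["minä", "olla", "iloissaan", "."]))
```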
Execution duration as a function of the average (over datasets) error rate. Lower values are better on both axes. Note that the Y-axis uses a log scale.
The execution duration is measured as a batched evaluation (a batch contains all sentences from one dataset) on a 4-core CPU. The Turku neural parser and Stanza can be run on a GPU, which would most likely improve their speed, but I haven't tested that.
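For illustration, a minimal sketch of such a batched timing measurement, assuming a spaCy-style pipeline object (the `nlp` and `sentences` names are placeholders, not the repository's actual code):

```python
import time

def timed_batch(nlp, sentences):
    """Tag a whole dataset in one batch and measure the wall-clock duration."""
    start = time.perf_counter()
    docs = list(nlp.pipe(sentences))  # spaCy-style batch processing
    duration = time.perf_counter() - start
    return docs, duration
```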
Part-of-speech error rates for the tested algorithms.
Note that FinnPos and Voikko do not distinguish between auxiliary and main verbs, so their performance suffers by 4-5 percentage points in this evaluation because they predict all AUX tags as VERB.
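To illustrate the effect, here is a sketch (not the repository's evaluate.py) of a POS error rate with an optional AUX-to-VERB merge, which removes the penalty for models that never predict AUX:

```python
def pos_error_rate(predicted, gold, merge_aux=False):
    """Token-level POS error rate; optionally collapse AUX into VERB on both sides."""
    if merge_aux:
        predicted = ["VERB" if tag == "AUX" else tag for tag in predicted]
        gold = ["VERB" if tag == "AUX" else tag for tag in gold]
    errors = sum(p != g for p, g in zip(predicted, gold))
    return errors / len(gold)
```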
Execution duration as a function of the average error rate.
Comparing the spacy-fi and Stanza results, increasing the computational effort roughly 100-fold seems to improve the accuracy only by a small amount.