This package uses spaCy 3.0 extensions to add IWNLP-py as a German lemmatizer directly to your spaCy pipeline.
Please report bugs in spacy-iwnlp as issues in IWNLP-py.
import spacy
from spacy_iwnlp import spaCyIWNLP
nlp = spacy.load('de_core_news_sm')
nlp.add_pipe('iwnlp', config={'lemmatizer_path': 'data/IWNLP.Lemmatizer_20181001.json'})
doc = nlp('Wir mögen Fußballspiele mit ausgedehnten Verlängerungen.')
for token in doc:
    print('POS: {}\tIWNLP:{}'.format(token.pos_, token._.iwnlp_lemmas))
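Downstream code often wants a single lemma per token rather than the raw attribute. The following is a minimal sketch, assuming that `token._.iwnlp_lemmas` is either `None` (or empty) when IWNLP has no entry for the token, or a list of candidate lemma strings otherwise. The helper name `pick_lemma` and the sample values are hypothetical and not part of spacy-iwnlp.

```python
from typing import List, Optional


def pick_lemma(text: str, iwnlp_lemmas: Optional[List[str]]) -> str:
    """Return the first IWNLP candidate, or the surface form if there is none."""
    # Assumption: the attribute is None or empty for tokens IWNLP cannot resolve.
    if not iwnlp_lemmas:
        return text
    return iwnlp_lemmas[0]


# Hypothetical values standing in for token.text and token._.iwnlp_lemmas.
print(pick_lemma('Fußballspiele', ['Fußballspiel']))
print(pick_lemma('.', None))
```

In a real pipeline this would be called as `pick_lemma(token.text, token._.iwnlp_lemmas)` inside the loop over `doc`. Taking the first candidate is only one possible policy; when several lemmas are returned, choosing among them may need the POS tag or other context.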
- Use pip to install spacy-iwnlp
pip install spacy-iwnlp
- Download the latest processed IWNLP dump from https://dbs.cs.uni-duesseldorf.de/datasets/iwnlp/IWNLP.Lemmatizer_20181001.zip and unzip it.
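The unzip step can also be done from Python with the standard library. This is a rough sketch, assuming the archive has already been downloaded to the working directory; the function name `extract_dump` and the default `data` target directory are illustrative choices (the latter picked to match the `lemmatizer_path` in the usage example), and the layout of files inside the archive is not verified here.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_dump(zip_path: str, target_dir: str = 'data') -> List[str]:
    """Extract every member of the archive into target_dir and return their paths."""
    target = Path(target_dir)
    # Create the target directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        return [str(target / name) for name in archive.namelist()]
```

For example, `extract_dump('IWNLP.Lemmatizer_20181001.zip')` would unpack the downloaded archive into `data/` and return the extracted paths, one of which can then be passed as `lemmatizer_path`.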
Use develop.py to extend the functionality
Update the pip package
python setup.py sdist bdist_wheel
python -m twine upload dist/PACKAGENAME-VERSION.tar.gz
Please include the following BibTeX if you use IWNLP in your work:
@InProceedings{liebeck-conrad:2015:ACL-IJCNLP,
author = {Liebeck, Matthias and Conrad, Stefan},
title = {{IWNLP: Inverse Wiktionary for Natural Language Processing}},
booktitle = {Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)},
year = {2015},
publisher = {Association for Computational Linguistics},
pages = {414--418},
url = {http://www.aclweb.org/anthology/P15-2068}
}