This repository hosts code and models for the named entity recognition (NER) work performed at Reykjavik University in 2019-2020.
The models presented here have been trained on the Icelandic MIM-GOLD-NER named entity corpus, annotated as part of this work.
Three different NER models are implemented here, along with a voting system that combines their output. The methods used are the following:
- A Conditional Random Fields NER model – implementation based on Passos et al. 2014
- Ixa-pipes-ner, a perceptron model with shallow word features and externally trained word clusters – Agerri & Rigau 2017
- NeuroNER, a Bi-LSTM RNN with pre-trained word embeddings (GloVe) – Dernoncourt et al. 2017
- CombiTagger, an ensemble voting system – Henrich et al. 2009
- install https://github.com/ixa-ehu/ixa-pipe-nerc anywhere according to its guide, then create a softlink to its directory at ixa-pipe/nerc
- install https://github.com/hrafnl/CombiTagger anywhere according to its guide, then create a softlink to its directory, named CombiTagger, at the root of this repository
- get the Icelandic [MIM-GOLD](http://www.malfong.is/index.php?lang=en&pg=gull) corpus
The script run_combined_system.sh shows the output of the three models and CombiTagger.
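To illustrate the idea of combining the three models by voting, here is a minimal sketch of a simple per-token majority vote. It is not CombiTagger's code and does not reproduce what run_combined_system.sh does. The function name `majority_vote`, the tie-breaking rule (prefer the tagger listed first), and the example tokens and labels are all assumptions made for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token labels from several taggers by simple majority vote.

    predictions[i][j] is the label that tagger i assigned to token j.
    Ties are broken in favour of the tagger listed first (an assumption,
    not necessarily how CombiTagger resolves ties).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all taggers must label the same number of tokens")

    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        # Pick the first label, in tagger order, that reaches the top count.
        combined.append(next(label for label in labels if counts[label] == best))
    return combined


# Hypothetical outputs for the tokens "Jón", "býr", "í", "Reykjavík".
crf = ["B-Person", "O", "O", "B-Location"]
ixa = ["B-Person", "O", "O", "B-Organization"]
neuroner = ["O", "O", "O", "B-Location"]

print(majority_vote([crf, ixa, neuroner]))
# prints ['B-Person', 'O', 'O', 'B-Location']
```

Each token receives the label that most of the taggers agree on, so an error made by a single model can be outvoted by the other two.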
This project is licensed under the Apache License 2.0 - see the [LICENSE](https://github.com/cadia-lvl/NER/blob/master/LICENSE) file for details.
Reykjavik University
- Ásmundur Alma Guðjónsson asmundur10@ru.is
- Svanhvít Lilja Ingólfsdóttir svanhviti16@ru.is
- Hrafn Loftsson, Associate Professor hrafn@ru.is
This project was funded by the Icelandic Strategic Research and Development Programme for Language Technology 2019, grant no. 180027-5301.