Read this Medium article for the full discussion.
The semantic model resources have been added to Lanfrica.
🎉 🎉 🎉 The Amharic RoBERTa model is uploaded to Hugging Face: Amharic RoBERTa Model 🎉 🎉 🎉
🎉 🎉 The Amharic FLAIR embedding model is integrated into the FLAIR library as am-forward
🎉 🎉 The model will be accessible in the next FLAIR release. Details
🎉 🎉 The Amharic Segmenter, Tokenizer, and Transliterator are released and can be installed with pip install amseg 🎉 🎉
🎉 🎉 The Flair-based Amharic NER classifier model is now released: am-flair-ner 🎉 🎉
🎉 🎉 The Flair-based Amharic Sentiment classifier model is now released: am-flair-sent 🎉 🎉
🎉 🎉 The Flair-based Amharic POS tagger is now released: am-flair-pos 🎉 🎉
- The NLP tasks for which we built models using the benchmark datasets are described under: Tasks
- NER
- Sentiment
- POS tagging
- Question classification
- Machine Translation
- The different datasets and resources are available under: Datasets
- Named Entity Recognition Dataset
- POS Dataset
- Sentiment Dataset
- Question Classification Dataset
- Machine Translation Dataset
- For Ethiopic word segmentation, tokenization, and transliteration, check this project: Segmentation
To cite the different Amharic NLP models and resources, use the following paper:
@Article{fi13110275,
AUTHOR = {Yimam, Seid Muhie and Ayele, Abinew Ali and Venkatesh, Gopalakrishnan and Gashaw, Ibrahim and Biemann, Chris},
TITLE = {Introducing Various Semantic Models for Amharic: Experimentation and Evaluation with Multiple Tasks and Datasets},
JOURNAL = {Future Internet},
VOLUME = {13},
YEAR = {2021},
NUMBER = {11},
ARTICLE-NUMBER = {275},
URL = {https://www.mdpi.com/1999-5903/13/11/275},
ISSN = {1999-5903},
DOI = {10.3390/fi13110275}
}
To cite the impacts of homophone normalization, use the following paper:
@inproceedings{belay2021impacts,
title={Impacts of Homophone Normalization on Semantic Models for Amharic},
author={Belay, Tadesse Destaw and Ayele, Abinew Ali and Gelaye, Getie and Yimam, Seid Muhie and Biemann, Chris},
booktitle={2021 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)},
pages={101-106},
year={2021},
ISBN = {978-1-6654-3666-3},
DOI = {10.1109/ICT4DA53266.2021.9672229},
publisher={IEEE}
}
To cite the Question Answering Classification for Amharic, use the following paper:
@inproceedings{belay2022question,
title={Question Answering Classification for Amharic Social Media Community Based Questions},
author={Belay, Tadesse Destaw and Yimam, Seid Muhie and Gelaye, Getie and Ayele, Abinew Ali and Biemann, Chris},
booktitle={2022 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages (SIGUL)},
pages={137-145},
year={2022},
publisher={aclanthology}
}
To cite the bidirectional Amharic-English Machine Translation work, use the following paper:
@INPROCEEDINGS{9971385,
author={Belay, Tadesse Destaw and Tonja, Atnafu Lambebo and Kolesnikova, Olga and Yimam, Seid Muhie and Ayele, Abinew Ali and Haile, Silesh Bogale and Sidorov, Grigori and Gelbukh, Alexander},
booktitle={2022 International Conference on Information and Communication Technology for Development for Africa (ICT4DA)},
title={The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation},
year={2022},
pages={84-89},
doi={10.1109/ICT4DA56482.2022.9971385}
}
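The entries above can be cited from a LaTeX document by their keys. The following is a minimal sketch, assuming the four BibTeX entries have been saved to a file named `references.bib` (a hypothetical file name) and that the standard `plain` bibliography style is acceptable; any other style can be substituted.

```latex
\documentclass{article}

\begin{document}

% Citation keys are taken from the BibTeX entries above.
The Amharic semantic models and benchmark datasets are described
in~\cite{fi13110275}. The effect of homophone normalization is studied
in~\cite{belay2021impacts}, question classification
in~\cite{belay2022question}, and bidirectional Amharic-English machine
translation in~\cite{9971385}.

% Assumption: the entries are stored in references.bib next to this file.
\bibliographystyle{plain}
\bibliography{references}

\end{document}
```

Compiling typically takes a `pdflatex`, `bibtex`, `pdflatex`, `pdflatex` sequence so that the citations resolve.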