Natural Language Processing in Ethiopian Languages: Current State, Challenges, and Opportunities - arxiv
This survey delves into the current state of natural language processing (NLP) for four Ethiopian languages: Amharic, Afaan Oromo, Tigrinya, and Wolaytta. Through this paper, we identify key challenges and opportunities for NLP research in Ethiopia. Furthermore, we provide a centralized repository on GitHub that contains publicly available resources for various NLP tasks in these languages. This repository can be updated periodically with contributions from other researchers. Our objective is to disseminate information to NLP researchers interested in Ethiopian languages and encourage future research in this domain.
Tool name | Task | Language support | Resource link |
---|---|---|---|
amseg | Segmenter, tokenizer, transliteration, romanization and normalization | Amharic | amseg |
HornMorpho | Morphological analysis | Amharic, Afaan Oromo, Tigrinya | HornMorpho |
lemma | Lemmatizer | Amharic | lemma |
We discuss MT progress for Ethiopian languages in three categories: English-centric (work on the target Ethiopian languages paired with English), Ethiopian-Ethiopian (work on pairs of Ethiopian languages without involving other languages), and multilingual MT (work on Ethiopian languages together with other languages in a multilingual setting).
- Parallel Corpora Preparation for English-Amharic Machine Translation
- Extended Parallel Corpus for Amharic-English Machine Translation
- Context based machine translation with recurrent neural network for English–Amharic translation
- Offline Corpus Augmentation for English-Amharic Machine Translation
- The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation
- Optimal Alignment for Bi-directional Afaan Oromo-English Statistical Machine Translation
- English-Afaan Oromo Statistical Machine Translation
- English-Oromo Machine Translation: An Experiment Using a Statistical Approach
- Crowdsourcing Parallel Corpus for English-Oromo Neural Machine Translation using Community Engagement Platform
- Machine Learning Approach to English-Afaan Oromo Text-Text Translation: Using Attention based Neural Machine Translation
- The effect of shallow segmentation on English-Tigrinya statistical machine translation
- Morphological Segmentation for English-to-Tigrinya Statistical Machine Translation
- Enhancing Bi-directional English-Tigrigna Machine Translation Using Hybrid Approach
- Statistical Machine Translator For English To Tigrigna Translation
- An Exploration of Data Augmentation Techniques for Improving English to Tigrinya Translation
- A Parallel Corpora for bi-directional Neural Machine Translation for Low Resourced Ethiopian Languages
- Low-Resource Neural Machine Translation Improvement Using Source-Side Monolingual Data
- English-Ethiopian Languages Statistical Machine Translation
- Amharic-Awngi Machine Translation: An Experiment Using Statistical Approach
- Experimenting Statistical Machine Translation for Ethiopic Semitic Languages: The Case of Amharic-Tigrigna
- Context based machine translation with recurrent neural network for English-Amharic translation
- Low resource neural machine translation: A benchmark for five african languages
- WebCrawl African : A Multilingual Parallel Corpora for African Languages
- Part of Speech tagging for Amharic using Conditional Random Fields
- Methods for Amharic Part-of-Speech Tagging
- Amharic Part-of-Speech Tagger for Factored Language Modeling
- Part of speech tagging for Amharic
- POS Tagging for Amharic Text: A Machine Learning Approach
- Part-of-speech tagging for underresourced and morphologically rich languages—the case of Amharic
- Parts of Speech Tagging for Afaan Oromo
- Tigrinya Part-of-Speech Tagging with Morphological Patterns and the New Nagaoka Tigrinya Corpus
- Part of Speech Tagging for Wolaita Language using Transformation Based Learning (TBL) Approach
- A comparative study on different techniques for thai part-of-speech tagging
- Machine Learning Approaches for Amharic Parts-of-speech Tagging
- Towards improving Brill’s tagger lexical and transformation rule for Afaan Oromo language
- Deep learning-based part-of-speech tagging of the Tigrinya language
- Introducing various Semantic Models for Amharic: Experimentation and Evaluation with multiple Tasks and Datasets
- Question Classification in Amharic Question Answering System: Machine Learning Approach
- Amharic Question Classification System Using Deep Learning Approach
- Amharic Question Answering for Biography, Definition, and Description Questions
- TETEYEQ: Amharic Question Answering For Factoid Questions
- Question Answering Classification for Amharic Social Media Community Based Questions
- MasakhaNER: Named Entity Recognition for African Languages
- Amharic Named Entity Recognition Using A Hybrid Approach
- Named entity recognition for Amharic using deep learning
- Named Entity Recognition for Amharic Using Stack-Based Deep Learning
- ANEC: An Amharic Named Entity Corpus and Transformer Based Recognizer
- Named Entity Recognition for Afan Oromo
- Afaan Oromo Named Entity Recognition Using Hybrid Approach
- Boosting Afaan Oromo Named Entity Recognition with Multiple Methods
- Afan-Oromo Named Entity Recognition Using Bidirectional RNN
- Named-entity recognition for a low-resource language using pre-trained language model
- A method of named entity recognition for Tigrinya
- Named entity recognition for Amharic language
- A named entity recognition for Amharic
- Vulnerable community identification using hate speech detection on social media
- Multi-channel convolutional neural network for hate speech detection in social media
- Amharic text hate speech detection in social media using deep learning approach
- The 5Js in Ethiopia: Amharic Hate Speech Data Annotation Using Toloka Crowdsourcing Platform
- Afaan Oromo Hate Speech Detection and Classification on Social Media
- Detection of hate speech text in afan oromo social media using machine learning approach
- Hate Speech Detection from Facebook Social Media Posts and Comments in Tigrigna language
- Exploring Amharic Sentiment Analysis from Social Media Texts: Building Annotation Tools and Classification Models
- Comparative Analysis of Deep Learning Models for Aspect Level Amharic News Sentiment Analysis
- Sentiment Analysis of Afaan Oromo using Machine learning Approach
- Sentiment analysis of Afaan Oromoo facebook media using deep learning approach
- Multi-Class Sentiment Analysis from Afaan Oromo Text Based On Supervised Machine Learning Approaches
- Sentiment Analysis for Low-Resource Language: The Case of Tigrinya