🧠 Low-Resource Language Reasoning Framework

An evaluation framework designed to measure how effectively Large Language Models (LLMs) reason and translate across low-resource languages. This project explores whether structured prompting strategies and linguistic guidance improve model performance on languages with limited training data.

The framework evaluates model outputs using automated scoring techniques and compares reasoning quality through translation and question-answering tasks.

🧠 Technologies

AI / NLP

Python
LLaMA 3.1
Large Language Models (LLMs)
Natural Language Processing (NLP)
Prompt Engineering
Chain-of-Thought Prompting

Evaluation & Metrics

SentenceTransformers
BERTScore
Cosine Similarity
Semantic Similarity Evaluation

Data & Visualization

Pandas
Matplotlib
Regular Expressions (Regex)

Research Concepts

Low-Resource Languages
Translation Evaluation
Interlinear Glossed Text (IGT)
Morphological Analysis

⚙️ Features

🌍 Low-Resource Language Evaluation

Evaluates model performance across languages with limited available training data.

🧠 Glossary-Guided Reasoning

Provides linguistic gloss information to guide translation and reasoning tasks.

📝 Multiple Evaluation Tasks

Measures performance through:

Multiple-choice grammar questions
Open-ended translation tasks

📊 Automated Performance Scoring

Generates evaluation metrics automatically using:

Accuracy
Cosine Similarity
BERTScore

📈 Visualization & Reporting

Creates plots and CSV summaries to compare language performance.

🔄 Repeatable Benchmarking Pipeline

Runs the same evaluation process across multiple datasets and languages.

🔧 The Process

The project was built as a structured evaluation pipeline:

1. Dataset Preparation

Language datasets were prepared using:

Multiple-choice grammar questions
Interlinear glossed text (IGT)
Ground-truth translations

2. Prompt Construction

Two prompting approaches were implemented:

Direct multiple-choice prompting
Glossary-guided Chain-of-Thought prompting

3. Model Inference

Prompts were sent to an LLM to generate predictions and translations.

4. Evaluation

Outputs were evaluated using:

Accuracy for multiple-choice tasks
Cosine similarity for semantic comparison
BERTScore for translation quality

5. Reporting

Results were exported into:

CSV summary files
Visualization charts
Aggregate evaluation reports

📚 What I Learned

How to design evaluation frameworks for LLMs
Applying prompt engineering techniques to structured reasoning tasks
Measuring language understanding beyond simple accuracy
Working with embedding-based similarity metrics
Building automated benchmarking pipelines
Understanding challenges associated with low-resource languages

🚀 How It Can Be Improved

Compare additional LLM architectures
Expand evaluation across more languages
Add human evaluation for translation quality
Introduce retrieval-augmented prompting
Support multilingual embeddings
Add experiment tracking and dashboards
Build an interactive interface for running evaluations

▶️ Running the Project

1. Clone the repository

git clone <repo-url>
cd low-resource-language-reasoning

2. Install dependencies

pip install -r requirements.txt

3. Configure API credentials

Create an environment file:

API_KEY=your_api_key

4. Add datasets

Place language datasets in the project directory:

language_mcq.txt
language_igt.txt

5. Run evaluation

python main.py

6. View outputs

Generated results will appear inside:

outputs/

Including:

CSV summaries
Evaluation metrics
Performance plots

📌 Notes

This project focuses on evaluating reasoning behavior rather than training models. Results may vary across models, prompts, and language families.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
outputs		outputs
README.md		README.md
kalamang_igt.txt		kalamang_igt.txt
kalamang_mcq.txt		kalamang_mcq.txt
mauwake_igt.txt		mauwake_igt.txt
mauwake_mcq.txt		mauwake_mcq.txt
mini_glossary_cot_evaluator.py		mini_glossary_cot_evaluator.py
moloko_igt.txt		moloko_igt.txt
moloko_mcq.txt		moloko_mcq.txt

Folders and files

Latest commit

History

Repository files navigation

🧠 Low-Resource Language Reasoning Framework

🧠 Technologies

AI / NLP

Evaluation & Metrics

Data & Visualization

Research Concepts

⚙️ Features

🌍 Low-Resource Language Evaluation

🧠 Glossary-Guided Reasoning

📝 Multiple Evaluation Tasks

📊 Automated Performance Scoring

📈 Visualization & Reporting

🔄 Repeatable Benchmarking Pipeline

🔧 The Process

1. Dataset Preparation

2. Prompt Construction

3. Model Inference

4. Evaluation

5. Reporting

📚 What I Learned

🚀 How It Can Be Improved

▶️ Running the Project

1. Clone the repository

2. Install dependencies

3. Configure API credentials

4. Add datasets

5. Run evaluation

6. View outputs

📌 Notes

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages