LLM Finetuning: Enabling Quality Embedding and Text Generation for Amharic, Swahili, and Yoruba Languages

10-academy-w5-group-2/llm-amharic

LLM Amharic

LLM Amharic is an Amharic large language model project. Our goal is to help African businesses adopt new AI technology by providing smooth Amharic-language support across different platforms.

This repository contains scripts and instructions to fine-tune the LLaMA-2-7b-chat model for Amharic customer support using data stored in a PostgreSQL database.

Project Structure

llm-amharic/
├── data/
│ ├── tokenized_dataset/
│ └── load_data_to_db.py
├── docker/
│ ├── Dockerfile
│ └── docker-compose.yml
├── scripts/
│ ├── evaluate_model.py
│ ├── inference_script.py
│ ├── tokenize_data.py
│ ├── train_model.py
│ └── train_tokenizer.py
├── utils/
│ ├── data_preprocessing.py
│ └── fetch_data_from_db.py
├── .gitignore
├── amharic.model
├── amharic.vocab
└── README.md

Getting Started

Prerequisites

  • Python 3.8+
  • PostgreSQL
  • CUDA-enabled GPU (optional but recommended for training)

Installation

  1. Clone the repository:

    git clone https://github.com/10-academy-w5-group-2/llm-amharic.git
    cd llm-amharic
  2. Set up a virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install the requirements:

    pip install -r requirements.txt
  4. Train the tokenizer:

    python scripts/train_tokenizer.py
  5. Fine-tune the model:

    python scripts/train_model.py
  6. Evaluate the model:

    python scripts/evaluate_model.py
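
The `amharic.model` and `amharic.vocab` files in the project tree follow the `<prefix>.model` / `<prefix>.vocab` naming that SentencePiece uses for its output, so step 4 may look roughly like the sketch below. This is an assumption, not the contents of `scripts/train_tokenizer.py`: the corpus path, vocabulary size, and model type are placeholder values, and the `sentencepiece` package has to be installed separately.

```python
def tokenizer_train_args(corpus_path, model_prefix="amharic", vocab_size=8000):
    """Build keyword arguments for SentencePieceTrainer.train.

    All values are illustrative assumptions, not the project's real settings.
    """
    return {
        "input": str(corpus_path),       # plain-text file, one sentence per line
        "model_prefix": model_prefix,    # yields <prefix>.model and <prefix>.vocab
        "vocab_size": vocab_size,        # assumed; tune to the corpus size
        "model_type": "bpe",             # assumed; LLaMA-style tokenizers use BPE
        "character_coverage": 1.0,       # keep every Ge'ez character seen in the corpus
    }


def train_tokenizer(corpus_path, **overrides):
    """Train a SentencePiece model and return the two output file names."""
    # Imported lazily so the argument builder works without the package installed.
    import sentencepiece as spm

    args = tokenizer_train_args(corpus_path, **overrides)
    spm.SentencePieceTrainer.train(**args)
    prefix = args["model_prefix"]
    return f"{prefix}.model", f"{prefix}.vocab"
```

Calling `train_tokenizer("data/amharic_corpus.txt")` (a hypothetical path) would write `amharic.model` and `amharic.vocab` to the current directory.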

Database Setup

Ensure your PostgreSQL database is set up with the required data. The table should have a column containing the Amharic text data for training.
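
One way to read that column is sketched below. The table name `amharic_texts` and column name `text` are hypothetical, since this README does not name them, and the function is not the code in `utils/fetch_data_from_db.py`. It accepts any Python DB-API 2.0 connection, for example one opened with `psycopg2.connect(...)`.

```python
import re

# Only plain identifiers are allowed, because table and column names
# cannot be passed as query parameters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def fetch_texts(conn, table="amharic_texts", column="text"):
    """Return the non-empty strings stored in one text column.

    `table` and `column` are placeholder names (assumptions).
    """
    for name in (table, column):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe SQL identifier: {name!r}")

    cur = conn.cursor()
    try:
        cur.execute(
            f'SELECT "{column}" FROM "{table}" WHERE "{column}" IS NOT NULL'
        )
        rows = cur.fetchall()
    finally:
        cur.close()

    # Drop rows that contain only whitespace and trim the rest.
    return [row[0].strip() for row in rows if row[0].strip()]


# Example with PostgreSQL (requires the psycopg2 package; the DSN is made up):
#   import psycopg2
#   conn = psycopg2.connect("dbname=amharic user=postgres host=localhost")
#   texts = fetch_texts(conn)
```

Validating the identifiers with a regular expression keeps the sketch driver-independent. With psycopg2 specifically, its `psycopg2.sql` module provides a safer way to compose identifiers into a query.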

Dockerfile

Use docker/Dockerfile to containerize and run the entire project. A docker/docker-compose.yml file is provided alongside it.

Contributing

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/your-feature).
  3. Commit your changes (git commit -m 'Add your feature').
  4. Push to the branch (git push origin feature/your-feature).
  5. Open a pull request.

License

This project is licensed under the MIT License.

Contributors

  • @abyt101 - Abraham Teka
  • Melaku Alehegn
  • Grace Nyutu
  • Henock Kinfegebriel

Challenge by

10 Academy
