The best Amharic large language model! Our goal is to help African businesses adopt advanced AI by providing smooth Amharic-language customer support across different platforms.
This repository contains scripts and instructions to fine-tune the LLaMA-2-7b-chat model for Amharic customer support using data stored in a PostgreSQL database.
llm-amharic/
├── data/
│ ├── tokenized_dataset/
│ └── load_data_to_db.py
├── docker/
│ ├── Dockerfile
│ └── docker-compose.yml
├── scripts/
│ ├── evaluate_model.py
│ ├── inference_script.py
│ ├── tokenize_data.py
│ ├── train_model.py
│ └── train_tokenizer.py
├── utils/
│ ├── data_preprocessing.py
│ └── fetch_data_from_db.py
├── .gitignore
├── amharic.model
├── amharic.vocab
└── README.md
- Python 3.8+
- PostgreSQL
- CUDA-enabled GPU (optional but recommended for training)
- Clone the repository:
  `git clone https://github.com/10-academy-w5-group-2/llm-amharic.git`
  `cd llm-amharic`
- Set up a virtual environment:
  `python3 -m venv venv`
  `source venv/bin/activate` (on Windows, use `venv\Scripts\activate`)
- Install the requirements:
  `pip install -r requirements.txt`
- Train the tokenizer (sketched below):
  `python scripts/train_tokenizer.py`
- Fine-tune the model (sketched below):
  `python scripts/train_model.py`
- Evaluate the model (sketched below):
  `python scripts/evaluate_model.py`
Ensure your PostgreSQL database is set up with the required data. The table should have a column containing the Amharic text data for training.
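For reference, here is a minimal sketch of what `utils/fetch_data_from_db.py` might look like, assuming a local `psycopg2` connection; the database, table, and column names are placeholders for your own schema.

```python
# Hypothetical sketch of fetching training text from PostgreSQL; connection details,
# table name, and column name are placeholders for your own schema.
import psycopg2

def fetch_amharic_texts(table="support_conversations", column="amharic_text"):
    """Return all Amharic text rows from the configured table."""
    conn = psycopg2.connect(
        host="localhost",
        dbname="llm_amharic",   # assumed database name
        user="postgres",
        password="postgres",
    )
    try:
        with conn.cursor() as cur:
            cur.execute(f"SELECT {column} FROM {table};")
            return [row[0] for row in cur.fetchall()]
    finally:
        conn.close()

if __name__ == "__main__":
    texts = fetch_amharic_texts()
    print(f"Fetched {len(texts)} rows")
```

The returned strings could then be dumped to a corpus file for the tokenizer step or tokenized for fine-tuning.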
Use docker/Dockerfile and docker/docker-compose.yml to containerize and run the entire project.
- Fork the repository.
- Create your feature branch (`git checkout -b feature/your-feature`).
- Commit your changes (`git commit -m 'Add your feature'`).
- Push to the branch (`git push origin feature/your-feature`).
- Open a pull request.
This project is licensed under the MIT License.
- @abyt101 - Abraham Teka
- Melaku Alehegn
- Grace Nyutu
- Henock Kinfegebriel