This project demonstrates an efficient, modular pipeline for fine-tuning Google's Gemma 3 (1B) model using Unsloth and GRPO (Group Relative Policy Optimization). It includes training, inference, and deployment scripts, ready to drop into production or research workflows.
This repo supports training on custom datasets in GSM8K-style format (a question plus a `#### answer` line): simply swap out the dataset loader. It uses parameter-efficient fine-tuning (LoRA) via Unsloth, making it lightweight and suitable for compute-constrained environments.
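For example, a GSM8K-style loader only needs to split the rationale from the final answer after the `####` marker. The sketch below is illustrative (the function names and the `openai/gsm8k` dataset id are assumptions, not this repo's actual `data/` module):

```python
"""Illustrative GSM8K-style dataset loader sketch."""
import re

from datasets import load_dataset


def extract_answer(text: str) -> str:
    """Pull the final answer after the '####' marker, e.g. '... #### 42' -> '42'."""
    match = re.search(r"####\s*(.+)", text)
    return match.group(1).strip() if match else ""


def load_gsm8k(split: str = "train"):
    """Load GSM8K and expose a 'prompt' column plus the extracted gold answer."""
    ds = load_dataset("openai/gsm8k", "main", split=split)
    return ds.map(lambda x: {"prompt": x["question"],
                             "answer": extract_answer(x["answer"])})
```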
- Efficient fine-tuning with LoRA adapters via Unsloth
- Supports GGUF export for LLaMA.cpp deployment
- Reward-based optimization using regex and ratio matching
- Clean VS Code-friendly modular structure
- Push-to-HuggingFace automation script
- Lightweight inference wrapper with system prompts (see the sketch below)
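The inference wrapper amounts to a chat-templated `generate` call. The sketch below is a hypothetical stand-in for `inference/infer.py`; the system prompt, function name, and model path are assumptions, not the repo's exact code:

```python
"""Hypothetical sketch of a lightweight inference wrapper with a system prompt."""
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed system prompt; the repo's actual prompt may differ.
SYSTEM_PROMPT = "Reason step by step, then give the final answer after '####'."


def answer(question: str, model_path: str = "unsloth/gemma-3-1b-it") -> str:
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": question},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output_ids = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, not the echoed prompt.
    return tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                            skip_special_tokens=True)
```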
```
gemma_finetune/
├── configs/          # YAML config for training
├── data/             # Dataset preprocessing logic
├── export/           # Save & push LoRA/GGUF models
├── inference/        # Run inference with the fine-tuned model
├── model/            # Model and tokenizer loader
├── rewards/          # Reward scoring functions
├── train_module/     # Training loop using GRPO
├── outputs/          # (Ignored) Training checkpoints
├── main.py           # Entrypoint for training
├── requirements.txt  # Environment dependencies
└── README.md         # This file
```
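For orientation, here is a minimal sketch of what `main.py` wires together, using Unsloth with TRL's `GRPOTrainer`. The dataset id, hyperparameters, and the toy reward are illustrative assumptions, not this repo's exact code:

```python
"""Hypothetical training entrypoint sketch (Unsloth + TRL GRPO)."""
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

# Load the base model with Unsloth and attach LoRA adapters (r=8, alpha=8).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gemma-3-1b-it",
    max_seq_length=1024,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=8,
    lora_alpha=8,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# GRPOTrainer expects a "prompt" column; GSM8K ships "question"/"answer".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda x: {"prompt": x["question"]})


# Toy reward: credit completions that emit the '#### answer' marker.
def format_reward(completions, **kwargs):
    return [1.0 if "####" in c else 0.0 for c in completions]


trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[format_reward],
    args=GRPOConfig(output_dir="outputs", max_prompt_length=512),
    train_dataset=dataset,
)
trainer.train()
```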
```bash
# Install dependencies
pip install -r requirements.txt

# Train
python main.py

# Run inference
python inference/infer.py

# Push LoRA adapters to the Hugging Face Hub
python -m export.push_to_hub --repo_id <your_username/repo_name> --token <hf_token> --type lora

# or for GGUF
python -m export.push_to_hub --repo_id <your_username/repo_name> --token <hf_token> --type gguf --quant Q8_0
```

- Base: `unsloth/gemma-3-1b-it`
- Sequence Length: 1024 tokens
- LoRA Config: `r=8, alpha=8, dropout=0`
- Optimizer: AdamW (`torch_fused`)
- Reward Functions:
  - Exact match using regex format
  - Approximate match using token patterns
  - Numeric and ratio-based answer correctness
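A hedged sketch of this reward family is shown below (exact regex match, looser pattern match, and numeric correctness). Function names and scoring weights are illustrative; see `rewards/` for the actual implementations:

```python
"""Illustrative reward function sketches for GSM8K-style outputs."""
import re

ANSWER_RE = re.compile(r"####\s*([-+]?[\d,]*\.?\d+)")


def exact_format_reward(completion: str) -> float:
    """Full credit only when the completion ends with a '#### <number>' line."""
    return 1.0 if re.search(r"####\s*[-+]?[\d,]*\.?\d+\s*$", completion) else 0.0


def approximate_format_reward(completion: str) -> float:
    """Partial credit when the '####' marker appears anywhere."""
    return 0.5 if "####" in completion else 0.0


def correctness_reward(completion: str, gold: str) -> float:
    """Compare the extracted number against the gold answer, tolerating commas."""
    match = ANSWER_RE.search(completion)
    if not match:
        return 0.0
    try:
        pred = float(match.group(1).replace(",", ""))
        ref = float(gold.replace(",", ""))
    except ValueError:
        return 0.0
    return 2.0 if pred == ref else 0.0
```

In practice, several such functions are passed together so that formatting and correctness are rewarded independently.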
To keep the repo lightweight, large files like checkpoints and compiled caches are excluded:
```
outputs/
unsloth_compiled_cache/
__pycache__/
*.bin
*.pt
```

Developed by:
Elias Hossain
Machine Learning Researcher | PhD Student | AI x Reasoning Enthusiast