📄 Paper • 🌐 Website • 🎮 Demo • 📖 Documentation • 👥 About Us
This repo supports:
- ✅ Single-agent (SA) RL training
- ✅ Multi-agent RL training (one role-sharing policy)
- ✅ Multi-agent RL training (role-specialized policies using different LoRA adapters or different LLMs)
- [2025.10] 🚀 GitHub repository open-sourced and publicly available
- [2025.10] 🎉 Paper released! Check out our arXiv preprint
- [2025.10] 🔥 Support for different LoRA adapters per agent role, enabling efficient role-specialized training
- [2025.09] 🌍 Multi-environment support added: Game (Sudoku, Sokoban), Code (APPS, CodeContests), and Math (AIME, OlympiadBench)
- [2025.08] 🤖 Multi-agent framework implementation: support for both a shared single model and role-specific models
- Multi-Level Agent Specialization: Train and specialize agents at any level, from lightweight prompt adjustments, to role-specific LoRA adapters, to full model fine-tuning with reinforcement learning.
- Novel RL Algorithm: Implements agent- and turn-wise GRPO (AT-GRPO) for efficient and stable multi-agent training (see the sketch after this list).
- Built-in Multi-Turn MAS Workflows: Comes with predefined, reproducible benchmarks and environments for a variety of domains:
  - 🎮 Games: Sudoku (4x4), Sokoban (6x6)
  - 📐 Planning: Plan-Path (10x10 grid)
  - 💻 Coding: APPS, CodeContests, LiveCodeBench
  - 🔢 Math: AIME24/25, OlympiadBench
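GRPO estimates advantages by normalizing rewards within a group of sampled rollouts; going by its name, AT-GRPO forms those groups per agent role and per turn. The snippet below is a minimal, hypothetical sketch of that grouping idea only; it is not the repo's implementation, the role names and field names are assumptions, and the paper is the authoritative reference.

```python
# Hypothetical sketch: group-relative advantages computed per (role, turn)
# group, in the spirit of agent- and turn-wise GRPO. Not PettingLLMs code.
from collections import defaultdict
from statistics import mean, pstdev

def group_relative_advantages(samples, eps=1e-6):
    """samples: list of dicts with keys 'role', 'turn', 'reward'."""
    groups = defaultdict(list)
    for i, s in enumerate(samples):
        groups[(s["role"], s["turn"])].append(i)   # group rollouts by agent role and turn index

    advantages = [0.0] * len(samples)
    for indices in groups.values():
        rewards = [samples[i]["reward"] for i in indices]
        mu, sigma = mean(rewards), pstdev(rewards)
        for i in indices:
            # normalize each reward within its own (role, turn) group
            advantages[i] = (samples[i]["reward"] - mu) / (sigma + eps)
    return advantages

# Toy example: two rollouts each for a "code" agent and a "tool" agent at turn 0
rollouts = [
    {"role": "code", "turn": 0, "reward": 1.0},
    {"role": "code", "turn": 0, "reward": 0.0},
    {"role": "tool", "turn": 0, "reward": 0.5},
    {"role": "tool", "turn": 0, "reward": 0.5},
]
print(group_relative_advantages(rollouts))
```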
 
- More Environments: Verilog design, web search, robotics, database query, scientific discovery
- Multi-Modal Support: Vision-language models, audio processing, mixed-modal tasks
- Agentic Framework Integration: AutoGen, LangGraph, CrewAI, and custom framework APIs
| Method | Acc. (%) | Δ |
|---|---|---|
| Single agent | 5.00 | – | 
| Training tool agent in SA, eval in SA | 11.00 | +6.00 | 
| Training code agent in SA, eval in SA | 14.50 | +9.50 | 
| Training in SA, eval in MAS | 16.00 | +11.00 | 
| MAS RL (role specific policies), eval in MAS | 96.00 | +91.00 | 
| w/ Swapped Policies | 6.00 | +1.00 | 
```bash
git clone https://github.com/pettingllms-ai/PettingLLMs.git
cd PettingLLMs
bash setup.bash
```

Prepare datasets for different tasks:
```bash
# Code tasks (APPS, CodeContests, LiveCodeBench)
python scripts/dataprocess/load_code.py

# Math tasks (AIME24/25, OlympiadBench)
python scripts/dataprocess/load_math.py

# Game/Planning tasks (Sokoban, Sudoku)
python scripts/dataprocess/load_sokoban.py
```

Datasets will be saved to `datasets/code/`, `datasets/math/`, and `datasets/sudoku_environments/`.
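As an optional sanity check, you can confirm the processed datasets landed in those directories; the paths below are taken from the text above, not from the prep scripts themselves:

```python
# Optional sanity check: confirm the dataset prep scripts produced output in the
# directories mentioned above (paths assumed from the README text).
from pathlib import Path

for d in ("datasets/code", "datasets/math", "datasets/sudoku_environments"):
    path = Path(d)
    n_files = sum(1 for p in path.rglob("*") if p.is_file()) if path.exists() else 0
    print(f"{d}: {'ok' if n_files else 'missing or empty'} ({n_files} files)")
```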
Example: train a multi-agent system on math tasks

```bash
bash scripts/train/math/math_L1_prompt.sh
```

Other training scripts are available in scripts/train/:
- Code domain: `code_single_policy.sh`, `code_two_policy.sh`
- Planning domain: `plan_path_single.sh`, `plan_path_two_policy.sh`
- Game domain: `sokoban_two_policy.sh`, `sokodu_single.sh`
Example: evaluate a trained model

Edit scripts/evaluate/evaluate.sh to set your model path and config:

```bash
MODEL_PATHS=("/path/to/your/model")
CONFIG_NAME="math_single_policy"
```

Then run:
```bash
bash scripts/evaluate/evaluate.sh
```
PettingLLMs uses a tiered approach to define agent roles, ranging from simple instructions to deep model specialization.
| Level | Role Specialization Method | Description | 
|---|---|---|
| L0 | Shared model | Roles are defined solely through instructions in the prompt. The base model is identical for all agents, offering a flexible but performance-limited baseline. | 
| L1 | Role-specific LoRA | Each role is specialized using a unique, lightweight LoRA adapter. This creates distinct, cost-effective agent "personalities" on top of a shared base model. | 
| L2 | Role-specific Model | The entire model's weights are optimized for a specific role using reinforcement learning. This creates a highly specialized expert agent for maximum performance on complex tasks. | 
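To make the L1 idea concrete, here is a hedged sketch using Hugging Face PEFT of how one shared base model can carry a lightweight LoRA adapter per role and switch between them. This is a generic illustration, not PettingLLMs' actual training code; the model name, role names, and LoRA hyperparameters are placeholders.

```python
# Generic illustration of L1 (role-specific LoRA adapters on a shared base model)
# using Hugging Face PEFT. Placeholder model and roles; not the repo's implementation.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct")
lora_cfg = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])

# One shared base model, one lightweight adapter per agent role.
model = get_peft_model(base, lora_cfg, adapter_name="planner")
model.add_adapter("coder", lora_cfg)

model.set_adapter("planner")  # route a planner-role turn through its own adapter
# ... generate / update with the planner adapter active ...
model.set_adapter("coder")    # switch adapters when the coder role takes its turn
```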
If you find PettingLLMs useful for your research or projects, please cite:
```bibtex
@article{zhao2025stronger,
  title={Stronger Together: On-Policy Reinforcement Learning for Collaborative LLMs},
  author={Zhao, Yujie and Hu, Lanxiang and Wang, Yang and Hou, Minmin and Zhang, Hao and Ding, Ke and Zhao, Jishen},
  journal={arXiv preprint arXiv:2510.11062},
  year={2025}
}
```

This work was primarily conducted by Yujie Zhao during her summer internship at Intel Corporation. We gratefully acknowledge Intel's support and resources that made this research possible.
- VERL (Efficient RL Training for LLMs): for the efficient distributed RL training infrastructure
- RLLM (Reinforcement Learning with Language Models): for foundational RL algorithms for LLMs
Released under the MIT license. See LICENSE for details.

