Compiler-R1 is the first framework that combines Large Language Models (LLMs) and Reinforcement Learning (RL) for compiler pass sequence auto-tuning, targeting LLVM IR instruction count reduction. It leverages the reasoning ability of LLMs and the exploration power of RL to efficiently discover high-performance pass sequences.
- ⚙️ Pass Sequence Auto-Tuning: Automatically finds compiler pass sequences that reduce LLVM IR instruction count.
- 🧠 LLM + RL Synergy: Combines pretrained language models and reinforcement learning agents for robust decision-making.
- 🔁 SFT, PPO, and RPP Training Pipelines: Supports multiple fine-tuning and RL methods.
- 🛠️ External Tool Support: Utilizes external tools to aid and enhance the compiler pass auto-tuning process.
- 📊 CoT & Tool-Use Dataset: Incorporates a dataset structured with Chain-of-Thought (CoT) and tool invocation.
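To make the underlying task concrete, the sketch below applies a candidate pass sequence with LLVM's `opt` and measures the resulting IR instruction count. It is a minimal illustration, not part of Compiler-R1: it assumes a recent `opt` (new pass manager) on PATH plus `llvmlite`, and the input file `example.ll` and the chosen passes are placeholders.

```python
# Minimal sketch of pass-sequence tuning (illustration only, not Compiler-R1 code).
# Assumes a recent LLVM `opt` (new pass manager) on PATH and `llvmlite` installed.
import subprocess
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

def instruction_count(ir_text: str) -> int:
    """Count the instructions in a textual LLVM IR module."""
    module = llvm.parse_assembly(ir_text)
    return sum(1 for fn in module.functions
                 for block in fn.blocks
                 for _ in block.instructions)

def apply_passes(input_ll: str, passes: list[str]) -> str:
    """Run `opt` with a given pass sequence and return the optimized IR."""
    result = subprocess.run(
        ["opt", "-S", f"-passes={','.join(passes)}", input_ll, "-o", "-"],
        capture_output=True, text=True, check=True)
    return result.stdout

with open("example.ll") as f:  # placeholder input module
    baseline_ir = f.read()

tuned_ir = apply_passes("example.ll", ["mem2reg", "instcombine", "simplifycfg"])
print("instructions:", instruction_count(baseline_ir), "->", instruction_count(tuned_ir))
```

Compiler-R1's agent searches over such pass sequences automatically instead of relying on a hand-picked list.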
```bash
# Create and activate conda environment
conda create -n Compiler-R1 python==3.10
conda activate Compiler-R1

# Initialize and update submodules
git submodule update --init --recursive

# Install verl and other dependencies
cd verl
pip3 install -e .
cd ..
pip3 install vllm
pip3 install flash-attn --no-build-isolation
pip3 install FlagEmbedding
pip3 install faiss-cpu
```
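Optionally, verify that the key dependencies import cleanly inside the environment (a convenience check added here, not part of the repository's setup scripts):

```python
# Optional environment check (convenience snippet, not part of the repo's setup).
import importlib

for pkg in ["verl", "vllm", "flash_attn", "FlagEmbedding", "faiss"]:
    try:
        importlib.import_module(pkg)
        print(f"[ok] {pkg}")
    except Exception as exc:  # flash-attn in particular can fail at import time
        print(f"[problem] {pkg}: {exc}")
```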
To run Experiments 1 and 2, first run the script below; it generates the training dataset and trains the GRPO model:

```bash
bash train_Exp_1_2.sh
```
You can then train models using different strategies:

- PPO (Proximal Policy Optimization):

```bash
bash train_Exp_1_ppo.sh
```

- RPP (Reward-weighted Preference Policy):

```bash
bash train_Exp_1_rpp.sh
```

- SFT (Supervised Fine-Tuning):

```bash
bash train_Exp_1_pureSFT.sh
```
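For intuition about the reward these RL pipelines can optimize, a common formulation for pass-sequence tuning is the relative instruction-count reduction over a fixed baseline (e.g., a standard optimization level). The snippet below is an illustrative sketch under that assumption, not necessarily the exact reward implemented by the scripts above.

```python
# Illustrative reward sketch: relative IR instruction-count reduction versus a
# fixed baseline. Not necessarily the exact reward used by the training scripts.
def relative_reduction(baseline_count: int, tuned_count: int) -> float:
    """Positive when the tuned pass sequence yields fewer instructions than the
    baseline, negative when it yields more."""
    return (baseline_count - tuned_count) / max(baseline_count, 1)

# Example: baseline yields 120 IR instructions, the tuned sequence yields 96.
print(relative_reduction(120, 96))  # 0.2, i.e. 20% fewer instructions
```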
After training your models, follow these steps for inference:

- Merge model weights:

```bash
bash infer_model_merge.sh
```

- Run inference:

```bash
bash infer_xxxx.sh
```
SFT models do not require a merge step; you can run inference directly:

```bash
bash infer_xxxx.sh
```

Remember to update the inference scripts (infer_xxxx.sh) to point to your trained models and data.
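Before launching the inference scripts, you can optionally confirm that a merged (or SFT) checkpoint loads correctly. This is a convenience snippet added here, not part of the repository, and the checkpoint path is a placeholder:

```python
# Optional checkpoint sanity check (convenience snippet; the path is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/merged_or_sft_checkpoint"  # placeholder -- set to your model
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)
print(model.config.model_type, f"{model.num_parameters():,} parameters")
```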
If you use Compiler-R1 in your research or find it useful, please cite our paper:
```bibtex
@misc{pan2025compilerr1agenticcompilerautotuning,
      title={Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning},
      author={Haolin Pan and Hongyu Lin and Haoran Luo and Yang Liu and Kaichun Yao and Libo Zhang and Mingjie Xing and Yanjun Wu},
      year={2025},
      eprint={2506.15701},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.15701}
}
```
This repo benefits from Agent-R1. Thanks for their wonderful work.