Compiler-R1 is the first framework that combines Large Language Models (LLMs) and Reinforcement Learning (RL) for compiler pass sequence auto-tuning, targeting LLVM IR instruction count reduction. It leverages the reasoning ability of LLMs and the exploration power of RL to efficiently discover high-performance pass sequences.
- ⚙️ Pass Sequence Auto-Tuning: Automatically finds compiler pass sequences that reduce LLVM IR instruction count.
- 🧠 LLM + RL Synergy: Combines pretrained language models and reinforcement learning agents for robust decision-making.
- 🔁 SFT, PPO, and RPP Training Pipelines: Supports multiple fine-tuning and RL methods.
- 🛠️ External Tool Support: Utilizes external tools to aid and enhance the compiler pass auto-tuning process.
- 📊 CoT & Tool-Use Dataset: Incorporates a dataset structured with Chain-of-Thought (CoT) and tool invocation.
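To make the underlying task concrete, the sketch below applies a candidate pass sequence with LLVM's `opt` and measures the resulting IR instruction count. It is a minimal illustration, not part of Compiler-R1: it assumes a recent `opt` (new pass manager) on PATH plus `llvmlite`, and the input file `example.ll` and the chosen passes are placeholders.

```python
# Minimal sketch of pass-sequence tuning (illustration only, not Compiler-R1 code).
# Assumes a recent LLVM `opt` (new pass manager) on PATH and `llvmlite` installed.
import subprocess
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

def instruction_count(ir_text: str) -> int:
    """Count the instructions in a textual LLVM IR module."""
    module = llvm.parse_assembly(ir_text)
    return sum(1 for fn in module.functions
                 for block in fn.blocks
                 for _ in block.instructions)

def apply_passes(input_ll: str, passes: list[str]) -> str:
    """Run `opt` with a given pass sequence and return the optimized IR."""
    result = subprocess.run(
        ["opt", "-S", f"-passes={','.join(passes)}", input_ll, "-o", "-"],
        capture_output=True, text=True, check=True)
    return result.stdout

with open("example.ll") as f:  # placeholder input module
    baseline_ir = f.read()

tuned_ir = apply_passes("example.ll", ["mem2reg", "instcombine", "simplifycfg"])
print("instructions:", instruction_count(baseline_ir), "->", instruction_count(tuned_ir))
```

Compiler-R1's agent searches over such pass sequences automatically instead of relying on a hand-picked list.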
```bash
# Create and activate conda environment
conda create -n Compiler-R1 python==3.10
conda activate Compiler-R1

# Initialize and update submodules
git submodule update --init --recursive

# Install verl and other dependencies
cd verl
pip3 install -e .
cd ..
pip3 install vllm
pip3 install flash-attn --no-build-isolation
pip3 install FlagEmbedding
pip3 install faiss-cpu
```
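Optionally, verify that the key dependencies import cleanly inside the environment (a convenience check added here, not part of the repository's setup scripts):

```python
# Optional environment check (convenience snippet, not part of the repo's setup).
import importlib

for pkg in ["verl", "vllm", "flash_attn", "FlagEmbedding", "faiss"]:
    try:
        importlib.import_module(pkg)
        print(f"[ok] {pkg}")
    except Exception as exc:  # flash-attn in particular can fail at import time
        print(f"[problem] {pkg}: {exc}")
```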
To run Experiments 1 and 2, first run the script below; it generates the training dataset and trains the GRPO model:

```bash
bash train_Exp_1_2.sh
```
You can then train models using different strategies:

- PPO (Proximal Policy Optimization):

```bash
bash train_Exp_1_ppo.sh
```

- RPP (Reward-weighted Preference Policy):

```bash
bash train_Exp_1_rpp.sh
```

- SFT (Supervised Fine-Tuning):

```bash
bash train_Exp_1_pureSFT.sh
```
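For intuition about the reward these RL pipelines can optimize, a common formulation for pass-sequence tuning is the relative instruction-count reduction over a fixed baseline (e.g., a standard optimization level). The snippet below is an illustrative sketch under that assumption, not necessarily the exact reward implemented by the scripts above.

```python
# Illustrative reward sketch: relative IR instruction-count reduction versus a
# fixed baseline. Not necessarily the exact reward used by the training scripts.
def relative_reduction(baseline_count: int, tuned_count: int) -> float:
    """Positive when the tuned pass sequence yields fewer instructions than the
    baseline, negative when it yields more."""
    return (baseline_count - tuned_count) / max(baseline_count, 1)

# Example: baseline yields 120 IR instructions, the tuned sequence yields 96.
print(relative_reduction(120, 96))  # 0.2, i.e. 20% fewer instructions
```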
After training your models, follow these steps for inference:

- Merge model weights:

```bash
bash infer_model_merge.sh
```

- Run inference:

```bash
bash infer_xxxx.sh
```
SFT models do not require a merge step; you can run inference directly:

```bash
bash infer_xxxx.sh
```

Remember to update the inference scripts (infer_xxxx.sh) to point to your trained models and data.
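Before launching the inference scripts, you can optionally confirm that a merged (or SFT) checkpoint loads correctly. This is a convenience snippet added here, not part of the repository, and the checkpoint path is a placeholder:

```python
# Optional checkpoint sanity check (convenience snippet; the path is a placeholder).
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "/path/to/merged_or_sft_checkpoint"  # placeholder -- set to your model
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path)
print(model.config.model_type, f"{model.num_parameters():,} parameters")
```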
If you use Compiler-R1 in your research or find it useful, please cite our paper:
```bibtex
@misc{pan2025compilerr1agenticcompilerautotuning,
      title={Compiler-R1: Towards Agentic Compiler Auto-tuning with Reinforcement Learning},
      author={Haolin Pan and Hongyu Lin and Haoran Luo and Yang Liu and Kaichun Yao and Libo Zhang and Mingjie Xing and Yanjun Wu},
      year={2025},
      eprint={2506.15701},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2506.15701}
}
```
This repo benefits from Agent-R1. Thanks for their wonderful work.