This repository accompanies our paper, *CompilerDream: Learning a Compiler World Model for General Code Optimization* [arXiv]. It contains the code for our experiments, along with the training data, part of the testing data, and several model checkpoints.
Set up a conda environment using:

```
conda env create -f environment.yaml
conda activate compilerdream
```

Please be aware that you may need to modify the version of the PyTorch package listed in the `environment.yaml` file to suit your environment. Ensure that the PyTorch version is compatible with your CUDA version.
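As a quick sanity check (a minimal sketch; the versions you need depend on your driver, not on this repository):

```
# Print the installed PyTorch version and the CUDA version it was built against.
python -c "import torch; print(torch.__version__, torch.version.cuda)"
# Verify that PyTorch can actually see your GPU.
python -c "import torch; print(torch.cuda.is_available())"
```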
To download and prepare datasets, follow these steps:
- Download the dataset archive (`compilerdream_data.zip`) from the provided Google Drive link or Zenodo.
- Unzip and link the datasets to the `./data` directory:

```
unzip compilerdream_data.zip
ln -s compilerdream_data data
```
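You can confirm that the link resolves as expected (the archive contents themselves may vary):

```
# `data` should be a symlink pointing at the unzipped compilerdream_data directory.
ls -l data
```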
Here we list the commands to run the experiments in our paper. As we cannot provide the full training dataset due to the size limit on supplementary files, we only provide a small subset of our training set (CodeContests). Therefore, the results in our paper cannot be directly replicated. You can refer to `./data/README.md` to construct the full CodeContests dataset by yourself.
The random seed can be set by adding `--seed 0` to any command.
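For example, the Cbench training command below becomes the following with a fixed seed (an illustrative combination; any of the commands in this section accepts the flag):

```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/codecontest_cbench_nolimit --configs compilergym compilergym_dv3 cbench_train_nolimit --enable_test True --test_interval 5 --eval_eps 100 --save_all_models True --seed 0
```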
To train CompilerDream on Cbench:
```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/codecontest_cbench_nolimit --configs compilergym compilergym_dv3 cbench_train_nolimit --enable_test True --test_interval 5 --eval_eps 100 --save_all_models True
```
To test the model with random search on Cbench:
```
python examples/test_guided_search.py --logdir logdir/compilerdream/codecontest_cbench_test --configs compilergym compilergym_dv3 cbench_train_nolimit --task compilergym_cbench --load_logdir [path/to/model/directory] --no_eval True
```
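Here `[path/to/model/directory]` is the log directory of a trained model; for instance, reusing the training run above (an illustrative substitution, assuming you trained with the command in the previous step):

```
python examples/test_guided_search.py --logdir logdir/compilerdream/codecontest_cbench_test --configs compilergym compilergym_dv3 cbench_train_nolimit --task compilergym_cbench --load_logdir logdir/compilerdream/codecontest_cbench_nolimit --no_eval True
```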
To test the model without random search:
```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/codecontest_cbench_200step --configs compilergym compilergym_dv3 test --task compilergym_cbench --enable_test True --test_interval 5 --eval_eps 100 --save_all_models True --test_dataset cbench --test_eps 23 --eval_eps 23 --compilergym.act_space 'NoLimit' --load_logdir [path/to/model/directory]
```
To train CompilerDream's world model to be an accurate value predictor:
```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/coreset_train --configs compilergym compilergym_dv3 coreset_train --task compilergym_file_dataset_codecontest --enable_test True --test_interval 5 --save_all_models True --no_eval True
```
To evaluate the trained model:

```
python examples/test_on_coreset.py --logdir logdir/compilerdream/coreset_eval --configs compilergym compilergym_dv3 coreset_test --task compilergym_cbench --load_logdir [path/to/model/directory] --no_eval True
```
A model checkpoint is also available in the `VP_checkpoint` folder via the Google Drive link.
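Assuming you have downloaded that folder to the repository root (a hypothetical placement; adjust the path to wherever you put it), it can be loaded directly:

```
python examples/test_on_coreset.py --logdir logdir/compilerdream/coreset_eval --configs compilergym compilergym_dv3 coreset_test --task compilergym_cbench --load_logdir VP_checkpoint --no_eval True
```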
To train with the CodeContests dataset:
```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/codecontest_trained --configs compilergym compilergym_dv3 --task compilergym_file_dataset_codecontest --enable_test True --test_interval 5 --eval_eps 100 --save_all_models True
```
To test CompilerDream on all test benchmarks:
```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/compilergym_eval --configs compilergym compilergym_dv3 test --task compilergym_cbench --load_logdir [path/to/model/directory]
```
A model checkpoint is also available in the `RL_checkpoint` folder via the Google Drive link.
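As with the value-prediction checkpoint, assuming the downloaded folder sits at the repository root (adjust the path as needed):

```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/compilergym_eval --configs compilergym compilergym_dv3 test --task compilergym_cbench --load_logdir RL_checkpoint
```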
For in-domain training with specific benchmark datasets (e.g., TensorFlow) from CompilerGym:
```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/tensorflow_trained --configs compilergym compilergym_dv3 --task compilergym_tensorflow --enable_val True
```
By default, the reward smoothing factor is specified in the configuration file. To disable smoothing or modify its value, use the following argument:

```
--replay.dreamsmooth 0.0
```

Here, `0.0` disables smoothing; you can set it to any desired value. By default, this factor is set to 0.6 during RL training and 100.0 for value prediction.
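For example, to train on CodeContests with smoothing disabled (an illustrative combination of flags already shown above):

```
python examples/train_dreamerv2.py --logdir logdir/compilerdream/codecontest_trained --configs compilergym compilergym_dv3 --task compilergym_file_dataset_codecontest --enable_test True --test_interval 5 --eval_eps 100 --save_all_models True --replay.dreamsmooth 0.0
```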
If you have any questions, suggestions, or issues related to CompilerDream, feel free to open an issue or contact us via email: dcy11011@gmail.com or wujialong0229@gmail.com.
```
@inproceedings{deng2025compilerdream,
  title={CompilerDream: Learning a Compiler World Model for General Code Optimization},
  author={Chaoyi Deng and Jialong Wu and Ningya Feng and Jianmin Wang and Mingsheng Long},
  booktitle={ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  year={2025}
}
```
This project builds upon and benefits from several prior works:
- We use CompilerGym as the compiler environment and its benchmark suite for testing.
- Our code incorporates the action space from Poset-RL.
- We include the pass sequence from Coreset-NVP, and our evaluation of the value prediction agent follows its setup.
- The dataset used in this work is constructed based on data from CodeContests, FormAI, and AnghaBench.