This is the official repo of AutoReproduce and ReproduceBench.
We are currently organizing the code and adding more content for automation. The current code is only a demo.
export OPENAI_API_KEY="<OPENAI_API_KEY>"
export BASE_URL="<BASE_URL>" #If necessary
python reproduce.py #Default setting
To reproduce a paper of your choice, download the paper content first (we are currently working on using MinerU to automate this step).
If the data cannot be obtained directly, download it in advance and modify the instruction to specify its path.
python reproduce.py --paper-path xxx --dataloader-path xxx
In the default setting, the paper lineage is currently not used. Downloading code from GitHub is limited, so we recommend setting your own GitHub token with the following command before reproduction.
export GITHUB_TOKEN="<GITHUB_TOKEN>"
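Before launching a run, it can help to confirm that the variables exported above are visible to Python. The following is a minimal sketch, not part of the repository: the helper names `missing_vars` and `check_environment` are hypothetical, and treating `BASE_URL` and `GITHUB_TOKEN` as optional is an assumption based on the "if necessary" and "recommend" wording above.

```python
from __future__ import annotations

import os
from typing import Mapping, Sequence

# Variable names are taken from the export commands in this README.
REQUIRED_VARS = ("OPENAI_API_KEY",)
# Assumption: these two are optional, per the wording above.
OPTIONAL_VARS = ("BASE_URL", "GITHUB_TOKEN")


def missing_vars(names: Sequence[str], env: Mapping[str, str]) -> list[str]:
    """Return the names that are unset or blank in env."""
    return [name for name in names if not env.get(name, "").strip()]


def check_environment(env: Mapping[str, str] = os.environ) -> bool:
    """Print a short report; return True when every required variable is set."""
    missing_required = missing_vars(REQUIRED_VARS, env)
    for name in missing_required:
        print(f"error: {name} is not set")
    for name in missing_vars(OPTIONAL_VARS, env):
        print(f"note: {name} is not set (optional)")
    return not missing_required
```

Calling `check_environment()` in the same shell session, before `python reproduce.py`, reports which variables are missing.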
All datasets and human-curated reference code are available at ReproduceBench.
pip install -U huggingface_hub
cd AutoReproduce
huggingface-cli download --repo-type dataset --resume-download ai9stars/ReproduceBench --local-dir ReproduceBench
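The same download can also be done from Python, which may be convenient inside a script. This is a sketch that assumes `huggingface_hub` is installed (as in the `pip install` step above); the wrapper name `download_reproducebench` is hypothetical, while the repository ID and target directory mirror the CLI command.

```python
# Dataset repository ID, copied from the CLI command above.
REPO_ID = "ai9stars/ReproduceBench"


def download_reproducebench(local_dir: str = "ReproduceBench") -> str:
    """Download the ReproduceBench dataset and return the local folder path."""
    # Imported lazily so this module loads even before huggingface_hub is installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        repo_type="dataset",  # equivalent to --repo-type dataset
        local_dir=local_dir,  # equivalent to --local-dir ReproduceBench
    )
```

Run `download_reproducebench()` from inside the `AutoReproduce` directory to get the same layout as the CLI command.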
All the evaluation code is under the evaluation directory. The current code is not well structured; we are currently working on organizing it.
# First summarize the key points of the paper.
python evaluation/summarize_points.py
# Then run the following scripts to calculate the align-score.
python evaluation/eval_high.py # High-level score
python evaluation/eval_low.py # Low-level score
python evaluation/eval_mixed.py # Mixed-level score
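The four evaluation steps above can be chained so that a failure in one step stops the rest. The sketch below is not part of the repository: `build_commands` and `run_evaluation` are hypothetical helpers, and it assumes the scripts are run from the repository root and need no extra arguments, as in the commands above.

```python
import subprocess
import sys

# Script paths in the order given above: summarize first, then the three scores.
EVALUATION_SCRIPTS = (
    "evaluation/summarize_points.py",
    "evaluation/eval_high.py",
    "evaluation/eval_low.py",
    "evaluation/eval_mixed.py",
)


def build_commands(python: str = sys.executable) -> list:
    """Return one argument list per evaluation script, in execution order."""
    return [[python, script] for script in EVALUATION_SCRIPTS]


def run_evaluation() -> None:
    """Run each script in turn; check=True raises if a script exits non-zero."""
    for command in build_commands():
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Using `sys.executable` keeps the scripts on the same interpreter (and virtual environment) as the caller.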
For any questions, you can contact 2429527z@gmail.com.
If you find this work useful, consider giving this repository a star ⭐️ and citing 📝 our paper as follows:
@misc{zhao2025autoreproduceautomaticaiexperiment,
title={AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage},
author={Xuanle Zhao and Zilin Sang and Yuxuan Li and Qi Shi and Shuo Wang and Duzhen Zhang and Xu Han and Zhiyuan Liu and Maosong Sun},
year={2025},
eprint={2505.20662},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2505.20662},
}
The code is based on Agent Laboratory. Thanks to the authors for their great work and for open-sourcing it!