CoreCodeBench is a benchmark for evaluating LLMs on real-world software development tasks. It contains 1,500+ cases covering Development, BugFix, and TDD scenarios, with both single-function and multi-function problems.
CorePipe is the pipeline that builds CoreCodeBench. It consists of three stages: preprocessing, single-function problem generation, and multi-function problem generation. Given a repository, CorePipe can generate benchmark cases for the different scenarios.
First, clone the repository:
git clone https://github.com/fulingyue/CoreCodeBench.git
cd CoreCodeBench
Second, download the CoreCodeBench dataset from HuggingFace (Single / Multi) and move the JSONL files into the ./CoreCodeBench/ folder.
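If you prefer to script the download, the following is a minimal sketch using huggingface_hub; the repo_id values are placeholders (assumptions), so substitute the actual dataset names from the Single/Multi links above.

```python
# Sketch: download the benchmark JSONL files with huggingface_hub.
# NOTE: the repo_id values below are placeholders -- substitute the actual
# dataset repositories from the HuggingFace links above.
import os
import shutil
from huggingface_hub import hf_hub_download

os.makedirs("CoreCodeBench", exist_ok=True)
for repo_id, filename in [
    ("your-org/CoreCodeBench-Single", "CoreCodeBench_Single.jsonl"),  # placeholder repo_id
    ("your-org/CoreCodeBench-Multi", "CoreCodeBench_Multi.jsonl"),    # placeholder repo_id
]:
    path = hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset")
    shutil.copy(path, os.path.join("CoreCodeBench", filename))
```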
Then, download the source code files from here to the repo directory and extract them into a Source_Copy folder.
Now, the file tree should be:
CoreCodeBench/
├── CoreCodeBench/
│ ├── CoreCodeBench_Multi.jsonl
│ └── CoreCodeBench_Single.jsonl
├── Source_Copy/
│ ├── cloudnetpy/
│ ├── d3rlpy/
│ └── ...
├── README.md
├── LICENSE
└── ...
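To sanity-check the layout before proceeding, a small convenience sketch such as the following (based on the tree above; adjust paths if yours differ) confirms the key files and folders are in place:

```python
# Sketch: verify the expected directory layout after downloading.
import os

expected = [
    "CoreCodeBench/CoreCodeBench_Single.jsonl",
    "CoreCodeBench/CoreCodeBench_Multi.jsonl",
    "Source_Copy",
]
for path in expected:
    status = "OK" if os.path.exists(path) else "MISSING"
    print(f"[{status}] {path}")
```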
We strongly recommend using Docker to get a stable environment and reliable experimental results. Follow the instructions in the Docker setup guide to install Docker on your machine.
- Run
docker pull fulingyue/corecodebench:all
to pull the image from Docker Hub.
- Activate the Docker interactive environment with:
docker run -it -v /path/to/CoreCodeBench/:/workspace fulingyue/corecodebench:all /bin/bash
- To check the Docker environment, run:
cd /workspace/environments
bash check_env_conda.sh
This verifies Docker image availability, environment configuration, and basic setup.
We also provide a Conda-based environment setup. However, to avoid incompatibility issues across operating systems, we still strongly recommend using Docker.
- Run
environments/all_env_create_conda.sh
to create the conda environments.
- Run
environments/check_env_conda.sh
to check the conda environments.
Note: If you encounter errors when checking the langchain environment, it may be due to outdated pytest snapshots. In this case, you'll need to update the snapshots by running
pytest --snapshot-update /workspace/path/to/failing/test/file
inside the Docker container.
The evaluation scripts are the same whether you use the Docker or Conda environments. Before running the evaluation, please ensure you have successfully executed and passed the environment checks (check_env_docker.sh or check_env_conda.sh).
Before running the evaluation, you need to implement how to obtain responses from your model in Evaluation/utils.py. Specifically:
- Add your model's response implementation in the get_response() function.
- The function should take the following parameters:
  - chat_message: the input prompt text
  - model: the name of your model
  - gen_kwargs: a dictionary of generation parameters (optional)
- The function should return the model's text completion as a string.
An example implementation for the OpenAI API is already provided in the get_response() function in utils.py.
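For reference, here is a minimal sketch of what such an implementation might look like when backed by the OpenAI chat completions API. It assumes the openai package is installed and OPENAI_API_KEY is set in the environment, and it is not necessarily identical to the code shipped in utils.py.

```python
# Sketch: one possible get_response() implementation using the OpenAI SDK.
# Assumes `pip install openai` and OPENAI_API_KEY set in the environment.
from openai import OpenAI

_client = OpenAI()

def get_response(chat_message, model, gen_kwargs=None):
    gen_kwargs = gen_kwargs or {}
    completion = _client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": chat_message}],
        **gen_kwargs,  # e.g. temperature, max_tokens
    )
    # Return the model's text completion as a string.
    return completion.choices[0].message.content
```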
For single-function problems, run
bash Evaluation/single_evaluate_conda.sh --model=model_name --types=Development,TDD,BugFix --output_dir=/workspace
Supported problem types: Development, BugFix, TDD. You can run evaluation for a single problem type, for example:
bash Evaluation/single_evaluate_conda.sh --model=model_name --types=Development --output_dir=/workspace --root_dir=/workspace
For multi-function problems, run
bash Evaluation/multi_evaluate_conda.sh --model=model_name --types=Development,TDD,BugFix --output_dir=/workspace --root_dir=/workspace
After running the scripts, you can find all responses and test scores in the output_dir/results/model_name directory.
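To get a quick overview of what was produced, you can simply walk that directory; the exact file names and formats depend on the problem types you ran, so treat this as a convenience sketch only.

```python
# Sketch: list the result files produced for one model.
# Adjust output_dir / model_name to match the values passed to the script.
import os

output_dir = "/workspace"   # value passed as --output_dir
model_name = "model_name"   # value passed as --model
results_root = os.path.join(output_dir, "results", model_name)

for dirpath, _, filenames in os.walk(results_root):
    for name in filenames:
        print(os.path.join(dirpath, name))
```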
To build a new repository into the six types of CoreCodeBench problems:
- Manually place the repository code into the Source_Copy folder.
- Add basic repository information to repo_info.json, including the fields below (see the sketch after this list):
- Repository name (Required)
- Import name (Required)
- GitHub URL (Optional)
- conda env name (Optional)
- Repo Path (Required)
- Running Path (Required): The execution path relative to Repo Path, e.g. "/src/"
- Src path (Required): The source code library path relative to Repo Path, e.g. "/src/transformers/"
- Test Path (Required): The test files path relative to Repo Path, e.g. "/tests/"
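As an illustration, a new entry might look like the sketch below. The key names and the top-level structure are assumptions made for illustration only, so mirror the schema already used by the existing entries in repo_info.json.

```python
# Sketch: add a new entry to repo_info.json.
# The key names below are illustrative -- mirror the schema already used
# by the existing entries in repo_info.json.
import json

new_entry = {
    "repo_name": "my_repo",                  # Repository name (Required)
    "import_name": "my_repo",                # Import name (Required)
    "github_url": "https://github.com/...",  # GitHub URL (Optional)
    "env_name": "my_repo_env",               # conda env name (Optional)
    "repo_path": "Source_Copy/my_repo",      # Repo Path (Required)
    "running_path": "/src/",                 # Running Path, relative to Repo Path (Required)
    "src_path": "/src/my_repo/",             # Src path, relative to Repo Path (Required)
    "test_path": "/tests/",                  # Test Path, relative to Repo Path (Required)
}

with open("repo_info.json", "r+", encoding="utf-8") as f:
    info = json.load(f)
    info["my_repo"] = new_entry              # assumes a dict keyed by repo name
    f.seek(0)
    json.dump(info, f, indent=2)
    f.truncate()
```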
- Set up the corresponding environment according to the repository's documentation and requirements. Then run
pip install python-call-graph
and copy the environments/pycallgraph directory to replace the pycallgraph directory in your conda environment (use python -c "import pycallgraph; print(pycallgraph.__file__)" to find the directory); a scripted version of this copy step is sketched after the note below.
Note: If dot is not available (check with which dot), run conda install graphviz to install it.
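The copy-and-replace step can also be scripted. The following is a hedged sketch, assuming it is run from the repository root inside the activated conda environment:

```python
# Sketch: overwrite the installed pycallgraph package with the patched copy
# shipped in environments/pycallgraph. Run from the repo root inside the
# repository's conda environment.
import os
import shutil
import pycallgraph

installed_dir = os.path.dirname(pycallgraph.__file__)  # .../site-packages/pycallgraph
patched_dir = "environments/pycallgraph"               # patched copy in this repo

shutil.rmtree(installed_dir)
shutil.copytree(patched_dir, installed_dir)
print(f"Replaced {installed_dir} with {patched_dir}")
```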
- Implement the model response function in CorePipe.utils.get_response(), and update the paths and configs in CorePipe.config.
- Run the following to preprocess the repository:
conda activate {repo_name_env}
CorePipe/Single-Function/Preprocess.sh repo_name
Then run the following to generate single-function problems:
CorePipe/Single/single_gen.sh --repo_name={repo_name} --model={model_name} --validate_model={validate_model(for Dev)} --gen_model={gen_model(for BugFix)} --rewrite_model={rewrite_model(for BugFix)}
Single-function problems will be generated in testcases/{repo_name}/single.
- Development
conda activate {repo_name_env}
./Generation/Multi-Function/function_generate.sh {repo_name}
- TDD
conda activate {repo_name_env}
./Generation/Multi-Function/function_generate_tdd.sh {repo_name}
- BugFix
conda activate {repo_name_env}
./Generation/Multi-Function/function_generate_debug.sh {repo_name}
- Difficult
conda activate {repo_name_env}
./Generation/Multi-Function/function_generate_difficult.sh {repo_name}
This project is licensed under the MIT License.
If you find our work helpful, please cite our paper as follows:
@misc{fu2025corecodebench,
title = {CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark},
author = {Lingyue Fu and Hao Guan and Bolun Zhang and Haowei Yuan and Yaoming Zhu and Jun Xu and Zongyu Wang and Lin Qiu and Xunliang Cai and Xuezhi Cao and Weiwen Liu and Weinan Zhang and Yong Yu},
year = {2025},
howpublished = {\url{https://github.com/AGI-Eval-Official/CoreCodeBench/blob/main/docs/CoreCodeBench.pdf}},
note = {Accessed: 2024-07-08}
}
For questions or feedback, please open an issue or contact fulingyue@sjtu.edu.cn.