RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
ICLR 2024
- Feb 5th, 2024: RepoBench v1.1 (with the newest code data) is now available on the 🤗 HuggingFace Hub. You can access the datasets for Python and Java using the following links:
  - For Python: 🤗 Repobench Python V1.1
  - For Java: 🤗 Repobench Java V1.1

  For more details on RepoBench v1.1, please refer to the data directory.
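Since v1.1 is hosted on the HuggingFace Hub, it can also be pulled directly with the `datasets` library. A minimal sketch; the Hub ID below is an assumption based on the links above, so check the dataset pages for the exact IDs and split names:

```python
from datasets import load_dataset

# Hub ID assumed from the Python link above; verify on the dataset page
python_data = load_dataset("tianyang/repobench_python_v1.1")
print(python_data)  # inspect the available splits and columns
```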
- Jan 16th, 2024: RepoBench is accepted to ICLR 2024! 🎉
```bash
git clone https://github.com/Leolty/repobench.git
cd repobench
```
> [!NOTE]
> There is a `requirements.txt` file, which contains the dependencies for reproducing the results in the paper. If you are only interested in the data, you can skip the installation of dependencies.
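If you do plan to reproduce the paper's results, installing the dependencies is the usual one-liner (assuming a working Python/pip environment):

```bash
pip install -r requirements.txt
```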
As discussed in the paper, we have three settings for each task:

- `cross_file_first`: Masks the line where a module from a different file is used for the first time.
- `cross_file_random`: Masks a random line where a module from a different file is used (not the first usage).
- `in_file`: Masks a random line that has no cross-file dependency.
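To make the three settings concrete, here is a small sketch of which line each setting would mask. This is a hypothetical helper for intuition only, not the code used to construct the benchmark:

```python
import random

def pick_masked_line(cross_file_usage_lines, all_lines, setting):
    """Return the index of the line a given setting would mask.
    Hypothetical illustration only, not RepoBench's construction code."""
    first_usage = min(cross_file_usage_lines)
    if setting == "cross_file_first":
        # the first line that uses a module from a different file
        return first_usage
    if setting == "cross_file_random":
        # a random cross-file usage line that is not the first usage
        later_usages = [i for i in cross_file_usage_lines if i != first_usage]
        return random.choice(later_usages)
    if setting == "in_file":
        # a random line with no cross-file dependency
        in_file_lines = [i for i in all_lines if i not in cross_file_usage_lines]
        return random.choice(in_file_lines)
    raise ValueError(f"unknown setting: {setting}")
```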
To load the desired dataset, use the `load_data` utility function from the `archive_data.utils` module.
- Download the test data from Google Drive, or simply run the following command:

  ```bash
  gdown --id '1HvFFnOybTKEJCrEypWh4ftmW6DZBaiK_' --output ./archive_data/test.zip
  unzip ./archive_data/test.zip -d ./archive_data/
  rm ./archive_data/test.zip
  ```
  > [!NOTE]
  > If you download through the browser, please make sure to unzip the file and place the `test` folder under `archive_data`, i.e., `archive_data/test`.
- (Optional) If you also want to download the training data, it can be found here. Similarly, you can run the following command:

  ```bash
  gdown --id '179TXJBfMMbP9FDC_hsdpGLQPmN6iB4vY' --output ./archive_data/train.zip
  unzip ./archive_data/train.zip -d ./archive_data/
  rm ./archive_data/train.zip
  ```
- Import the `load_data` function:

  ```python
  from archive_data.utils import load_data
  ```
- Call the function with the desired parameters:

  ```python
  data = load_data(split, task, language, settings, length)
  ```
Parameters:

- `split`: Specify whether you want the `train` or `test` split.
- `task`: Choose between `retrieval`, `completion`, and `pipeline`.
- `language`: Select the programming language, either `python` or `java`.
- `settings`: Choose between `cross_file_first`, `cross_file_random`, or `in_file`. You can also provide a list combining these settings.
- `length`: (Optional) For the `completion` task, please specify the length as either `2k` or `8k`.
Return:
- If a single setting is provided, the function returns the loaded data for that setting.
- If multiple settings are specified, the function returns a list containing data for each setting.
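Note that the return shape depends on `settings`: with a single setting you get that setting's data directly. A minimal sketch, assuming a bare string is accepted for a single setting, as the parameter description suggests:

```python
# single setting: returns that setting's data directly, not a list
in_file_2k = load_data(split='test', task='completion',
                       language='python', settings='in_file', length='2k')
```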
Load `completion` data (the `8k` version) for Python with `cross_file_first`, `cross_file_random`, and `in_file` settings:

```python
data = load_data(
    split='test',
    task='completion',
    language='python',
    settings=['cross_file_first', 'cross_file_random', 'in_file'],
    length='8k'
)
```
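Because a list of settings was passed, the returned `data` is a list with one entry per setting (in the order given), so it can be unpacked directly:

```python
# one entry per requested setting, in order
cross_file_first_data, cross_file_random_data, in_file_data = data
```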
🤯 Data Leakage/Memorization: The test data under the `archive_data` folder holds code sourced from GitHub, created between February 9, 2023, and August 3, 2023. Please be vigilant: if your model's training data includes code from this timeframe, there is a risk of data leakage and memorization. This can jeopardize the integrity and trustworthiness of your model's evaluation results.
📅 Stay Updated with RepoBench: We are committed to regularly updating the RepoBench dataset. If you're looking for the most up-to-date code samples, keep an eye out for our releases. If you'd like to expedite the process or suggest a collaboration, feel free to raise an issue or drop us an email. Collaborative efforts are always welcomed and appreciated.
> [!TIP]
> Tell us your model's knowledge cutoff, and we can provide the newest data.
🔔 A Note on Benchmarking: We aim to keep the benchmark current, but due to computational constraints we cannot benchmark every data version ourselves. If you decide to use our dataset, be prepared to run the competing baselines yourself for comparison.
⚖️ Licensing and Ownership: While we strive for accuracy, the licenses of the included repositories may not have been individually verified. If you identify a license discrepancy, or if your code is included but you'd prefer it wasn't, please reach out to us immediately.
Follow the instructions in REPRODUCE.md to reproduce the results of the baseline models.
> [!WARNING]
> The code is not well-organized or fully tested. If you encounter any issues, please feel free to raise an issue or submit a PR. Thanks!
If you use RepoBench in your research, please consider citing us:
```bibtex
@inproceedings{liu2023repobench,
  title={RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems},
  author={Tianyang Liu and Canwen Xu and Julian McAuley},
  booktitle={International Conference on Learning Representations},
  year={2024},
  url={https://arxiv.org/abs/2306.03091}
}
```