Source code for our EMNLP 2025 paper: "CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion" [arXiv].
1. Install uv
2. Sync dependencies and activate the virtual environment:

```
uv sync
source .venv/bin/activate
```

Before running the scripts, download the benchmarks (ReccEval and CCEval) and edit the configuration file:

```
config/config.toml
```

Then execute the Python scripts sequentially:
```
python scripts/build_query.py
```
- Generates query strings from the benchmark dataset.
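What a query looks like depends on the configuration. As a rough, hypothetical illustration (not the repository's actual logic), the sketch below takes the last few non-empty lines before the completion point as the query; the function name and window size are assumptions:

```python
# Hypothetical sketch of query construction; scripts/build_query.py may use a
# different strategy (identifiers, docstrings, model-generated queries, etc.).

def build_query(unfinished_code: str, window: int = 10) -> str:
    """Use the last `window` non-empty lines before the cursor as the query."""
    lines = [ln for ln in unfinished_code.splitlines() if ln.strip()]
    return "\n".join(lines[-window:])

if __name__ == "__main__":
    prefix = "import math\n\ndef area(r):\n    return math.pi * "
    print(build_query(prefix))
```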
```
python scripts/retrieve.py
```
- Retrieves top-k relevant code blocks using the configured retriever.
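As a generic illustration of top-k retrieval (the configured retriever may be sparse or dense and will differ in detail), the sketch below ranks pre-computed code-block embeddings by cosine similarity to the query embedding; all names are assumptions:

```python
# Hypothetical top-k retrieval by cosine similarity; the retriever set in
# config/config.toml may instead be sparse (e.g., BM25) or a different model.
import numpy as np

def top_k(query_vec: np.ndarray, block_vecs: np.ndarray, k: int = 5) -> list[int]:
    """Return indices of the k code blocks most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    b = block_vecs / np.linalg.norm(block_vecs, axis=1, keepdims=True)
    scores = b @ q
    return np.argsort(-scores)[:k].tolist()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(top_k(rng.normal(size=8), rng.normal(size=(100, 8)), k=3))
```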
```
python scripts/rerank.py
```
- Reranks retrieved code blocks based on their estimated importance.
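A reranking step generally re-scores each (query, block) pair with a stronger model and sorts by the new score. The sketch below shows only that shape; the paper's importance estimator is not reproduced here, and `score` is a hypothetical stand-in:

```python
# Hypothetical reranking step: score each (query, block) pair and sort by
# descending score. The toy scorer below is token overlap, for demonstration.
from typing import Callable

def rerank(query: str, blocks: list[str], score: Callable[[str, str], float]) -> list[str]:
    """Sort retrieved blocks by an estimated importance score (highest first)."""
    return sorted(blocks, key=lambda b: score(query, b), reverse=True)

if __name__ == "__main__":
    def overlap(q: str, b: str) -> float:
        return len(set(q.split()) & set(b.split()))

    print(rerank("def area radius", ["def area(r): ...", "class Shape: ..."], overlap))
```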
```
python scripts/build_prompt.py
```
- Constructs prompts from retrieved code blocks for the code completion generator.
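Prompt construction typically prepends the selected blocks, as context, to the unfinished code. The template below is a hypothetical example, not the repository's actual format:

```python
# Hypothetical prompt template: retrieved blocks as commented context followed
# by the unfinished code. The template used by scripts/build_prompt.py may differ.

def build_prompt(blocks: list[str], unfinished_code: str) -> str:
    context = "\n\n".join(f"# Retrieved context:\n{b}" for b in blocks)
    return f"{context}\n\n{unfinished_code}"

if __name__ == "__main__":
    print(build_prompt(["def area(r):\n    return 3.14 * r * r"],
                       "def circumference(r):\n    return "))
```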
```
python scripts/inference.py
```
- Feeds prompts to the generator model.
- You can replace this step with your own inference code (a minimal skeleton is sketched below).
  - Input: JSON file containing an array of prompt strings.
  - Output: JSON file containing an array of generated completions.
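If you swap in your own inference code, it only needs to honor the JSON contract above. A minimal skeleton, with placeholder file paths and a `generate` stub you would replace with your model call:

```python
# Minimal skeleton for replacing scripts/inference.py with custom code.
# File paths and the `generate` stub are placeholders; only the JSON contract
# (array of prompt strings in, array of completion strings out) is fixed.
import json

def generate(prompt: str) -> str:
    """Stand-in for your model call (API request, local LLM, etc.)."""
    raise NotImplementedError("plug in your generator here")

def main(in_path: str = "prompts.json", out_path: str = "completions.json") -> None:
    with open(in_path) as f:
        prompts: list[str] = json.load(f)       # JSON array of strings
    completions = [generate(p) for p in prompts]
    with open(out_path, "w") as f:
        json.dump(completions, f, indent=2)     # JSON array of strings

if __name__ == "__main__":
    main()
```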
```
python scripts/evaluation.py
```
- Evaluates code completion performance using the inference results.
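For reference, repository-level completion is commonly scored with exact match and edit similarity. The sketch below computes both, though the exact metrics reported by scripts/evaluation.py are not confirmed here:

```python
# Hypothetical evaluation sketch over (prediction, reference) pairs.
import difflib

def exact_match(preds: list[str], refs: list[str]) -> float:
    """Fraction of predictions that match the reference exactly (whitespace-trimmed)."""
    return sum(p.strip() == r.strip() for p, r in zip(preds, refs)) / len(refs)

def edit_similarity(preds: list[str], refs: list[str]) -> float:
    """Mean character-level similarity ratio between predictions and references."""
    return sum(difflib.SequenceMatcher(None, p, r).ratio()
               for p, r in zip(preds, refs)) / len(refs)

if __name__ == "__main__":
    print(exact_match(["return x + 1"], ["return x + 1"]))
    print(edit_similarity(["return x + 1"], ["return x + 2"]))
```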
If you find this work helpful, please consider citing our paper:
```
@inproceedings{coderag2025,
  title={CodeRAG: Finding Relevant and Necessary Knowledge for Retrieval-Augmented Repository-Level Code Completion},
  author={Sheng Zhang and Yifan Ding and Shuquan Lian and Shun Song and Hui Li},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year={2025}
}
```

For questions, please open an issue or contact dingyf@stu.xmu.edu.cn.