Reproducible CoREB v202603 public-snapshot retrieval SOTA with C2LLM-7B
dense retrieval under the official query-weighted nDCG@10 protocol.
This is a retrieval SOTA claim, not a full reranker SOTA claim.
Clean frozen test/public-snapshot result:

| Split | Overall nDCG@10 | Recall@10 | MAP@10 |
|---|---|---|---|
| release_v2603 | 0.633174 | 0.821668 | 0.561064 |

Per-task nDCG@10:

| Task | nDCG@10 |
|---|---|
| text2code | 0.444714 |
| code2code | 0.657871 |
| code2text | 0.803820 |

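The overall score is described as query-count-weighted, so it is not the plain average of the three per-task values. The following minimal sketch shows one way such an aggregate could be computed. The function name `query_weighted_mean` and the query counts are hypothetical and only illustrate the weighting; the real counts come from the dataset and are not reproduced here.

```python
def query_weighted_mean(per_task_scores, query_counts):
    """Average per-task scores, weighting each task by its number of queries."""
    total_queries = sum(query_counts[task] for task in per_task_scores)
    if total_queries == 0:
        raise ValueError("no queries to aggregate")
    weighted_sum = sum(
        per_task_scores[task] * query_counts[task] for task in per_task_scores
    )
    return weighted_sum / total_queries


# Per-task nDCG@10 values reported for release_v2603.
per_task = {"text2code": 0.444714, "code2code": 0.657871, "code2text": 0.803820}

# Hypothetical query counts, for illustration only (NOT the real CoREB counts).
counts = {"text2code": 100, "code2code": 100, "code2text": 100}

overall = query_weighted_mean(per_task, counts)
print(f"{overall:.6f}")
```

With equal counts the result reduces to the plain mean of the three tasks, about 0.6355. The reported overall of 0.633174 differs from that, which is consistent with the tasks having unequal query counts.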
The official CoREB project page reports the best v202603 public-snapshot
retrieval overall as 0.624 for GemEmb-2. This run obtains 0.633174 with an
open C2LLM-7B dense retriever.

- Dataset: hq-bench/coreb
- Tuning split: release_v2602
- Frozen test/public snapshot: release_v2603
- Primary metric: query-count-weighted nDCG@10
- Code2Code anchor exclusion: enabled, matching the official CoREB runner

For Code2Code, each query has an anchor code item in the shared corpus. The
official runner removes that anchor before computing metrics; this repository
does the same through TaskData.exclude_doc_ids.
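
The effect of anchor exclusion can be illustrated with a small, self-contained sketch. It is not the repository's or the official runner's code: the function `ndcg_at_k`, its parameters, the linear-gain and log2-discount formulation, and the toy document IDs are all assumptions made for illustration.

```python
import math


def ndcg_at_k(ranked_doc_ids, relevance, exclude_doc_ids=frozenset(), k=10):
    """nDCG@k with linear gain and log2 discount, after dropping excluded docs."""
    # Remove excluded documents (e.g. the query's own anchor) before truncating.
    kept = [doc_id for doc_id in ranked_doc_ids if doc_id not in exclude_doc_ids]
    gains = [relevance.get(doc_id, 0.0) for doc_id in kept[:k]]
    dcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

    # Ideal ranking: the non-excluded judged documents sorted by gain.
    ideal = sorted(
        (g for doc_id, g in relevance.items() if doc_id not in exclude_doc_ids),
        reverse=True,
    )[:k]
    idcg = sum(g / math.log2(rank + 2) for rank, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: the anchor is retrieved first because it is the query itself.
ranking = ["anchor", "d1", "d2", "d3"]
qrels = {"d2": 1.0}  # hypothetical relevance judgments

with_exclusion = ndcg_at_k(ranking, qrels, exclude_doc_ids={"anchor"})
without_exclusion = ndcg_at_k(ranking, qrels)
print(with_exclusion, without_exclusion)
```

In this toy case, leaving the anchor in the ranking pushes the relevant document one position lower and reduces the score, which is why evaluating with and without the exclusion gives numbers that are not comparable.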

    python -m venv .venv
    . .venv/bin/activate
    pip install -e ".[gpu,dev]"

    PYTHONPATH=compat:src python scripts/run_dense_retrieval.py \
      --model /path/to/codefuse-ai_C2LLM-7B \
      --split release_v2603 \
      --tasks all \
      --top-k 128 \
      --eval-k 10 \
      --batch-size 8 \
      --max-length 2048 \
      --dtype bf16 \
      --device cuda \
      --local-files-only \
      --instruction-mode none \
      --output reports/dense_c2llm7b_release_v2603_none_top128_official.json

The exact verified report is stored at
reports/dense_c2llm7b_release_v2603_none_top128_official.json.

The best post-test validation experiment on release_v2602 reaches
0.643800 overall nDCG@10 using dense RRF for text2code, Qwen3/dense fusion
for code2code, and C2LLM dense retrieval for code2text.
This validation number is not used as a clean release_v2603 test claim.
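
For readers unfamiliar with RRF (reciprocal rank fusion), here is a minimal sketch of the general technique. It does not reproduce the validation experiment: which runs were fused and which smoothing constant was used are not stated here, so the function name `reciprocal_rank_fusion`, the constant `k=60` (a commonly used default), and the toy rankings are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=None):
    """Fuse ranked lists of doc IDs by summing 1 / (k + rank) over the lists."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by doc ID for determinism.
    fused = sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
    return fused if top_n is None else fused[:top_n]


# Two hypothetical dense runs for the same query.
run_a = ["d1", "d2", "d3"]
run_b = ["d2", "d3", "d1"]

fused = reciprocal_rank_fusion([run_a, run_b])
print(fused)
```

RRF uses only rank positions, so it can combine retrievers whose similarity scores are on different scales without any score normalization.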
- CoREB project page: official benchmark page and public-snapshot retrieval leaderboard.
- CoREB GitHub repository: official benchmark code, evaluation runner, and protocol details.
- CoREB Hugging Face dataset: official benchmark data releases.
- C2LLM-7B model: dense retriever backbone used for the submitted run.