Repository for the research paper *LightRetriever: A LLM-based Text Retrieval Architecture with Extremely Faster Query Inference*.

LightRetriever targets extremely fast query inference for LLM-based text retrieval, reducing the query-encoding workload to no more than an embedding lookup.
Please set up the following packages for evaluation:
- Install Faiss by following its official guidelines to enable dense evaluation (a quick sanity-check sketch follows the install commands below).
- Download the Anserini jar into the project folder to enable sparse evaluation:
wget https://repo1.maven.org/maven2/io/anserini/anserini/0.25.0/anserini-0.25.0-fatjar.jar
- Set up the Java environment by installing openjdk-17-jdk:
sudo apt-get update
sudo apt-get install openjdk-17-jdk
You can easily set up the environment by cloning this repo and running the following command.
pip install -e .
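To quickly verify that Faiss is available for the dense evaluation path, a minimal sanity check like the one below should run without errors. This is a toy inner-product index, not the repo's evaluation code.

```python
# Minimal Faiss sanity check: build an exact inner-product index over random
# vectors and search it. Purely illustrative; not the repo's evaluation code.
import numpy as np
import faiss

dim = 8
doc_vecs = np.random.rand(100, dim).astype("float32")
query_vecs = np.random.rand(2, dim).astype("float32")

index = faiss.IndexFlatIP(dim)      # exact inner-product (dot-product) search
index.add(doc_vecs)
scores, ids = index.search(query_vecs, 5)
print(ids)
```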
Multiple fine-tuned retriever LoRA weights are released; a minimal loading sketch follows the table below.
Model | Description |
---|---|
lightretriever/lightretriever-llama3.1-8b-mrl | Fine-tuned Llama3.1-8b retriever with MRL, supporting flexible Top-k dimension/sparsity controls. |
lightretriever/lightretriever-llama3.1-8b | Fine-tuned Llama3.1-8b retriever, supporting symmetric dense, asymmetric dense & sparse retrieval. |
lightretriever/lightretriever-llama3.2-3b | Fine-tuned Llama3.2-3b retriever, supporting symmetric dense, asymmetric dense & sparse retrieval. |
lightretriever/lightretriever-llama3.2-1b | Fine-tuned Llama3.2-1b retriever, supporting symmetric dense, asymmetric dense & sparse retrieval. |
lightretriever/lightretriever-qwen2.5-7b | Fine-tuned Qwen2.5-7B retriever, supporting symmetric dense, asymmetric dense & sparse retrieval. |
lightretriever/lightretriever-qwen2.5-3b | Fine-tuned Qwen2.5-3B retriever, supporting symmetric dense, asymmetric dense & sparse retrieval. |
lightretriever/lightretriever-qwen2.5-1.5b | Fine-tuned Qwen2.5-1.5B retriever, supporting symmetric dense, asymmetric dense & sparse retrieval. |
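If the released weights are standard PEFT LoRA adapters, they can presumably be loaded with Hugging Face transformers + peft as sketched below. The base model id and loading pattern are assumptions; prefer the repo's own loading utilities for actual use.

```python
# A hypothetical sketch of loading one of the released LoRA adapters with
# transformers + peft. The base backbone id is an assumption; the repo's own
# loading code is authoritative.
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

base = AutoModel.from_pretrained("meta-llama/Llama-3.1-8B")  # assumed base backbone
model = PeftModel.from_pretrained(base, "lightretriever/lightretriever-llama3.1-8b")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")
```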
We follow the data preparation process of tDRO. All training sets, with proper hard-negative mining and deduplication against the test sets, are released as follows.
Dataset | Description |
---|---|
lightretriever/lightretriever-finetune-data | All training sets. |
The scripts below show inference examples.
Script: cache_emb_bag.ipynb
LightRetriever's asymmetric dense retrieval requires caching an EmbeddingBag before serving; please refer to this script.
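As a rough illustration of the caching step, the sketch below assumes a precomputed per-token embedding matrix of shape (vocab_size, hidden_dim); the file and variable names are placeholders, and cache_emb_bag.ipynb remains the authoritative reference.

```python
# Sketch of caching an EmbeddingBag, assuming a precomputed per-token embedding
# matrix saved as "token_embeddings.pt" (hypothetical file name).
import torch

token_embeddings = torch.load("token_embeddings.pt")  # shape: (vocab_size, hidden_dim)
emb_bag = torch.nn.EmbeddingBag.from_pretrained(token_embeddings, mode="mean")
torch.save(emb_bag.state_dict(), "query_emb_bag.pt")  # cached for query-side serving
```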
Script: scripts/asymmetric_dense_infer.ipynb
An example of how to load and encode with LightRetriever's dense query EmbeddingBag and document LLM encoder.
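The sketch below illustrates the asymmetric idea under the same assumptions as above: the query is encoded by a single EmbeddingBag lookup over its token ids, while the document is encoded by the full LLM. The backbone name, pooling, and prompt handling are placeholders; see the notebook for the exact procedure.

```python
# Rough sketch of asymmetric dense inference: query = EmbeddingBag lookup
# (no LLM forward pass), document = LLM forward pass with naive mean pooling.
# Model name, cache file, and pooling are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")   # assumed backbone
doc_encoder = AutoModel.from_pretrained("meta-llama/Llama-3.1-8B")     # plus LoRA weights in practice
emb_bag = torch.nn.EmbeddingBag.from_pretrained(
    torch.load("token_embeddings.pt"), mode="mean")                    # hypothetical cache file

# Query side: an embedding lookup over token ids.
query_ids = torch.tensor([tokenizer("what is faiss", add_special_tokens=False)["input_ids"]])
query_vec = emb_bag(query_ids)

# Document side: a standard LLM forward pass.
doc_inputs = tokenizer("Faiss is a library for efficient similarity search.", return_tensors="pt")
with torch.no_grad():
    doc_vec = doc_encoder(**doc_inputs).last_hidden_state.mean(dim=1)

print(torch.nn.functional.cosine_similarity(query_vec, doc_vec))
```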
Script: scripts/asymmetric_sparse_infer.ipynb
An example of how to encode with LightRetriever's sparse query token counts and document LLM encoder.
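As a rough sketch of the query side, the snippet below assumes the sparse query vector is simply the counts of the query's token ids; the document-side LLM term weighting is covered in the notebook and is not shown here.

```python
# Sketch of the sparse query representation: token-id counts only, no LLM
# forward pass. The backbone/tokenizer name is an assumption.
from collections import Counter
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B")  # assumed backbone

query = "what is dense retrieval"
token_ids = tokenizer(query, add_special_tokens=False)["input_ids"]
sparse_query = Counter(token_ids)  # {token_id: count}
print(sparse_query)
```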
Script: scripts/finetune_example.sh
A full fine-tuning script to reproduce lightretriever/lightretriever-llama3.1-8b is released. Fine-tuning runs with different LLM backbones share the same training hyper-parameters defined in this script.
Please clone all necessary training sets to a local folder, e.g. data/train. Then set the correct path to the preprocessed directory, such as --preprocessed_dir data/train. All needed training sets and their corresponding sampling weights are defined in config/data/exp-m.json.
Please refer to the reference script scripts/finetune_example.sh to fine-tune LightRetriever.
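To double-check which datasets and sampling weights a run will use, one can simply load the JSON config. The exact schema is defined by the repo; this snippet only assumes the file is plain JSON.

```python
# Inspect the training-data config (datasets and sampling weights).
# No assumptions about the schema beyond it being valid JSON.
import json

with open("config/data/exp-m.json") as f:
    data_config = json.load(f)

print(json.dumps(data_config, indent=2))
```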
Please refer to eval/README.md for evaluation details.
If you encounter any bugs or have any questions, please feel free to email me or open an issue.
Contacts: Guangyuan Ma (maguangyuan@iie.ac.cn)
If you are interested in our work, please consider citing our paper.
@article{DBLP:journals/corr/abs-2505-12260,
author = {Guangyuan Ma and
Yongliang Ma and
Xuanrui Gou and
Zhenpeng Su and
Ming Zhou and
Songlin Hu},
title = {LightRetriever: {A} LLM-based Text Retrieval Architecture with Extremely
Faster Query Inference},
journal = {CoRR},
volume = {abs/2505.12260},
year = {2025},
url = {https://doi.org/10.48550/arXiv.2505.12260},
doi = {10.48550/ARXIV.2505.12260},
eprinttype = {arXiv},
eprint = {2505.12260},
timestamp = {Mon, 23 Jun 2025 13:59:12 +0200},
biburl = {https://dblp.org/rec/journals/corr/abs-2505-12260.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}