Awesome Learned Sparse Retrieval

An extensive, annotated list of resources on Learned Sparse Retrieval (LSR). Most of the resources below concern learned sparse representations for text retrieval.

LSR Models

From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing
Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, Erik Learned-Miller, Jaap Kamps
CIKM, 2018
📄 paper
Expansion via Prediction of Importance with Contextualization
Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
SIGIR, 2020
📄 paper
Context-Aware Term Weighting For First Stage Passage Retrieval
Zhuyun Dai, Jamie Callan
SIGIR, 2020
📄 paper
A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques
Jimmy Lin, Xueguang Ma
CoRR, 2021
📄 paper
Learning Passage Impacts for Inverted Indexes
Antonio Mallia, Omar Khattab, Torsten Suel, Nicola Tonellotto
SIGIR, 2021
📄 paper
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant
SIGIR, 2021
📄 paper
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant
CoRR, 2021
📄 paper
SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
Tiancheng Zhao, Xiaopeng Lu, Kyusong Lee
NAACL, 2021
📄 paper
TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
Shengyao Zhuang, Guido Zuccon
SIGIR, 2021
📄 paper
From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective
Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stéphane Clinchant
SIGIR, 2022
📄 paper
Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
Shengyao Zhuang, Guido Zuccon
ReNeuIR at SIGIR, 2022
📄 paper
An Efficiency Study for SPLADE Models
Carlos Lassance, Stéphane Clinchant
SIGIR, 2022
📄 paper
Learning a Sparse Representation Model for Neural CLIR
Suraj Nair, Eugene Yang, Dawn J Lawrie, James Mayfield, Douglas W. Oard
DESIRES, 2022
📄 paper
LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang
ICLR, 2023
📄 paper
A Unified Framework for Learned Sparse Retrieval
Thong Nguyen, Sean MacAvaney, Andrew Yates
ECIR, 2023
📄 paper
BLADE: Combining Vocabulary Pruning and Intermediate Pretraining for Scaleable Neural CLIR
Suraj Nair, Eugene Yang, Dawn Lawrie, James Mayfield, Douglas W. Oard
SIGIR, 2023
📄 paper
Exploring the Representation Power of SPLADE Models
Joel Mackenzie, Shengyao Zhuang, Guido Zuccon
ICTIR, 2023
📄 paper
Learning Sparse Lexical Representations Over Specified Vocabularies for Retrieval
Jeffrey M Dudek, Weize Kong, Cheng Li, Mingyang Zhang, Michael Bendersky
CIKM, 2023
📄 paper
Improved Learned Sparse Retrieval with Corpus-Specific Vocabularies
Puxuan Yu, Antonio Mallia, Matthias Petri
ECIR, 2024
📄 paper
Two-Step SPLADE: Simple, Efficient and Effective Approximation of SPLADE
Carlos Lassance, Hervé Déjean, Stéphane Clinchant, Nicola Tonellotto
ECIR, 2024
📄 paper
SPLATE: Sparse Late Interaction Retrieval
Thibault Formal, Stéphane Clinchant, Hervé Déjean, Carlos Lassance
SIGIR, 2024
📄 paper
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control
Thong Nguyen, Mariya Hendriksen, Andrew Yates, Maarten de Rijke
ECIR, 2024
📄 paper
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities
Thong Nguyen, Shubham Chatterjee, Sean MacAvaney, Iain Mackie, Jeff Dalton, Andrew Yates
EMNLP, 2024
📄 paper
SPLADE-v3: New baselines for SPLADE
Carlos Lassance, Hervé Déjean, Thibault Formal, Stéphane Clinchant
CoRR, 2024
📄 paper
DiSCo: LLM Knowledge Distillation for Efficient Sparse Retrieval in Conversational Search
Simon Lupart, Mohammad Aliannejadi, Evangelos Kanoulas
SIGIR, 2025
📄 paper
Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers
Zhichao Geng, Dongyu Ru, Yang Yang
CoRR, 2024
📄 paper
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval
Meet Doshi, Vishwajeet Kumar, Rudra Murthy, Vignesh P, Jaydeep Sen
CoRR, 2024
📄 paper
An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-doc
Aldo Porco, Dhruv Mehra, Igor Malioutov, Karthik Radhakrishnan, Moniba Keymanesh, Daniel Preotiuc-Pietro, Sean MacAvaney, Pengxiang Cheng
SIGIR, 2025
📄 paper
Effective Inference-Free Retrieval for Learned Sparse Representations
Franco Maria Nardini, Thong Nguyen, Cosimo Rulli, Rossano Venturini, Andrew Yates
SIGIR, 2025
📄 paper
Exploring $\ell_0$ Sparsification for Inference-free Sparse Retrievers
Xinjie Shen, Zhichao Geng, Yang Yang
SIGIR, 2025
📄 paper
Enhancing Lexicon-Based Text Embeddings with Large Language Models
Yibin Lei, Tao Shen, Yu Cao, Andrew Yates
ACL, 2025
📄 paper
Leveraging decoder architectures for learned sparse retrieval
Jingfen Qiao, Thong Nguyen, Evangelos Kanoulas, Andrew Yates
International Workshop on Knowledge-Enhanced Information Retrieval, 2025
📄 paper
Scaling sparse and dense retrieval in decoder-only LLMs
Hansi Zeng, Julian Killingback, Hamed Zamani
SIGIR, 2025
📄 paper
CSPLADE: Learned Sparse Retrieval with Causal Language Models
Zhichao Xu, Aosong Feng, Yijun Tian, Haibo Ding, Lin Lee Cheong
CoRR, 2025
📄 paper
On the Reproducibility of Learned Sparse Retrieval Adaptations for Long Documents
Emmanouil Georgios Lionis, Jia-Huei Ju
ECIR, 2025
📄 paper
Learning Retrieval Models with Sparse Autoencoders
Thibault Formal, Maxime Louis, Hervé Déjean, Stéphane Clinchant
ICLR, 2026
📄 paper
Milco: Learned Sparse Retrieval Across Languages via a Multilingual Connector
Thong Nguyen, Yibin Lei, Jia-Huei Ju, Eugene Yang, Andrew Yates
ICLR, 2026
📄 paper
LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
Zhichao Xu, Shengyao Zhuang, Crystina Zhang, Xueguang Ma, Yijun Tian, Maitrey Mehta, Jimmy Lin, Vivek Srikumar
SIGIR, 2026
📄 paper
Sparton: Fast and Memory-Efficient Triton Kernel for Learned Sparse Retrieval
Thong Nguyen, Cosimo Rulli, Franco Maria Nardini, Rossano Venturini, Andrew Yates
SIGIR, 2026
📄 paper
Self-Improving Sparse Retrieval Through Heuristic Representation Refinement and Representation-Focused Learning
Xiaojing Li, Bin Wang, Xiaochun Yang, Meng Luo
AAAI, 2026
📄 paper
From Tokens to Concepts: Leveraging SAE for SPLADE
Yuxuan Zong, Mathias Vast, Basile Van Cooten, Laure Soulier, Benjamin Piwowarski
SIGIR, 2026
📄 paper
To Case or Not to Case: An Empirical Study in Learned Sparse Retrieval
Emmanouil Georgios Lionis, Jia-Huei Ju, Angelos Nalmpantis, Casper Thuis, Sean MacAvaney, Andrew Yates
ECIR, 2026
📄 paper

Indexing LSR

Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs
Yury A. Malkov, Dmitry A. Yashunin
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
📄 paper
Efficiency Implications of Term Weighting for Passage Retrieval
Joel Mackenzie, Zhuyun Dai, Luke Gallagher, Jamie Callan
SIGIR, 2020
📄 paper
Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation
Joel Mackenzie, Andrew Trotman, Jimmy Lin
CoRR, 2021
📄 paper
Insights into the Efficiency of Open-Source Score-at-a-Time Search Engines: A Reproducibility Study
Katelyn Harlan, Andrew Trotman, Veronica Liesaputra
SIGIR, 2026
📄 paper
Accelerating Learned Sparse Indexes Via Term Impact Decomposition
Joel Mackenzie, Antonio Mallia, Alistair Moffat, Matthias Petri
EMNLP, 2022
📄 paper | 🛠️ code
An Efficiency Study for SPLADE Models
Carlos Lassance, Stéphane Clinchant
SIGIR, 2022
📄 paper
Faster Learned Sparse Retrieval with Guided Traversal
Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto
SIGIR, 2022
📄 paper
IOQP: A simple Impact-Ordered Query Processor written in Rust
Joel Mackenzie, Matthias Petri, Luke Gallagher
DESIRES, 2022
📄 paper | 🛠️ code
A Static Pruning Study on Sparse Neural Retrievers
Carlos Lassance, Simon Lupart, Hervé Déjean, Stéphane Clinchant, Nicola Tonellotto
SIGIR, 2023
📄 paper
An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
ACM Transactions on Information Systems, 2024
📄 paper
Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations
Joel Mackenzie, Andrew Trotman, Jimmy Lin
ACM Transactions on Information Systems, 2023
📄 paper
Optimizing Guided Traversal for Fast Learned Sparse Retrieval
Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
WWW, 2023
📄 paper | 🛠️ code
Representation Sparsification with Hybrid Thresholding for Fast SPLADE-based Document Retrieval
Yifan Qiao, Yingrui Yang, Shanxiu He, Tao Yang
SIGIR, 2023
📄 paper | 🛠️ code
Results of the Big ANN: NeurIPS'23 competition
Harsha Vardhan Simhadri, Martin Aumüller, Amir Ingber, Matthijs Douze, George Williams, Magdalen Dobson Manohar, Dmitry Baranchuk, Edo Liberty, Frank Liu, Ben Landrum, Mazin Karjikar, Laxman Dhulipala, Meng Chen, Yue Chen, Rui Ma, Kai Zhang, Yuzheng Cai, Jiayang Shi, Yizhuo Chen, Weiguo Zheng, Zihao Wan, Jie Yin, Ben Huang
CoRR, 2023
📄 paper | 🛠️ code
Bridging Dense and Sparse Maximum Inner Product Search
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
ACM Transactions on Information Systems, 2024
📄 paper
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
SIGIR, 2024
📄 paper | 🛠️ code
Faster Learned Sparse Retrieval with Block-Max Pruning
Antonio Mallia, Torsten Suel, Nicola Tonellotto
SIGIR, 2024
📄 paper | 🛠️ code
Cluster-based Partial Dense Retrieval Fused with Sparse Text Retrieval
Yingrui Yang, Parker Carlson, Shanxiu He, Yifan Qiao, Tao Yang
SIGIR, 2024
📄 paper
Pairing Clustered Inverted Indexes with κ-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
CIKM, 2024
📄 paper | 🛠️ code
Threshold-driven Pruning with Segmented Maximum Term Weights for Approximate Cluster-based Sparse Retrieval
Yifan Qiao, Parker Carlson, Shanxiu He, Yingrui Yang, Tao Yang
EMNLP, 2024
📄 paper
Foundations of Vector Retrieval
Sebastian Bruch
Springer, 2024
📄 book
Dynamic Superblock Pruning for Fast Learned Sparse Retrieval
Parker Carlson, Wentai Xie, Shanxiu He, Tao Yang
SIGIR, 2025
📄 paper | 🛠️ code
Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
CoRR, 2025
📄 paper | 🛠️ code
Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini, Leonardo Venuta
ECIR, 2025
📄 paper | 🛠️ code
SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors
Ruoxuan Li, Xiaoyao Zhong, Jiabao Jin, Peng Cheng, Wangze Ni, Lei Chen, Zhitao Shen, Wei Jia, Xiangyu Wang, Xuemin Lin, Heng Tao Shen, Jingkuan Song
CoRR, 2025
📄 paper | 🛠️ code
kANNolo: Sweet and Smooth Approximate k-Nearest Neighbors Search
Leonardo Delfino, Domenico Erriquez, Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
ECIR, 2025
📄 paper | 🛠️ code
Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval
Vihan Lakshman, Blaise Munyampirwa, Julian Shun, Benjamin Coleman
CoRR, 2025
📄 paper
Efficiency Optimizations for Superblock-based Sparse Retrieval
Parker Carlson, Wentai Xie, Rohil Shah, Tao Yang
CoRR, 2026
📄 paper
Evaluating the Efficiency and Effectiveness of Learned Sparse Retrieval with the lsr_benchmark
Maik Fröbe, Ferdinand Schlatt, Cosimo Rulli, Tim Hagen, Jan Heinrich Merker, Gijs Hendriksen, Carlos Lassance, Franco Maria Nardini, Rossano Venturini, Martin Potthast
ECIR, 2026
📄 paper | 🛠️ code
Forward Index Compression for Learned Sparse Retrieval
Sebastian Bruch, Martino Fontana, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
ECIR, 2026
📄 paper | 🛠️ code
Fast, Compact, Immediate-Access Indexing for Learned Sparse Retrieval Systems
Billy Rule, Joel Mackenzie
ECIR, 2026
📄 paper

Tutorials

Neural Lexical Search with Learned Sparse Retrieval: SIGIR 2024, 2025, and ECIR 2026
Practical, Efficient, In-Memory Inverted Indexes: SIGIR 2025 and ECIR 2026

Software Libraries

lsr-benchmark
Framework for evaluating learned sparse retrieval that contrasts efficiency and effectiveness across diverse retrieval scenarios.
kANNolo
Library for fast dense/sparse learned retrieval with graph-based indexes.
Seismic
State-of-the-art library for fast learned sparse retrieval and indexing, with a focus on efficient inverted-index structures.
Vectorium
A library for storing and accessing datasets of dense and sparse vectors, with efficient support for brute-force search and the core operations required by vector indexing data structures.
BMP (Block Max Pruning)
Extremely efficient learned sparse retrieval framework with an in-memory block-based inverted index.
Sentence Transformers
Framework providing sentence-embedding models and sparse encoders for semantic and sparse retrieval in NLP applications.
Pyserini
IR research toolkit built on Lucene that supports learned sparse retrieval models.
NMSLib
Non-Metric Space Library (NMSLIB): An efficient similarity search library and a toolkit for evaluation of k-NN methods for generic non-metric spaces.
OpenSearch
Open-source search and analytics engine that supports scalable sparse, dense, and hybrid neural retrieval via plugins and vector extensions.
Apache Lucene
High-performance Java search library providing inverted indexes and scoring infrastructure that underpins many learned sparse retrieval systems.
Qdrant
Open-source vector database supporting dense, sparse, and hybrid retrieval with native sparse vector indexing based on an inverted index for exact high-dimensional sparse search.
FlashRAG
A Python Toolkit for Efficient RAG Research.
PyTerrier
A Python framework for performing information retrieval experiments (supporting PISA and BMP).
PISA
A modular C++ inverted index and query processing framework supporting many compression codecs and dynamic pruning algorithms.
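
Several of the libraries above are described as built on inverted indexes over sparse vectors. As a rough illustration of that idea only, and not the implementation of any listed library, the following minimal Python sketch builds an in-memory inverted index and scores a query by exhaustive inner product. The function names (`build_inverted_index`, `search`) and the toy vectors are hypothetical; real systems add compression, dynamic pruning, or approximate search on top of this basic structure.

```python
from collections import defaultdict
import heapq


def build_inverted_index(docs):
    """Map each term to a posting list of (doc_id, weight) pairs."""
    index = defaultdict(list)
    for doc_id, vector in docs.items():
        for term, weight in vector.items():
            if weight != 0.0:  # only non-zero entries are stored
                index[term].append((doc_id, weight))
    return index


def search(index, query, k=10):
    """Exhaustive term-at-a-time scoring: sum of query_weight * doc_weight."""
    scores = defaultdict(float)
    for term, q_weight in query.items():
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    # Return the k highest-scoring (doc_id, score) pairs, best first.
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy sparse vectors (hypothetical data): term -> learned weight.
docs = {
    "d1": {"sparse": 1.2, "retrieval": 0.8},
    "d2": {"dense": 1.0, "retrieval": 0.5},
    "d3": {"sparse": 0.4, "index": 0.9},
}
index = build_inverted_index(docs)
results = search(index, {"sparse": 1.0, "retrieval": 2.0}, k=2)
print(results)
```

Only the posting lists of the query terms are touched, which is why the average number of non-zero entries per query and per document matters so much for query latency.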

Datasets and Encodings

`MS MARCO v1`

Documents: 8,841,823
Queries [dev.small]: 6,980
Reference Metric: MRR@10

Encoding	Link	Avg non-zero (docs)	Avg non-zero (queries)	MRR@10
`splade-cocondenser`	link	`119`	`43`	`38.3`
`efficient-splade`	link	`181`	`6`	`38.8`
`uniCOIL-T5`	link	`68`	`6`	`35.2`
`splade-v3`	link	`168`	`24`	`40.3`
`li-lsr-big`	link	`387`	`6`	`38.8`
`laconic-1B`	link	`511`	`100`	`37.2`
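
The MRR@10 column can be read as follows: for each query, take the reciprocal of the rank of the first relevant passage within the top 10 (0 if none appears), then average over all judged queries. Below is a minimal sketch of that computation, assuming rankings and relevance judgments are held in plain dictionaries. The names `mrr_at_k`, `rankings`, and `relevant` are illustrative and not part of any official evaluation script. The table reports the value multiplied by 100.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant document in the top k.

    rankings: query_id -> list of doc_ids, best first
    relevant: query_id -> set of relevant doc_ids
    """
    total = 0.0
    for query_id, relevant_docs in relevant.items():
        ranked_docs = rankings.get(query_id, [])
        for rank, doc_id in enumerate(ranked_docs[:k], start=1):
            if doc_id in relevant_docs:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(relevant)


# Toy example (hypothetical data).
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9"], "q3": ["d5", "d6"]}
relevant = {"q1": {"d1"}, "q2": {"d2"}, "q3": {"d8"}}
score = mrr_at_k(rankings, relevant)  # (1/2 + 1 + 0) / 3
print(score)
```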

`MS MARCO v2`

Documents: 138,363,364
Queries [dev1.small]: 3,903
Reference Metric: MRR@10

Encoding	Link	Avg non-zero (docs)	Avg non-zero (queries)	MRR@10
`splade-cocondenser`	link	`127`	`44`	`10.88`
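
The "Avg non-zero" columns presumably report the mean number of non-zero entries per encoded document or query vector. A minimal sketch of that statistic, assuming each sparse vector is stored as a term-to-weight dictionary (a hypothetical layout; the linked encodings may use a different on-disk format):

```python
def avg_nonzero(vectors):
    """Mean count of non-zero weights across a collection of sparse vectors."""
    if not vectors:
        return 0.0
    counts = [sum(1 for w in vec.values() if w != 0) for vec in vectors]
    return sum(counts) / len(counts)


# Toy example (hypothetical data): the explicit 0.0 entry is not counted.
vectors = [{"a": 0.5, "b": 0.0, "c": 1.0}, {"a": 2.0}]
print(avg_nonzero(vectors))
```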

`NQ`

Documents: 2,680,893
Queries: 3,452
Reference Metric: NDCG@10

Encoding	Link	Avg non-zero (docs)	Avg non-zero (queries)	NDCG@10
`splade-cocondenser`	link	`153`	`51`	`53.9`
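
For readers unfamiliar with the reference metric, here is a minimal sketch of nDCG@10 for a single query. It assumes linear gains and a log2 rank discount, which is one common formulation; some tools use exponential gains instead, so values may differ. The function names and the toy judgments are illustrative only. Per-query scores are averaged over all queries, and the table reports the result multiplied by 100.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_docs, grades, k=10):
    """nDCG@k for one query.

    ranked_docs: list of doc_ids, best first
    grades: doc_id -> graded relevance (missing documents count as 0)
    """
    gains = [grades.get(doc_id, 0) for doc_id in ranked_docs[:k]]
    ideal = sorted(grades.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Toy example (hypothetical data): both relevant documents are ranked
# below a non-relevant one, so the score is strictly less than 1.
grades = {"d1": 1, "d2": 1}
value = ndcg_at_k(["d3", "d1", "d2"], grades)
print(value)
```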

`LoTTE-pooled`

Documents: 2,428,854
Queries [dev/search]: 2,931
Reference Metric: Success@5

Encoding	Link	Avg non-zero (docs)	Avg non-zero (queries)	Success@5
`splade-cocondenser`	`N/A`	`N/A`	`N/A`	`69.0`
`li-lsr-big`	link	`469`	`9`	`65.7`
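
Success@5 is taken here to mean the fraction of queries for which at least one relevant document appears in the top 5 results. A minimal sketch under that assumption, with illustrative names and toy data that are not drawn from the LoTTE evaluation code:

```python
def success_at_k(rankings, relevant, k=5):
    """Fraction of judged queries with a relevant document in the top k.

    rankings: query_id -> list of doc_ids, best first
    relevant: query_id -> set of relevant doc_ids
    """
    hits = sum(
        1
        for query_id, relevant_docs in relevant.items()
        if any(doc_id in relevant_docs for doc_id in rankings.get(query_id, [])[:k])
    )
    return hits / len(relevant)


# Toy example (hypothetical data): q2's relevant document sits at rank 6,
# so it does not count as a success at cutoff 5.
rankings = {
    "q1": ["d4", "d2", "d9"],
    "q2": ["d1", "d3", "d5", "d6", "d7", "d8"],
}
relevant = {"q1": {"d2"}, "q2": {"d8"}}
print(success_at_k(rankings, relevant))
```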

`Quora`

Documents: 522,931
Queries [test]: 10,000
Reference Metric: nDCG@10

Encoding	Link	Avg non-zero (docs)	Avg non-zero (queries)	nDCG@10
`splade-v3`	link	`40`	`36`	`81.4`

Other Collections

Check Anserini's suite of pre-built indexes

List Maintainers (alphabetical order)

Franco Maria Nardini (ISTI-CNR, Pisa, Italy)
Cosimo Rulli (ISTI-CNR, Pisa, Italy)
Rossano Venturini (University of Pisa, Italy)

Other Contributors

Joel Mackenzie (The University of Queensland, Australia)
