An extensive, annotated list of resources on Learned Sparse Retrieval (LSR). Most of the resources below concern learned sparse representations for text retrieval.
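Here is a minimal illustration of the setting these resources share. A learned sparse encoder maps each query and document to a sparse vector of non-negative term weights over a vocabulary, and relevance is the dot product of the two vectors. An inverted index evaluates that dot product by visiting only the posting lists of the query's non-zero terms. The sketch below assumes the vectors have already been produced by some encoder (the toy terms and weights are invented). It scores term-at-a-time with exact, unpruned evaluation, so it is a baseline rather than any particular system's method.

```python
from collections import defaultdict
import heapq

# A sparse vector maps a vocabulary term to a non-negative learned weight.
SparseVec = dict[str, float]


def build_inverted_index(docs: dict[str, SparseVec]) -> dict[str, list[tuple[str, float]]]:
    """Map each term to its posting list of (doc_id, weight) pairs."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for doc_id, vec in docs.items():
        for term, weight in vec.items():
            if weight > 0:
                index[term].append((doc_id, weight))
    return dict(index)


def search(index: dict[str, list[tuple[str, float]]], query: SparseVec, k: int = 10) -> list[tuple[str, float]]:
    """Exact top-k by dot product, accumulating partial scores term-at-a-time."""
    scores: dict[str, float] = defaultdict(float)
    for term, q_weight in query.items():
        # Only the posting lists of the query's non-zero terms are touched.
        for doc_id, d_weight in index.get(term, []):
            scores[doc_id] += q_weight * d_weight
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Toy usage with invented weights.
docs = {
    "d1": {"sparse": 1.2, "retrieval": 0.8, "index": 0.3},
    "d2": {"dense": 1.1, "retrieval": 0.9},
    "d3": {"sparse": 0.4, "vector": 0.7},
}
index = build_inverted_index(docs)
results = search(index, {"sparse": 1.0, "retrieval": 0.5}, k=2)  # d1 ranks first, then d2
```

Most of the indexing work listed further down replaces this exhaustive accumulation with pruning, approximate, or impact-ordered strategies that avoid scoring every matching posting.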
From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing
Hamed Zamani, Mostafa Dehghani, W. Bruce Croft, Erik Learned-Miller, Jaap Kamps
CIKM, 2018 paper
Expansion via Prediction of Importance with Contextualization
Sean MacAvaney, Franco Maria Nardini, Raffaele Perego, Nicola Tonellotto, Nazli Goharian, Ophir Frieder
SIGIR, 2020 paper
Context-Aware Term Weighting For First Stage Passage Retrieval
Zhuyun Dai, Jamie Callan
SIGIR, 2020 paper
A Few Brief Notes on DeepImpact, COIL, and a Conceptual Framework for Information Retrieval Techniques
Jimmy Lin, Xueguang Ma
CoRR, 2021 paper
Learning Passage Impacts for Inverted Indexes
Antonio Mallia, Omar Khattab, Torsten Suel, Nicola Tonellotto
SIGIR, 2021 paper
SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking
Thibault Formal, Benjamin Piwowarski, Stephane Clinchant
SIGIR, 2021 paper
SPLADE v2: Sparse Lexical and Expansion Model for Information Retrieval
Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stephane Clinchant
CoRR, 2021 paper
SPARTA: Efficient Open-Domain Question Answering via Sparse Transformer Matching Retrieval
Tiancheng Zhao, Xiaopeng Lu, Kyusong Lee
NAACL, 2021 paper
TILDE: Term Independent Likelihood moDEl for Passage Re-ranking
Shengyao Zhuang, Guido Zuccon
SIGIR, 2021 paper
From Distillation to Hard Negative Sampling: Making Sparse Neural IR Models More Effective
Thibault Formal, Carlos Lassance, Benjamin Piwowarski, Stephane Clinchant
SIGIR, 2022 paper
Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion
Shengyao Zhuang, Guido Zuccon
ReNeuIR at SIGIR, 2022 paper
Learning a Sparse Representation Model for Neural CLIR
Suraj Nair, Eugene Yang, Dawn J Lawrie, James Mayfield, Douglas W. Oard
DESIRES, 2022 paper
LexMAE: Lexicon-Bottlenecked Pretraining for Large-Scale Retrieval
Tao Shen, Xiubo Geng, Chongyang Tao, Can Xu, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang
ICLR, 2023 paper
A Unified Framework for Learned Sparse Retrieval
Thong Nguyen, Sean MacAvaney, Andrew Yates
ECIR, 2023 paper
BLADE: Combining Vocabulary Pruning and Intermediate Pretraining for Scaleable Neural CLIR
Suraj Nair, Eugene Yang, Dawn Lawrie, James Mayfield, Douglas W. Oard
SIGIR, 2023 paper
Exploring the Representation Power of SPLADE Models
Joel Mackenzie, Shengyao Zhuang, Guido Zuccon
ICTIR, 2023 paper
Learning Sparse Lexical Representations Over Specified Vocabularies for Retrieval
Jeffrey M Dudek, Weize Kong, Cheng Li, Mingyang Zhang, Michael Bendersky
CIKM, 2023 paper
Improved Learned Sparse Retrieval with Corpus-Specific Vocabularies
Puxuan Yu, Antonio Mallia, Matthias Petri
ECIR, 2024 paper
Multimodal Learned Sparse Retrieval with Probabilistic Expansion Control
Thong Nguyen, Mariya Hendriksen, Andrew Yates, Maarten de Rijke
ECIR, 2024 paper
DyVo: Dynamic Vocabularies for Learned Sparse Retrieval with Entities
Thong Nguyen, Shubham Chatterjee, Sean MacAvaney, Iain Mackie, Jeff Dalton, Andrew Yates
EMNLP, 2024 paper
DiSCo: LLM Knowledge Distillation for Efficient Sparse Retrieval in Conversational Search
Simon Lupart, Mohammad Aliannejadi, Evangelos Kanoulas
SIGIR, 2025 paper
Towards Competitive Search Relevance For Inference-Free Learned Sparse Retrievers
Zhichao Geng, Dongyu Ru, Yang Yang
CoRR, 2024 paper
Mistral-SPLADE: LLMs for better Learned Sparse Retrieval
Meet Doshi, Vishwajeet Kumar, Rudra Murthy, Vignesh P, Jaydeep Sen
CoRR, 2024 paper
An Alternative to FLOPS Regularization to Effectively Productionize SPLADE-doc
Aldo Porco, Dhruv Mehra, Igor Malioutov, Karthik Radhakrishnan, Moniba Keymanesh, Daniel Preotiuc-Pietro, Sean MacAvaney, Pengxiang Cheng
SIGIR, 2025 paper
Effective Inference-Free Retrieval for Learned Sparse Representations
Franco Maria Nardini, Thong Nguyen, Cosimo Rulli, Rossano Venturini, Andrew Yates
SIGIR, 2025 paper
Exploring $\ell_0$ Sparsification for Inference-free Sparse Retrievers
Xinjie Shen, Zhichao Geng, Yang Yang
SIGIR, 2025 paper
Enhancing Lexicon-Based Text Embeddings with Large Language Models
Yibin Lei, Tao Shen, Yu Cao, Andrew Yates
ACL, 2025 paper
Leveraging decoder architectures for learned sparse retrieval
Jingfen Qiao, Thong Nguyen, Evangelos Kanoulas, Andrew Yates
International Workshop on Knowledge-Enhanced Information Retrieval, 2025 paper
Scaling sparse and dense retrieval in decoder-only LLMs
Hansi Zeng, Julian Killingback, Hamed Zamani
SIGIR, 2025 paper
CSPLADE: Learned Sparse Retrieval with Causal Language Models
Zhichao Xu, Aosong Feng, Yijun Tian, Haibo Ding, Lin Lee Cheong
CoRR, 2025 paper
On the Reproducibility of Learned Sparse Retrieval Adaptations for Long Documents
Emmanouil Georgios Lionis, Jia-Huei Ju
ECIR, 2025 paper
Milco: Learned Sparse Retrieval Across Languages via a Multilingual Connector
Thong Nguyen, Yibin Lei, Jia-Huei Ju, Eugene Yang, Andrew Yates
ICLR, 2026 paper
LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum
Zhichao Xu, Shengyao Zhuang, Crystina Zhang, Xueguang Ma, Yijun Tian, Maitrey Mehta, Jimmy Lin, Vivek Srikumar
SIGIR, 2026 paper
Sparton: Fast and Memory-Efficient Triton Kernel for Learned Sparse Retrieval
Thong Nguyen, Cosimo Rulli, Franco Maria Nardini, Rossano Venturini, Andrew Yates
SIGIR, 2026 paper
Self-Improving Sparse Retrieval Through Heuristic Representation Refinement and Representation-Focused Learning
Xiaojing Li, Bin Wang, Xiaochun Yang, Meng Luo
AAAI, 2026 paper
From Tokens to Concepts: Leveraging SAE for SPLADE
Yuxuan Zong, Mathias Vast, Basile Van Cooten, Laure Soulier, Benjamin Piwowarski
SIGIR, 2026 paper
To Case or Not to Case: An Empirical Study in Learned Sparse Retrieval
Emmanouil Georgios Lionis, Jia-Huei Ju, Angelos Nalmpantis, Casper Thuis, Sean MacAvaney, Andrew Yates
ECIR, 2026 paper
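Many of the models above (SPLADE and its descendants) derive a text's term weights from the masked-language-model logits of a transformer. Each vocabulary entry j gets the weight w_j = max_i log(1 + ReLU(l_ij)) over input positions i; this is SPLADE v2's max pooling, while the original SPLADE summed over positions. Sparsity is encouraged with the FLOPS regularizer, which sums, over the vocabulary, the squared mean weight of each entry across a batch of N encoded texts. The pure-Python sketch below illustrates only that arithmetic. It assumes the logits are already computed, the toy values are invented, and it is not the authors' training code.

```python
import math


def splade_pool(logits: list[list[float]], pooling: str = "max") -> list[float]:
    """Map per-token MLM logits, shaped [seq_len][vocab_size], to one vocab-sized weight vector.

    Each logit goes through log(1 + ReLU(x)); the results are then pooled over token
    positions ("max" as in SPLADE v2, "sum" as in the original SPLADE).
    """
    activated = [[math.log1p(max(x, 0.0)) for x in row] for row in logits]
    vocab_size = len(activated[0])
    if pooling == "max":
        return [max(row[j] for row in activated) for j in range(vocab_size)]
    if pooling == "sum":
        return [sum(row[j] for row in activated) for j in range(vocab_size)]
    raise ValueError(f"unknown pooling: {pooling!r}")


def flops_regularizer(batch: list[list[float]]) -> float:
    """FLOPS penalty: sum over vocabulary entries of the squared mean weight across the batch."""
    n = len(batch)
    vocab_size = len(batch[0])
    return sum((sum(vec[j] for vec in batch) / n) ** 2 for j in range(vocab_size))


# Toy example: 2 input tokens, vocabulary of 3 entries (values invented for illustration).
logits = [[2.0, -1.0, 0.0],
          [0.5, 3.0, -2.0]]
weights = splade_pool(logits)  # entry 2 is non-positive at every position, so its weight is exactly 0.0
```

Because ReLU zeroes every non-positive logit, vocabulary entries that no position activates keep a weight of exactly zero. The FLOPS term penalizes entries that are active on average across the batch, which pushes the encoder toward fewer non-zero entries and therefore shorter posting lists at retrieval time.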
Indexing LSR
Efficient and Robust Approximate Nearest Neighbor Search Using Hierarchical Navigable Small World Graphs
Yury A. Malkov, Dmitry A. Yashunin
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020
paper
Efficiency Implications of Term Weighting for Passage Retrieval
Joel Mackenzie, Zhuyun Dai, Luke Gallagher, Jamie Callan
SIGIR, 2020
paper
Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation
Joel Mackenzie, Andrew Trotman, Jimmy Lin
CoRR, 2021
paper
Insights into the Efficiency of Open-Source Score-at-a-Time Search Engines: A Reproducibility Study
Katelyn Harlan, Andrew Trotman, Veronica Liesaputra
SIGIR, 2026
paper
Accelerating Learned Sparse Indexes Via Term Impact Decomposition
Joel Mackenzie, Antonio Mallia, Alistair Moffat, Matthias Petri
EMNLP, 2022
paper | code
An Efficiency Study for SPLADE Models
Carlos Lassance, Stephane Clinchant
SIGIR, 2022
paper
Faster Learned Sparse Retrieval with Guided Traversal
Antonio Mallia, Joel Mackenzie, Torsten Suel, Nicola Tonellotto
SIGIR, 2022
paper
IOQP: A simple Impact-Ordered Query Processor written in Rust
Joel Mackenzie, Matthias Petri, Luke Gallagher
DESIRES, 2022
paper | code
A Static Pruning Study on Sparse Neural Retrievers
Carlos Lassance, Simon Lupart, Herve Dejean, Stephane Clinchant, Nicola Tonellotto
SIGIR, 2023
paper
An Approximate Algorithm for Maximum Inner Product Search over Streaming Sparse Vectors
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
ACM Transactions on Information Systems, 2024
paper
Efficient Document-at-a-time and Score-at-a-time Query Evaluation for Learned Sparse Representations
Joel Mackenzie, Andrew Trotman, Jimmy Lin
ACM Transactions on Information Systems, 2023
paper
Optimizing Guided Traversal for Fast Learned Sparse Retrieval
Yifan Qiao, Yingrui Yang, Haixin Lin, Tao Yang
WWW, 2023
paper | code
Representation Sparsification with Hybrid Thresholding for Fast SPLADE-based Document Retrieval
Yifan Qiao, Yingrui Yang, Shanxiu He, Tao Yang
SIGIR, 2023
paper | code
Results of the Big ANN: NeurIPS'23 competition
Harsha Vardhan Simhadri, Martin Aumüller, Amir Ingber, Matthijs Douze, George Williams, Magdalen Dobson Manohar, Dmitry Baranchuk, Edo Liberty, Frank Liu, Ben Landrum, Mazin Karjikar, Laxman Dhulipala, Meng Chen, Yue Chen, Rui Ma, Kai Zhang, Yuzheng Cai, Jiayang Shi, Yizhuo Chen, Weiguo Zheng, Zihao Wan, Jie Yin, Ben Huang
CoRR, 2023
paper | code
Bridging Dense and Sparse Maximum Inner Product Search
Sebastian Bruch, Franco Maria Nardini, Amir Ingber, Edo Liberty
ACM Transactions on Information Systems, 2024
paper
Efficient Inverted Indexes for Approximate Retrieval over Learned Sparse Representations
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
SIGIR, 2024
paper | code
Faster Learned Sparse Retrieval with Block-Max Pruning
Antonio Mallia, Torsten Suel, Nicola Tonellotto
SIGIR, 2024
paper | code
Cluster-based Partial Dense Retrieval Fused with Sparse Text Retrieval
Yingrui Yang, Parker Carlson, Shanxiu He, Yifan Qiao, Tao Yang
SIGIR, 2024
paper
Pairing Clustered Inverted Indexes with κ-NN Graphs for Fast Approximate Retrieval over Learned Sparse Representations
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
CIKM, 2024
paper | code
Threshold-driven Pruning with Segmented Maximum Term Weights for Approximate Cluster-based Sparse Retrieval
Yifan Qiao, Parker Carlson, Shanxiu He, Yingrui Yang, Tao Yang
EMNLP, 2024
paper
Foundations of Vector Retrieval
Sebastian Bruch
Springer, 2024
book
Dynamic Superblock Pruning for Fast Learned Sparse Retrieval
Parker Carlson, Wentai Xie, Shanxiu He, Tao Yang
SIGIR, 2025
paper | code
Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
CoRR, 2025
paper | code
Investigating the Scalability of Approximate Sparse Retrieval Algorithms to Massive Datasets
Sebastian Bruch, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini, Leonardo Venuta
ECIR, 2025
paper | code
SINDI: an Efficient Index for Approximate Maximum Inner Product Search on Sparse Vectors
Ruoxuan Li, Xiaoyao Zhong, Jiabao Jin, Peng Cheng, Wangze Ni, Lei Chen, Zhitao Shen, Wei Jia, Xiangyu Wang, Xuemin Lin, Heng Tao Shen, Jingkuan Song
CoRR, 2025
paper | code
kANNolo: Sweet and Smooth Approximate k-Nearest Neighbors Search
Leonardo Delfino, Domenico Erriquez, Silvio Martinico, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
ECIR, 2025
paper | code
Breaking the Curse of Dimensionality: On the Stability of Modern Vector Retrieval
Vihan Lakshman, Blaise Munyampirwa, Julian Shun, Benjamin Coleman
CoRR, 2025
paper
Efficiency Optimizations for Superblock-based Sparse Retrieval
Parker Carlson, Wentai Xie, Rohil Shah, Tao Yang
CoRR, 2026
paper
Evaluating the Efficiency and Effectiveness of Learned Sparse Retrieval with the lsr_benchmark
Maik Fröbe, Ferdinand Schlatt, Cosimo Rulli, Tim Hagen, Jan Heinrich Merker, Gijs Hendriksen, Carlos Lassance, Franco Maria Nardini, Rossano Venturini, Martin Potthast
ECIR, 2026
paper | code
Forward Index Compression for Learned Sparse Retrieval
Sebastian Bruch, Martino Fontana, Franco Maria Nardini, Cosimo Rulli, Rossano Venturini
ECIR, 2026
paper | code
Fast, Compact, Immediate-Access Indexing for Learned Sparse Retrieval Systems
Billy Rule, Joel Mackenzie
ECIR, 2026
paper
lsr-benchmark: Framework for evaluating the learned sparse retrieval paradigm, contrasting efficiency and effectiveness across diverse retrieval scenarios.
kANNolo: Library for fast dense and sparse learned retrieval with graph-based indexes.
Seismic: State-of-the-art library for fast learned sparse retrieval and indexing, with a focus on efficient inverted-index structures from recent research.
Vectorium: A library for storing and accessing datasets of dense and sparse vectors, with efficient support for brute-force search and the core operations required by vector indexing data structures.
BMP (Block-Max Pruning): Extremely efficient learned sparse retrieval framework with an in-memory block-based inverted index.
Sentence Transformers: Framework for computing sentence embeddings and sparse representations (via sparse encoder models) for semantic and sparse retrieval in NLP applications.
Pyserini: IR research toolkit built on Lucene that supports learned sparse retrieval models.
NMSLib: Non-Metric Space Library (NMSLIB), an efficient similarity search library and a toolkit for evaluating k-NN methods in generic non-metric spaces.
OpenSearch: Open-source search and analytics engine that supports scalable sparse, dense, and hybrid neural retrieval via plugins and vector extensions.
Apache Lucene: High-performance Java search library providing the inverted indexes and scoring infrastructure that underpin many learned sparse retrieval systems.
Qdrant: Open-source vector database supporting dense, sparse, and hybrid retrieval, with native sparse-vector indexing based on an inverted index for exact search over high-dimensional sparse vectors.
FlashRAG: A Python toolkit for efficient RAG research.
PyTerrier: A Python framework for performing information retrieval experiments (supporting PISA and BMP).
PISA: A modular C++ inverted index and query processing framework supporting many compression codecs and dynamic pruning algorithms.
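BMP and related work (Faster Learned Sparse Retrieval with Block-Max Pruning; Dynamic Superblock Pruning) partition documents into blocks of consecutive IDs and store, for each block and term, the maximum term weight. The simplified sketch below shows the core idea under its own assumptions: a fixed block size, exact scoring of every document in a visited block, and non-negative weights. A block's upper bound is the sum over query terms of q_t × block max. Blocks are visited in decreasing upper-bound order, and the search stops once no remaining block can beat the current k-th best score. It is a hypothetical illustration, not the BMP library's implementation.

```python
import heapq


def build_block_max(docs: list[dict[str, float]], block_size: int):
    """Split documents (by list position) into fixed-size blocks; record each term's max weight per block."""
    blocks = []
    for start in range(0, len(docs), block_size):
        block_max: dict[str, float] = {}
        for vec in docs[start:start + block_size]:
            for term, w in vec.items():
                block_max[term] = max(block_max.get(term, 0.0), w)
        blocks.append((start, block_max))
    return blocks


def block_max_search(docs, blocks, block_size: int, query: dict[str, float], k: int = 10):
    """Exact top-k by dot product, skipping blocks whose upper bound cannot enter the top-k.

    Assumes all query and document weights are non-negative, so
    sum_t q_t * blockmax_t bounds the score of every document in the block.
    `block_size` must match the value used in build_block_max.
    """
    bounds = sorted(
        ((sum(q * block_max.get(t, 0.0) for t, q in query.items()), start)
         for start, block_max in blocks),
        reverse=True,
    )
    heap: list[tuple[float, int]] = []  # min-heap of (score, doc_id) holding the current top-k
    blocks_scored = 0
    for upper_bound, start in bounds:
        if len(heap) == k and upper_bound <= heap[0][0]:
            break  # no remaining block can beat the current k-th best score
        blocks_scored += 1
        for doc_id in range(start, min(start + block_size, len(docs))):
            score = sum(q * docs[doc_id].get(t, 0.0) for t, q in query.items())
            if len(heap) < k:
                heapq.heappush(heap, (score, doc_id))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True), blocks_scored


# Toy usage with invented weights: the second block's bound (0.1) cannot beat the top-2, so it is skipped.
docs = [{"a": 1.0}, {"a": 0.9, "b": 0.2}, {"b": 0.1}, {"c": 1.0}]
blocks = build_block_max(docs, block_size=2)
top, blocks_scored = block_max_search(docs, blocks, 2, {"a": 1.0, "b": 1.0}, k=2)
```

Because the bound never underestimates a document's score, the early exit keeps results exact. Production systems make the bounds cheaper and tighter, for example through the superblock hierarchies studied in the papers above.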