This repository contains the source code for the paper:
Efficient Constant-Space Multi-Vector Retrieval
by Sean MacAvaney, Antonio Mallia, and Nicola Tonellotto, published at ECIR 2025.
Read the paper (PDF)
ConstBERT received the Best Short Paper Honourable Mention at ECIR 2025.
ConstBERT (Constant-Space BERT) is a multi-vector retrieval model designed for efficient and effective passage retrieval. It modifies the ColBERT architecture by encoding documents into a fixed number of learned embeddings, significantly reducing index size and improving storage and OS paging efficiency, all while retaining high retrieval effectiveness.
- Fixed-size document representation (e.g., 32 vectors per document)
- Late interaction (MaxSim) for scoring
- End-to-end training of a pooling mechanism
- Comparable performance to ColBERT on MSMARCO and BEIR
- Efficient indexing and storage
The pretrained model is available on Hugging Face: https://huggingface.co/pinecone/ConstBERT
```python
from transformers import AutoModel
import numpy as np

def max_sim(q: np.ndarray, d: np.ndarray) -> float:
    """Late-interaction (MaxSim) score: for each query vector, take the
    similarity of its best-matching document vector, then sum over query vectors."""
    assert q.ndim == 2 and d.ndim == 2
    scores = np.dot(d, q.T)  # (num_doc_vecs, num_query_vecs) similarity matrix
    return float(np.sum(np.max(scores, axis=0)))

model = AutoModel.from_pretrained("pinecone/ConstBERT", trust_remote_code=True)

queries = ["What is the capital of France?", "latest advancements in AI"]
documents = [
    "Paris is the capital and most populous city of France.",
    "Artificial intelligence is rapidly evolving with new breakthroughs.",
    "The Eiffel Tower is a famous landmark in Paris.",
]

query_embeddings = model.encode_queries(queries).numpy()        # one (num_vecs, dim) matrix per query
document_embeddings = model.encode_documents(documents).numpy()  # one (C, dim) matrix per document

# The Paris document should score higher than the AI document for the first query.
print(max_sim(query_embeddings[0], document_embeddings[0])
      > max_sim(query_embeddings[0], document_embeddings[1]))
# Output: True
```
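Building on the snippet above, here is a minimal sketch of how the same `max_sim` helper could be used to rank a small collection for a query (illustrative only, reusing only names defined above):

```python
# Rank all documents for the first query by MaxSim score, highest first.
ranking = sorted(
    range(len(documents)),
    key=lambda i: max_sim(query_embeddings[0], document_embeddings[i]),
    reverse=True,
)
print([documents[i][:40] for i in ranking])  # Paris documents should come first
```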
- Architecture: BERT-based encoder with a learned pooling layer
- Embedding size: 128
- Document vectors per passage: 32
- Interaction: MaxSim between document and query embeddings
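With these defaults, the per-document storage cost is constant and easy to estimate. A back-of-envelope sketch, assuming 2-byte (fp16) values, which is an assumption rather than a stated detail:

```python
# Constant per-document index footprint: C vectors x dim floats x bytes per value.
C, dim, bytes_per_value = 32, 128, 2  # 2 bytes assumes fp16 storage
per_doc_bytes = C * dim * bytes_per_value
print(per_doc_bytes)  # 8192 bytes (8 KiB) per document, regardless of document length
```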
ConstBERT compresses token-level BERT embeddings into a fixed number (C) of document-level vectors using a learned linear projection. These vectors capture diverse semantic aspects of the document. Relevance is computed via a MaxSim operation between the query token embeddings and the fixed document vectors.
This design offers a trade-off between storage/computation efficiency and retrieval effectiveness, configurable by choosing the number of vectors C.
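As a rough illustration of the idea (not the paper's exact implementation), a learned pooling of this kind can be written as a single linear layer whose outputs are C learned combinations of the token embeddings. All names and shapes below are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FixedPooler(nn.Module):
    """Illustrative sketch: pool L token embeddings into C fixed vectors.
    Each output vector is a learned linear combination of the token
    embeddings (an assumption about the exact pooling form)."""
    def __init__(self, max_tokens: int = 180, num_vectors: int = 32):
        super().__init__()
        # Learned mixing weights over token positions: maps L -> C.
        self.mix = nn.Linear(max_tokens, num_vectors, bias=False)

    def forward(self, token_embs: torch.Tensor) -> torch.Tensor:
        # token_embs: (batch, L, dim), padded/truncated to max_tokens.
        # Transpose so the linear layer mixes over the token axis.
        pooled = self.mix(token_embs.transpose(1, 2))  # (batch, dim, C)
        return pooled.transpose(1, 2)                  # (batch, C, dim)

pooler = FixedPooler()
doc_tokens = torch.randn(1, 180, 128)  # stand-in for BERT token embeddings
print(pooler(doc_tokens).shape)        # torch.Size([1, 32, 128])
```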
Please cite the following paper if you use this code or a modified version of it:
```bibtex
@inproceedings{constbert,
  title     = {Efficient Constant-Space Multi-Vector Retrieval},
  author    = {MacAvaney, Sean and Mallia, Antonio and Tonellotto, Nicola},
  booktitle = {The 47th European Conference on Information Retrieval ({ECIR})},
  year      = {2025}
}
```