Skip to content

ls3-lab/QueryGym

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

112 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Ask DeepWiki about this repo Publish to PyPI Build and Push Docker Images PyPI version PyPI - Downloads Python 3.9+ License: Apache 2.0

QueryGym Logo

A lightweight, reproducible toolkit for LLM-based query reformulation

📚 Documentation • 📊 Leaderboard • 📦 PyPI • 📄 Paper


Features

  • Single Prompt Bank (YAML) with metadata
  • Simple DataLoader: Dependency-free file loading for queries, qrels, and contexts
  • Format Loaders: Optional BEIR and MS MARCO format loaders in querygym.loaders
  • OpenAI-compatible LLM client (works with any OpenAI API–compatible endpoint)
  • Pyserini optional: either pass contexts (JSONL) or pass a retriever instance to build contexts
  • Export-only: emits reformulated queries; optionally generates a bash script for Pyserini + trec_eval

Installation

Option 1: Install from PyPI

pip install querygym

Option 2: Use Docker (Recommended for Quick Start)

# GPU version (default)
docker pull ghcr.io/ls3-lab/querygym:latest
docker run -it --gpus all ghcr.io/ls3-lab/querygym:latest

# CPU version (lightweight)
docker pull ghcr.io/ls3-lab/querygym:cpu
docker run -it ghcr.io/ls3-lab/querygym:cpu

# Or use Docker Compose
docker compose run --rm querygym

📖 Docker Setup: See DOCKER_SETUP.md for quick start or the full Docker guide for detailed usage.

Quickstart

Python API (Recommended)

import querygym as qg

# Load data
queries = qg.load_queries("queries.tsv")
qrels = qg.load_qrels("qrels.txt")
contexts = qg.load_contexts("contexts.jsonl")

# Create reformulator
reformulator = qg.create_reformulator("genqr_ensemble", model="gpt-4")

# Reformulate
results = reformulator.reformulate_batch(queries)

# Save
qg.DataLoader.save_queries(
    [qg.QueryItem(r.qid, r.reformulated) for r in results],
    "reformulated.tsv"
)

CLI

pip install -e .[hf,beir,dev]
export OPENAI_API_KEY=sk-...

# Run a method (e.g., genqr_ensemble)
querygym run --method genqr_ensemble \
  --queries-tsv queries.tsv \
  --output-tsv reformulated.tsv \
  --cfg-path querygym/config/defaults.yaml

Loading Datasets

BEIR:

import querygym as qg

# Download with BEIR library
from beir.datasets.data_loader import GenericDataLoader
data_path = GenericDataLoader("nfcorpus").download_and_unzip()

# Load with querygym
queries = qg.loaders.beir.load_queries(data_path)
qrels = qg.loaders.beir.load_qrels(data_path)

MS MARCO:

import querygym as qg

# Load from local files (download with ir_datasets)
queries = qg.loaders.msmarco.load_queries("queries.tsv")
qrels = qg.loaders.msmarco.load_qrels("qrels.tsv")

Examples

See the examples directory for:

Check examples/README.md for the full guide.

Contributing

We welcome contributions! Here's how you can help:

Adding a New Prompt

  1. Edit querygym/prompt_bank.yaml
  2. Add an entry with fields: id, method_family, version, introduced_by, license, authors, tags, template:{system,user}, notes

Adding a New Method

  1. Create a class under querygym/methods/*.py
  2. Subclass BaseReformulator, annotate VERSION, and register with @register_method("name")
  3. Pull templates via PromptBank.render(prompt_id, query=...)

Reporting Issues

  • Found a bug? Open an issue
  • Have a feature request? We'd love to hear it!

For detailed development guidelines, see the Contributing Guide in our documentation.

License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

Citation

If you use QueryGym in your research, please cite:

@misc{bigdeli2025querygymtoolkitreproduciblellmbased,
      title={QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation}, 
      author={Amin Bigdeli and Radin Hamidi Rad and Mert Incesu and Negar Arabzadeh and Charles L. A. Clarke and Ebrahim Bagheri},
      year={2025},
      eprint={2511.15996},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2511.15996}, 
}