feat: upgrade query adapter algorithm #157

Draft: wants to merge 2 commits into main

lsorber (Member) commented Jun 6, 2025

Changes:

  1. Only insert missing evals in _bench.py.
  2. Only compute the query adapter if missing in _bench.py.
  3. Add a 'RAGLite with query adapter' to the benchmark command.
  4. Output the current document id as a tqdm postfix in insert_documents.
  5. Parallelise extraction of triplets in _query_adapter.py.
  6. Improve the query adapter by introducing per-query target weights and optimizing these on a validation set of evals with L-BFGS.
  7. Add tests to verify that the gradient formula is correct.
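
Items 6 and 7 pair an L-BFGS fit with a gradient check. The sketch below is not the PR's code: it uses a made-up least-squares objective over a weight vector (the names `loss`, `loss_grad`, `X`, `y`, and `w_true` are illustrative assumptions) to show the general pattern of checking an analytic gradient with `scipy.optimize.check_grad` before passing it to `scipy.optimize.minimize` with `method="L-BFGS-B"`.

```python
import numpy as np
from scipy.optimize import check_grad, minimize

# Toy data: a stand-in for a validation set, not RAGLite evals.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
w_true = np.array([0.5, -1.0, 2.0])
y = X @ w_true


def loss(w: np.ndarray) -> float:
    """Least-squares objective 0.5 * ||X w - y||^2."""
    r = X @ w - y
    return 0.5 * float(r @ r)


def loss_grad(w: np.ndarray) -> np.ndarray:
    """Analytic gradient X^T (X w - y) of the objective above."""
    return X.T @ (X @ w - y)


w0 = np.zeros(3)

# check_grad returns the 2-norm of the difference between the analytic
# gradient and a finite-difference approximation at w0.
grad_error = check_grad(loss, loss_grad, w0)

# Only after the gradient checks out is it handed to the optimizer.
result = minimize(loss, w0, jac=loss_grad, method="L-BFGS-B")
```

A small `grad_error` relative to the gradient's norm suggests the formula is right; a large one usually points to a sign or indexing mistake in the derivation.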

lsorber requested review from ThomasDelsart and Copilot on June 6, 2025 19:17
lsorber self-assigned this on Jun 6, 2025

Copilot AI left a comment

Pull Request Overview

This PR upgrades the query adapter algorithm: it refactors the implementation to support multi-threaded triplet extraction and weight optimization via L-BFGS, and adds unit tests for the gradient. It also extends the typing module with an alias for float tensors, refines the CLI bench command with better error handling and reranker support, and enhances the progress bar output during document insertion.

  • Refactored _query_adapter.py to introduce helper functions (_extract_triplets, _optimize_query_target, _compute_query_adapter_grad, etc.), multi-threading, and weight optimization with SciPy’s minimize.
  • Added FloatTensor alias in _typing.py and a test_query_adapter_grad in tests/test_query_adapter.py to validate the gradient.
  • Improved the CLI bench command (_cli.py, _bench.py), including a prescore step, optional reranker integration, and user-friendly import errors.
  • Minor enhancement: show inserted document IDs in the progress bar in _insert.py.
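
As a rough illustration of the multi-threaded extraction mentioned above, the following sketch fans a per-eval lookup out over a thread pool. It is an assumption-laden stand-in rather than the PR's `_extract_triplets`: the functions `extract_triplet` and `extract_triplets`, their signatures, and the returned tuple shape are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_triplet(eval_id: int) -> tuple[int, str, str]:
    """Hypothetical per-eval work, standing in for an I/O-bound lookup."""
    return (eval_id, f"positive-{eval_id}", f"negative-{eval_id}")


def extract_triplets(eval_ids: list[int], max_workers: int = 4) -> list[tuple[int, str, str]]:
    """Run extract_triplet over all eval ids using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, regardless of
        # which worker finishes first.
        return list(executor.map(extract_triplet, eval_ids))


triplets = extract_triplets([3, 1, 2])
```

Threads are a reasonable fit when each extraction mostly waits on I/O such as database or network calls; CPU-bound work would typically need processes instead.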

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tests/test_query_adapter.py | Added gradient correctness test using scipy.optimize.check_grad and increased num_evals for stability |
| src/raglite/_typing.py | Introduced FloatTensor alias for 3-D float arrays |
| src/raglite/_query_adapter.py | Full refactor of query-adapter optimization logic, threading, and weight learning |
| src/raglite/_insert.py | Show document ID in insertion progress bar |
| src/raglite/_cli.py | Guard bench imports, expose reranker option |
| src/raglite/_bench.py | Added prescore hook and reranker support in evaluator |

Comments suppressed due to low confidence (1)

src/raglite/_query_adapter.py:80

  • [nitpick] The _compute_query_adapter helper now implements the core transform logic for both 'dot' and 'cosine' metrics but lacks direct unit tests. Consider adding tests that verify its output under known inputs.
def _compute_query_adapter(
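
One way to act on this nitpick is a known-input test: build targets from a transform you already know and check that the adapter recovers it. The sketch below does this for a hypothetical stand-in, `toy_query_adapter`, which solves an orthogonal Procrustes problem with NumPy. That choice of technique is an assumption made for illustration; it is not taken from the PR, and it does not model the 'dot' and 'cosine' branches of the real `_compute_query_adapter`.

```python
import numpy as np


def toy_query_adapter(Q: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in: the orthogonal A minimizing ||A @ Q - T||_F."""
    # Orthogonal Procrustes: with T @ Q.T = U S V^T, the minimizer is U @ V^T.
    U, _, Vt = np.linalg.svd(T @ Q.T)
    return U @ Vt


# Known input: targets are the queries rotated by 30 degrees.
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 10))  # columns are toy query embeddings
T = R @ Q

A = toy_query_adapter(Q, T)

# The adapter should recover the rotation and remain orthogonal.
recovers_rotation = np.allclose(A, R)
is_orthogonal = np.allclose(A @ A.T, np.eye(2))
```

The same pattern carries over to any adapter with a closed-form answer for some special input, such as identical queries and targets, where the expected output is the identity matrix.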

lsorber marked this pull request as draft on June 16, 2025 07:06