Skip to content

refactor: optimize query_filter semantics#6086

Open
wojiaodoubao wants to merge 2 commits intolance-format:mainfrom
wojiaodoubao:refactor-query-filter
Open

refactor: optimize query_filter semantics#6086
wojiaodoubao wants to merge 2 commits intolance-format:mainfrom
wojiaodoubao:refactor-query-filter

Conversation

@wojiaodoubao
Copy link
Contributor

Close #6076

@github-actions github-actions bot added the python label Mar 3, 2026
@wojiaodoubao wojiaodoubao marked this pull request as draft March 3, 2026 13:51
@wojiaodoubao
Copy link
Contributor Author

Convert to draft. I'll rebase this after #6054.

@github-actions
Copy link
Contributor

github-actions bot commented Mar 3, 2026

PR Review: refactor: optimize query_filter semantics

This PR simplifies the query filter architecture per #6076, making FTS/vector always a post-filter (never a pre-filter source), and adds threshold support for both FTS and vector post-filters. The net deletion of ~240 lines is a nice simplification.

P0: refine_columns stale guard for non-Match FTS

In the new refine_columns (scanner.rs), columns are always computed for query filters (the self.refine_query_filter guard was removed). However, the if-let FtsQuery::Match check still gates whether FTS columns are included. Since flat_fts now explicitly errors for non-Match FTS queries, this guard is harmless today, but if non-Match support is added later via a different code path (the issue mentions this as a TODO), the mismatch between refine_columns (silently no-ops) and refine_filter (errors) could cause confusing bugs. Consider adding a comment or unifying the error path.

P1: Reorder of expr_filter vs query_filter in refine_filter -- correctness concern

The old order was: query_filter first, then expr_filter. The new order is: expr_filter first, then query_filter. This is semantically meaningful:

  • Old: re-rank by FTS/KNN, then apply WHERE clause
  • New: apply WHERE clause first, then re-rank

This seems intentional for performance (less data to re-rank), but changes observable behavior: if the WHERE clause removes rows that would have been in the top-k of the re-ranker, the final result set differs from the old behavior. Since this is a breaking semantic change, please confirm this is intended and document it in the PR description.

P1: No test coverage for prefilter=true + query_filter combination

The old tests for prefilter + query_filter are removed (they tested the old prefilter-as-source semantics). However, the new code still allows prefilter=true with a query filter -- the query filter gets passed through to refine_filter. There are no new Rust tests validating that this path works correctly. Consider adding at least one test case to verify that prefilter=true + query_filter produces correct results under the new semantics.

Minor

The VectorSearchQuery.inner was changed from a method to a @Property, which fixes what appears to have been a latent bug -- callers like PySearchFilter.from_vector_search_query(filter.inner) would have received a bound method object rather than the dict.

@wojiaodoubao wojiaodoubao force-pushed the refactor-query-filter branch 2 times, most recently from dcb0a9d to 24ecceb Compare March 4, 2026 13:31
/// Calculates the FTS score for each row in the input
#[derive(Debug)]
pub struct FlatMatchFilterExec {
pub struct FlatMatchQueryExec {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This diff is a little misleading. What I do is removing FlatMatchFilterExec and adding PostFlatMatchQueryExec. FlatMatchQueryExec is not modified.

@wojiaodoubao wojiaodoubao force-pushed the refactor-query-filter branch from 24ecceb to c26af8d Compare March 4, 2026 13:32
@wojiaodoubao wojiaodoubao marked this pull request as ready for review March 4, 2026 13:34
@codecov
Copy link

codecov bot commented Mar 4, 2026

Codecov Report

❌ Patch coverage is 79.33884% with 50 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
rust/lance/src/io/exec/fts.rs 80.74% 17 Missing and 19 partials ⚠️
rust/lance/src/dataset/scanner.rs 74.54% 5 Missing and 9 partials ⚠️

📢 Thoughts on this report? Let us know!

@wojiaodoubao
Copy link
Contributor Author

Hi @westonpace , please help review when you have time, thanks very much!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Simplify rules when vector/fts used as filter

1 participant