refactor: optimize query_filter semantics#6086
refactor: optimize query_filter semantics#6086wojiaodoubao wants to merge 2 commits intolance-format:mainfrom
Conversation
|
Convert to draft. I'll rebase this after #6054. |
PR Review: refactor: optimize query_filter semanticsThis PR simplifies the query filter architecture per #6076, making FTS/vector always a post-filter (never a pre-filter source), and adds threshold support for both FTS and vector post-filters. The net deletion of ~240 lines is a nice simplification. P0: refine_columns stale guard for non-Match FTSIn the new refine_columns (scanner.rs), columns are always computed for query filters (the self.refine_query_filter guard was removed). However, the if-let FtsQuery::Match check still gates whether FTS columns are included. Since flat_fts now explicitly errors for non-Match FTS queries, this guard is harmless today, but if non-Match support is added later via a different code path (the issue mentions this as a TODO), the mismatch between refine_columns (silently no-ops) and refine_filter (errors) could cause confusing bugs. Consider adding a comment or unifying the error path. P1: Reorder of expr_filter vs query_filter in refine_filter -- correctness concernThe old order was: query_filter first, then expr_filter. The new order is: expr_filter first, then query_filter. This is semantically meaningful:
This seems intentional for performance (less data to re-rank), but changes observable behavior: if the WHERE clause removes rows that would have been in the top-k of the re-ranker, the final result set differs from the old behavior. Since this is a breaking semantic change, please confirm this is intended and document it in the PR description. P1: No test coverage for prefilter=true + query_filter combinationThe old tests for prefilter + query_filter are removed (they tested the old prefilter-as-source semantics). However, the new code still allows prefilter=true with a query filter -- the query filter gets passed through to refine_filter. There are no new Rust tests validating that this path works correctly. Consider adding at least one test case to verify that prefilter=true + query_filter produces correct results under the new semantics. MinorThe VectorSearchQuery.inner was changed from a method to a @Property, which fixes what appears to have been a latent bug -- callers like PySearchFilter.from_vector_search_query(filter.inner) would have received a bound method object rather than the dict. |
dcb0a9d to
24ecceb
Compare
| /// Calculates the FTS score for each row in the input | ||
| #[derive(Debug)] | ||
| pub struct FlatMatchFilterExec { | ||
| pub struct FlatMatchQueryExec { |
There was a problem hiding this comment.
This diff is a little misleading. What I do is removing FlatMatchFilterExec and adding PostFlatMatchQueryExec. FlatMatchQueryExec is not modified.
24ecceb to
c26af8d
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Hi @westonpace , please help review when you have time, thanks very much! |
Close #6076