The `filter_query` method in `Scanner` allows an FTS or vector search to be used as a filter. Currently there are branches for both prefilter and postfilter, and in the FTS filter path there are further branches for "match query" vs. "not match query". I find these confusing and inconsistent. The current behavior is:
- If we perform a full text search with a vector pre-filter, and it is a match query, then we rerank the vector search results, removing rows where the score is 0.
- If we perform a full text search with a vector pre-filter, and it is not a match query, then we perform both an FTS search and a vector search and inner join the results on `_rowid`.
- If we perform a vector search with an FTS pre-filter, then we do a flat KNN on the FTS results.
- If we perform a full text search with a vector post-filter, then we rerank the results by KNN distance (and don't actually filter anything).
- If we perform a vector search with an FTS post-filter, then we remove all results that do not share a token with the query.
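To make the difference between the two FTS-with-vector-prefilter branches concrete, here is a toy sketch. It works on plain dicts rather than the real `Scanner` plans, and the function names (`rerank_drop_zero`, `inner_join_on_rowid`) and the `_score` / `_distance` column names are assumptions made for illustration, not the actual implementation.

```python
# Toy illustration only: rows are plain dicts, not Lance record batches,
# and both function names are hypothetical.

def rerank_drop_zero(vector_rows, fts_score):
    """Match-query branch: score the vector results with FTS,
    drop rows whose score is 0, and order by score (highest first)."""
    scored = [{**row, "_score": fts_score(row)} for row in vector_rows]
    kept = [row for row in scored if row["_score"] > 0]
    return sorted(kept, key=lambda row: row["_score"], reverse=True)


def inner_join_on_rowid(fts_rows, vector_rows):
    """Non-match-query branch: run both searches independently and
    keep only rows whose _rowid appears in both result sets."""
    by_id = {row["_rowid"]: row for row in vector_rows}
    return [
        {**row, **by_id[row["_rowid"]]}
        for row in fts_rows
        if row["_rowid"] in by_id
    ]
```

The first function never looks beyond the vector results, while the second depends on what the independent FTS search happened to return, so the two branches can produce different row sets for the same inputs.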
It is confusing that a full text search with a vector prefilter becomes a vector search (and vice versa). It also seems like a vector search with an FTS prefilter is the same thing as an FTS search with a vector postfilter (and vice versa).
I propose we simplify as follows:
- Allow FTS and vector to be used as post-filters for any kind of query (even scans that are not a search)
- Define an FTS post-filter as a filter that reranks the output and (optionally) removes all rows where the BM25 score is below some threshold
- Define a vector search post-filter as a filter that reranks the output and (optionally) removes all rows where the vector search distance is above some threshold
- Do not allow FTS or vector to be used as prefilters
- Do not have different behavior for match queries and non-match queries. If we can't support non-match queries in filter mode, then just return a "not supported" error until we can.
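A minimal sketch of the proposed post-filter semantics, assuming rows are plain dicts and the scoring functions are supplied by the caller. The names `fts_post_filter`, `vector_post_filter`, `min_score`, and `max_distance` are hypothetical and only meant to pin down the behavior described above: always rerank, and drop rows only when a threshold is given.

```python
from typing import Callable, Iterable, Optional

Row = dict  # stand-in for a result row; not the real Lance type


def fts_post_filter(
    rows: Iterable[Row],
    bm25_score: Callable[[Row], float],
    min_score: Optional[float] = None,
) -> list:
    """Rerank by BM25 score (highest first); if min_score is set,
    remove rows whose score is below it."""
    scored = [{**row, "_score": bm25_score(row)} for row in rows]
    if min_score is not None:
        scored = [row for row in scored if row["_score"] >= min_score]
    return sorted(scored, key=lambda row: row["_score"], reverse=True)


def vector_post_filter(
    rows: Iterable[Row],
    distance: Callable[[Row], float],
    max_distance: Optional[float] = None,
) -> list:
    """Rerank by vector distance (closest first); if max_distance is set,
    remove rows whose distance is above it."""
    scored = [{**row, "_distance": distance(row)} for row in rows]
    if max_distance is not None:
        scored = [row for row in scored if row["_distance"] <= max_distance]
    return sorted(scored, key=lambda row: row["_distance"])
```

Because both functions accept any iterable of rows, the same post-filter could in principle sit on top of an FTS query, a vector query, or a plain scan, which is the point of the first bullet in the proposal.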