Simplify rules when vector/fts used as filter #6076

@westonpace

The filter_query method in Scanner allows an FTS or vector search to be used as a filter. Currently there are separate branches for prefilter and postfilter, and the FTS filter path branches again on "match query" vs. "not match query". I find these confusing and inconsistent. The current behavior is:

  • If we perform a full text search with a vector pre-filter, and it is a match query, then we rerank the vector search results, removing rows where the score is 0.
  • If we perform a full text search with a vector pre-filter, and it is not a match query, then we perform both an FTS search and a vector search and do an inner join on the results on _rowid.
  • If we perform a vector search with an FTS pre-filter, then we do a flat KNN on the FTS results.
  • If we perform a full text search with a vector post-filter, then we rerank the results by KNN distance (and don't actually filter anything).
  • If we perform a vector search with an FTS post-filter, then we remove all results that do not share a token with the query.
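
To make the five cases above easier to compare side by side, here is a toy decision table in Python. It is only a restatement of the bullets, not Lance code: `Search`, `Placement`, and `current_plan` are hypothetical names invented for this sketch, and the returned strings are summaries rather than real plan nodes.

```python
from enum import Enum


class Search(Enum):
    """The primary query kind (hypothetical enum, not a Lance type)."""
    FTS = "fts"
    VECTOR = "vector"


class Placement(Enum):
    """Where the *other* search kind is applied as a filter."""
    PREFILTER = "prefilter"
    POSTFILTER = "postfilter"


def current_plan(search: Search, placement: Placement, is_match_query: bool = True) -> str:
    """Summarize the behavior described in the list above for each combination."""
    if search is Search.FTS and placement is Placement.PREFILTER:
        # Full text search with a vector pre-filter: depends on the query type.
        if is_match_query:
            return "rerank vector results by FTS score, drop rows with score 0"
        return "run FTS and vector search, inner join on _rowid"
    if search is Search.VECTOR and placement is Placement.PREFILTER:
        # Vector search with an FTS pre-filter.
        return "flat KNN over the FTS results"
    if search is Search.FTS and placement is Placement.POSTFILTER:
        # Full text search with a vector post-filter: nothing is filtered.
        return "rerank FTS results by KNN distance, remove nothing"
    # Vector search with an FTS post-filter.
    return "drop vector results that share no token with the query"
```

Written this way, the asymmetry is visible: only one branch depends on `is_match_query`, and the post-filter branches disagree on whether rows are removed.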

It is confusing that a full text search with a vector prefilter becomes a vector search (and vice versa). It also seems like a vector search with an FTS prefilter is the same thing as an FTS search with a vector postfilter (and vice versa).

I propose we simplify as follows:

  • Allow FTS and vector to be used as post-filters for any kind of query (even scans that are not a search)
  • Define an FTS post-filter as a filter that reranks the output and (optionally) removes all rows where the BM25 score is below some threshold
  • Define a vector search post-filter as a filter that reranks the output and (optionally) removes all rows where the vector search distance is above some threshold
  • Do not allow FTS or vector to be used as prefilters
  • Do not have different behavior for match queries and non-match queries. If we can't support non-match queries in filter mode then just return a "not supported" error until we can.
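
As a rough illustration of the proposed semantics, the sketch below models both post-filters as "rerank, then optionally threshold" over any list of rows. Everything here is an assumption for illustration: the function names, the `min_score` and `max_distance` parameters, and the row layout are hypothetical, and `token_overlap` is a simple stand-in for a real BM25 score.

```python
import math
from typing import Callable, Optional, Sequence

Row = dict  # hypothetical row layout: {"id": ..., "text": ..., "vec": [...]}


def fts_post_filter(rows: Sequence[Row],
                    score_fn: Callable[[Row], float],
                    min_score: Optional[float] = None) -> list:
    """Rerank rows by descending score; optionally drop rows scoring below min_score."""
    scored = [(score_fn(r), r) for r in rows]
    if min_score is not None:
        scored = [(s, r) for s, r in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # highest score first
    return [r for _, r in scored]


def vector_post_filter(rows: Sequence[Row],
                       distance_fn: Callable[[Row], float],
                       max_distance: Optional[float] = None) -> list:
    """Rerank rows by ascending distance; optionally drop rows farther than max_distance."""
    scored = [(distance_fn(r), r) for r in rows]
    if max_distance is not None:
        scored = [(d, r) for d, r in scored if d <= max_distance]
    scored.sort(key=lambda pair: pair[0])  # nearest first
    return [r for _, r in scored]


def token_overlap(query: str) -> Callable[[Row], float]:
    """Toy relevance score (NOT BM25): count occurrences of query tokens in the text."""
    tokens = set(query.lower().split())
    return lambda row: float(sum(t in tokens for t in row["text"].lower().split()))


def l2_to(query_vec: Sequence[float]) -> Callable[[Row], float]:
    """Euclidean distance from the row's vector to query_vec."""
    return lambda row: math.dist(row["vec"], query_vec)


# The input can come from any query, including a plain scan.
rows = [
    {"id": 1, "text": "red apple", "vec": [0.0, 0.0]},
    {"id": 2, "text": "green pear", "vec": [1.0, 0.0]},
    {"id": 3, "text": "red red wine", "vec": [3.0, 4.0]},
]
fts_ids = [r["id"] for r in fts_post_filter(rows, token_overlap("red"), min_score=1.0)]
vec_ids = [r["id"] for r in vector_post_filter(rows, l2_to([0.0, 0.0]), max_distance=2.0)]
```

With the thresholds left as `None`, both functions only reorder their input, which matches the "(optionally) removes" wording in the proposal.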
