Commit

Update vector-search-index.md
qiancai committed Oct 17, 2024
1 parent 7420d1a commit 247313e
Showing 1 changed file with 1 addition and 39 deletions.
40 changes: 1 addition & 39 deletions vector-search-index.md
@@ -106,9 +106,7 @@ ORDER BY VEC_COSINE_DISTANCE(embedding, '[1, 2, 3]')
LIMIT 5;
```

To use the vector index with filters, consider the following workarounds:

**Post-filter after vector search:** Query for the K-Nearest neighbors first, then filter out unwanted results:
To use the vector index with filters, query for the K-Nearest neighbors first using vector search, and then filter out unwanted results:

```sql
-- For the following query, the `WHERE` filter is performed before KNN, so the vector index cannot be used:
@@ -124,42 +122,6 @@ WHERE category = "document";
-- Note that this query might return fewer than 5 results if some are filtered out.
```
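
One way to reduce the chance of getting fewer than 5 rows back is to over-fetch candidates in the inner KNN query and apply the final `LIMIT` after filtering. The following is a minimal sketch, not part of the documented text: the table name `vec_table` and the candidate count of 50 are assumptions chosen for illustration.

```sql
-- Hypothetical sketch: `vec_table` and the candidate count 50 are assumptions.
-- The inner query has no WHERE clause, so it keeps the plain KNN shape
-- (ORDER BY distance + LIMIT) and fetches more candidates than needed.
SELECT * FROM
(
  SELECT * FROM vec_table
  ORDER BY VEC_COSINE_DISTANCE(embedding, '[1, 2, 3]')
  LIMIT 50
) t
WHERE category = "document"
-- Sort again, because the outer query does not guarantee the inner ordering.
ORDER BY VEC_COSINE_DISTANCE(embedding, '[1, 2, 3]')
LIMIT 5;
```

This query can still return fewer than 5 rows when fewer than 5 of the 50 candidates match the filter, so the candidate count is a trade-off between recall and the amount of data scanned.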

**Use table partitioning**: Queries within a table [partition](/partitioned-table.md) can fully utilize the vector index. This can be useful if you want to perform equality filters, as equality filters can be turned into accessing specified partitions.

For example, suppose you want to find the closest documentation for a specific product version:

```sql
-- For the following query, the `WHERE` filter is performed before KNN, so the vector index cannot be used:
SELECT * FROM docs
WHERE ver = "v2.0"
ORDER BY VEC_COSINE_DISTANCE(embedding, '[1, 2, 3]')
LIMIT 5;
```

Instead of writing a query using the `WHERE` clause, you can partition the table and then query within the partition using the [`PARTITION` keyword](/partitioned-table.md#partition-selection):

```sql
CREATE TABLE docs (
id INT,
ver VARCHAR(10),
doc TEXT,
embedding VECTOR(3),
VECTOR INDEX idx_embedding USING HNSW ((VEC_COSINE_DISTANCE(embedding)))
) PARTITION BY LIST COLUMNS (ver) (
PARTITION p_v1_0 VALUES IN ('v1.0'),
PARTITION p_v1_1 VALUES IN ('v1.1'),
PARTITION p_v1_2 VALUES IN ('v1.2'),
PARTITION p_v2_0 VALUES IN ('v2.0')
);
SELECT * FROM docs
PARTITION (p_v2_0)
ORDER BY VEC_COSINE_DISTANCE(embedding, '[1, 2, 3]')
LIMIT 5;
```

For more information, see [Table Partitioning](/partitioned-table.md).

## View index build progress

After you insert a large volume of data, some of it might not be instantly persisted to TiFlash. For vector data that has already been persisted, the vector search index is built synchronously. For data that has not yet been persisted, the index will be built once the data is persisted. This process does not affect the accuracy and consistency of the data. You can still perform vector searches at any time and get complete results. However, performance will be suboptimal until vector indexes are fully built.
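
As a rough illustration of checking the build progress described above, the following sketch queries the `INFORMATION_SCHEMA.TIFLASH_INDEXES` system table. It assumes a TiDB deployment with TiFlash and vector search enabled; the columns returned might differ between versions, so the query selects all of them.

```sql
-- Sketch, assuming a TiDB cluster with TiFlash and vector search enabled.
-- Each row describes one vector index, including how many rows are
-- already indexed and how many are still waiting to be indexed.
SELECT * FROM INFORMATION_SCHEMA.TIFLASH_INDEXES;
```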

Check warning on line 127 in vector-search-index.md

GitHub Actions / vale

[vale] reported by reviewdog 🐶 [PingCAP.Ambiguous] Consider using a clearer word than 'a large volume of' because it may cause confusion. (severity: INFO; line 127, column 18)