57 changes: 49 additions & 8 deletions docs/indexing/vector-index.mdx
@@ -1,7 +1,7 @@
---
title: "Vector Indexes"
sidebarTitle: "Vector Index"
description: "Build and optimize vector indexes in LanceDB using IVF-PQ, HNSW, and binary indexes."
description: "Build and optimize LanceDB vector indexes, including IVF_HNSW_SQ, IVF_RQ, IVF_PQ, and binary indexes."
icon: "arrow-up-right-dots"
---
import {
@@ -42,6 +42,40 @@ You can create a new index with different parameters using `create_index` - this
Although the `create_index` API returns immediately, the building of the vector index is asynchronous. To wait until all data is fully indexed, you can specify the `wait_timeout` parameter.
</Note>

## Choose the Right Index

Use this table as a quick starting point:

| If your top priority is... | Use this index | Why | Typical compressed size vs. raw vectors |
| :--- | :--- | :--- | :--- |
| Best recall/latency trade-off | `IVF_HNSW_SQ` | Combines IVF partitioning with HNSW graph search for strong quality at low latency. | Typically a little larger than `1/4` of raw size |
| Maximum compression | `IVF_RQ` | RaBitQ-style quantization with very strong compression. | Around `1/32` of raw size |
| Higher accuracy at small dimensions (`dimension <= 256`) | `IVF_PQ` | On small-dimensional vectors, `IVF_PQ` often provides higher accuracy with similar performance compared to `IVF_RQ`. | Usually `1/64` to `1/16` of raw size (depends on `num_sub_vectors`) |

<Warning>
If your vector search frequently includes metadata filters (`where(...)`), prefer `IVF_RQ` or `IVF_PQ`. In filtered workloads, `IVF_HNSW_SQ` latency can fluctuate significantly.
</Warning>

<Tip>
Compression ratios are practical rules of thumb and can vary with vector distribution, metric, and configuration.
For small dimensions, choose `IVF_PQ` for accuracy, not for guaranteed higher compression than `IVF_RQ`.
</Tip>
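
The size column in the table can be sanity-checked with simple arithmetic. The sketch below is illustrative only and rests on assumptions that are not stated in the table: raw vectors are float32, scalar quantization stores 8 bits per dimension, RaBitQ-style quantization stores 1 bit per dimension, and each PQ sub-vector is stored as an 8-bit code. It ignores codebooks, IVF centroids, and the HNSW graph, which is why real indexes come out somewhat larger. All function names are hypothetical.

```python
FLOAT32_BITS = 32  # assumption: raw vectors are stored as float32


def compression_ratio(code_bits_per_vector: int, dimension: int) -> float:
    """Size of one quantized vector relative to one raw float32 vector."""
    return code_bits_per_vector / (dimension * FLOAT32_BITS)


def sq_ratio(dimension: int, bits_per_value: int = 8) -> float:
    """Scalar quantization: assumed 8 bits per dimension."""
    return compression_ratio(dimension * bits_per_value, dimension)


def rq_ratio(dimension: int, bits_per_value: int = 1) -> float:
    """RaBitQ-style quantization: assumed 1 bit per dimension."""
    return compression_ratio(dimension * bits_per_value, dimension)


def pq_ratio(dimension: int, num_sub_vectors: int, bits_per_code: int = 8) -> float:
    """Product quantization: assumed one 8-bit code per sub-vector."""
    return compression_ratio(num_sub_vectors * bits_per_code, dimension)


if __name__ == "__main__":
    dim = 128
    print("SQ:", sq_ratio(dim))                      # before graph overhead
    print("RQ:", rq_ratio(dim))
    for m in (dim // 16, dim // 8, dim // 4):
        print(f"PQ with num_sub_vectors={m}:", pq_ratio(dim, m))
```

Under these assumptions, `num_sub_vectors` between `dimension / 16` and `dimension / 4` spans the `1/64` to `1/16` range quoted for `IVF_PQ`, and `dimension / 8` lands on the same `1/32` ratio as `IVF_RQ`.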

### Indexing Tuning by Index Type

Start with these values, then tune for your workload:

- `IVF_HNSW_SQ`
  - `num_partitions`: start at `num_rows / 1,048,576` (rounded to an integer)
    - Lower `num_partitions` can reduce search latency, but index build may become slower because partitions are larger.
  - `ef_construction`: start at `150`; increase for better recall, decrease for faster indexing.
- `IVF_RQ`
  - `num_partitions`: start at `num_rows / 4096` (rounded to an integer). This is a strong default for most datasets.
- `IVF_PQ`
  - `num_partitions`: start at `num_rows / 4096` (rounded to an integer).
  - `num_sub_vectors`: start at `dimension / 8`. Increase for better recall, decrease for faster search and smaller indexes.
  - For small dimensions (`dimension <= 256`), `IVF_PQ` is often preferred over `IVF_RQ` for better accuracy at similar query performance.
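
The starting values listed above can be computed mechanically. The helper below is a minimal sketch, not part of LanceDB: the function name is hypothetical, and clamping each value to at least `1` is an added assumption so that small tables do not produce zero partitions.

```python
def starting_params(index_type: str, num_rows: int, dimension: int) -> dict:
    """Return the suggested starting parameters for a given index type."""
    if index_type == "IVF_HNSW_SQ":
        return {
            # num_rows / 1,048,576, rounded; clamp to 1 is an assumption
            "num_partitions": max(1, round(num_rows / 1_048_576)),
            "ef_construction": 150,
        }
    if index_type == "IVF_RQ":
        return {"num_partitions": max(1, round(num_rows / 4096))}
    if index_type == "IVF_PQ":
        return {
            "num_partitions": max(1, round(num_rows / 4096)),
            # dimension / 8; integer division is an assumption
            "num_sub_vectors": max(1, dimension // 8),
        }
    raise ValueError(f"unsupported index type: {index_type}")


if __name__ == "__main__":
    for kind in ("IVF_HNSW_SQ", "IVF_RQ", "IVF_PQ"):
        print(kind, starting_params(kind, num_rows=10_000_000, dimension=1536))
```

These are starting points only; the resulting values still need tuning against the actual workload.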

## Example: Construct an IVF Index

In this example, we will create an index for a table containing 1536-dimensional vectors. The index will use IVF_PQ with L2 distance, which is well-suited for high-dimensional vector search.
@@ -53,12 +87,15 @@ Make sure you have enough data in your table (at least a few thousand rows) for
Sometimes you need to configure the index beyond default parameters:

- Index Types:
  - `IVF_PQ`: Default index type, optimized for high-dimensional vectors
  - `IVF_HNSW_SQ`: Combines IVF clustering with HNSW graph for improved search quality
  - `IVF_HNSW_SQ`: best recall/latency trade-off
  - `IVF_RQ`: best compression for large, high-dimensional datasets
  - `IVF_PQ`: often higher accuracy than `IVF_RQ` for small dimensions (`<= 256`) at similar query performance
- `metrics`: defaults to `l2`; `cosine` and `dot` are also available
  - When using `cosine` similarity, distances range from 0 (identical vectors) to 2 (maximally dissimilar)
- `num_partitions`: The number of partitions in the IVF portion of the index. This number is usually chosen to target a particular number of vectors per partition. A common heuristic is `num_rows / 8192`. Larger values generally make index building take longer but use less memory, and they often improve accuracy at the cost of slower search because queries typically need a higher `nprobes`. LanceDB automatically selects a sensible default `num_partitions` based on the heuristic mentioned above.
- `num_sub_vectors`: The number of sub-vectors that will be created during Product Quantization (PQ). This number is typically chosen based on the desired recall and the dimensionality of the vector. Larger `num_sub_vectors` increases accuracy but can significantly slow queries; a good starting point is `dimension / 8`.
- `num_partitions`: use index-specific starting points from the section above:
  - `IVF_HNSW_SQ`: `num_rows / 1,048,576`
  - `IVF_RQ` and `IVF_PQ`: `num_rows / 4096`
- `num_sub_vectors`: applies to `IVF_PQ`; start with `dimension / 8`. Larger values often improve recall but can slow search.
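
The 0-to-2 range noted for `cosine` in the list above follows if cosine distance is defined as one minus cosine similarity. That definition is an assumption here, chosen because it matches the stated range. A minimal pure-Python sketch with a hypothetical function name:

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Assumed definition: 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(cosine_distance([1.0, 0.0], [1.0, 0.0]))   # identical direction
    print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # orthogonal
    print(cosine_distance([1.0, 0.0], [-1.0, 0.0]))  # opposite direction
```

Identical directions give the minimum distance, orthogonal vectors sit in the middle of the range, and opposite directions give the maximum.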

Let's take a look at a sample request for an IVF index:

@@ -81,7 +118,7 @@ Connect to LanceDB and open the table you want to index.

### 2. Construct an IVF Index

Create an `IVF_PQ` index with `cosine` similarity. Specify `vector_column_name` if you use multiple vector columns or non-default names. By default LanceDB uses Product Quantization; switch to `IVF_SQ` for scalar quantization.
Create an `IVF_PQ` index with `cosine` similarity. Specify `vector_column_name` if you use multiple vector columns or non-default names. You can switch `index_type` to `IVF_RQ` or `IVF_HNSW_SQ` depending on your recall/latency/compression target.

<CodeGroup>
<CodeBlock filename="Python" language="Python" icon="python">
@@ -104,7 +141,12 @@ Search using a random 1,536-dimensional embedding.
The previous query uses:

- `limit`: number of results to return
- `nprobes`: number of IVF partitions to scan; covering roughly 5–10% of partitions often balances recall and latency
- `nprobes`: number of IVF partitions to scan. LanceDB auto-tunes this by default.
- `ef`: primarily relevant for `IVF_HNSW_SQ`; start around `1.5 * k` (where `k=limit`) and increase up to `10 * k` for higher recall.
- `nprobes` by index type:
  - `IVF_HNSW_SQ`: usually keep auto-tuned `nprobes`, then tune `ef` first. For filtered search (`where(...)`), expect higher latency variance.
  - `IVF_RQ`: keep auto-tuned `nprobes`; increase only when recall is insufficient.
  - `IVF_PQ`: keep auto-tuned `nprobes`; increase when recall is insufficient. Often preferred over `IVF_RQ` when `dimension <= 256`.
- `refine_factor`: reads additional candidates and reranks in memory
- `.to_pandas()`: converts the results to a pandas DataFrame
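
The `ef` guidance in the list above (start around `1.5 * k`, go up to `10 * k`) translates into a concrete range for any `limit`. The helper below is a small sketch with a hypothetical name; rounding the lower bound up to a whole number is an assumption.

```python
import math


def ef_search_range(limit: int) -> tuple[int, int]:
    """Return (starting ef, upper ef) for a given result limit k."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    start = math.ceil(1.5 * limit)  # rounding up is an assumption
    upper = 10 * limit
    return start, upper


if __name__ == "__main__":
    print(ef_search_range(10))
    print(ef_search_range(5))
```

A reasonable way to use it is to start at the lower bound and raise `ef` toward the upper bound only while measured recall is still below target.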

@@ -195,4 +237,3 @@ To wait until all data is fully indexed, you can specify the `wait_timeout` para
{VectorIndexCheckStatus}
</CodeBlock>
</CodeGroup>

9 changes: 5 additions & 4 deletions docs/search/vector-search.mdx
@@ -63,10 +63,11 @@ Use ANN search for large-scale applications where speed matters more than perfec
### Tuning `nprobes`

- `nprobes` controls how many partitions are searched at query time.
- Higher `nprobes` typically improves recall but reduces performance.
- A common starting point is to choose `nprobes` in the range 10-20, for balanced recall and latency.
- After a certain threshold, increasing `nprobes` yields only marginal accuracy gains.
- LanceDB automatically chooses a sensible `nprobes` by default to maximize performance without noticeably affecting accuracy.
- By default, LanceDB automatically tunes `nprobes` to achieve the best performance without noticeably sacrificing accuracy.
- In most cases, leave `nprobes` unset and use the auto-tuned value.
- Only tune `nprobes` manually when recall is below your target, or when you need even higher performance for your workload.
- If recall is too low, increase `nprobes` gradually; beyond a certain threshold, further increases yield only marginal accuracy gains.
- If you need higher performance and have recall headroom, decrease `nprobes` gradually.
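
The manual tuning procedure described above (raise `nprobes` gradually, stop once gains become marginal) can be sketched as a small loop. Everything here is hypothetical: `measure_recall` stands in for a callback you would write yourself that runs a query set at a given `nprobes` and compares the results against ground truth, and the `step` and `min_gain` defaults are arbitrary illustration values.

```python
from typing import Callable


def tune_nprobes(
    measure_recall: Callable[[int], float],  # hypothetical user-supplied callback
    target_recall: float,
    start: int,
    max_nprobes: int,
    step: int = 2,
    min_gain: float = 0.005,
) -> int:
    """Increase nprobes in small steps until recall meets the target,
    the gain per step becomes marginal, or max_nprobes is reached."""
    nprobes = start
    recall = measure_recall(nprobes)
    while recall < target_recall and nprobes < max_nprobes:
        candidate = min(nprobes + step, max_nprobes)
        new_recall = measure_recall(candidate)
        if new_recall - recall < min_gain:
            break  # further increases yield only marginal gains
        nprobes, recall = candidate, new_recall
    return nprobes


if __name__ == "__main__":
    # Synthetic recall curve with diminishing returns, for illustration only.
    fake_recall = lambda n: 1.0 - 0.5 / n
    print(tune_nprobes(fake_recall, target_recall=0.94, start=4, max_nprobes=64))
```

The same loop run with a negative `step` and an inverted stopping condition would cover the opposite case of trading recall headroom for speed; that variant is left out to keep the sketch short.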

### Vector Search with Prefiltering
