Merged
59 changes: 40 additions & 19 deletions pages/querying/vector-search.mdx
@@ -56,6 +56,7 @@ Below is a list of all configuration options:
- `metric: string (default=l2sq)` ➡ The metric used for the vector search. The default value is `l2sq` (squared Euclidean distance).
- `resize_coefficient: int (default=2)` ➡ When the index reaches its capacity, it resizes by multiplying the current capacity by this coefficient, if sufficient memory is available.
If resizing fails due to memory limitations, an exception will be thrown. Default value is `2`.
+- `scalar_kind: string (default=f32)` ➡ The scalar type used to store each vector component. Smaller types reduce memory usage but may decrease precision.
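
The `resize_coefficient` rule above can be sketched in a few lines of Python. This is only an illustration of the multiplication described in the option text, assuming a resize is triggered whenever the next entry would not fit. The function name `capacity_for` is hypothetical, and the sketch does not model memory checks or any rounding the index applies internally (the example output later in this file reports a capacity of 2048 for a configured capacity of 1000).

```python
def capacity_for(entries: int, capacity: int, resize_coefficient: int = 2) -> int:
    """Return the capacity after growing until `entries` fit.

    Hypothetical helper: models only "multiply the current capacity by the
    coefficient", not memory limits or internal rounding.
    """
    if capacity < 1 or resize_coefficient < 2:
        raise ValueError("capacity must be >= 1 and resize_coefficient >= 2")
    while entries > capacity:
        capacity *= resize_coefficient  # one resize step
    return capacity


if __name__ == "__main__":
    print(capacity_for(entries=1000, capacity=1000))  # still fits, no resize
    print(capacity_for(entries=1001, capacity=1000))  # one resize step
    print(capacity_for(entries=4500, capacity=1000))  # several resize steps
```

With the default coefficient of 2, a capacity of 1000 grows to 2000 on the 1001st entry under these assumptions.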

## Run vector search

@@ -81,6 +82,7 @@ Additionally, the same information can be retrieved with the `SHOW VECTOR INDEX
- `capacity: int` ➡ The capacity of the vector index.
- `metric: string` ➡ Metric used for vector search similarity.
- `size: int` ➡ The number of entries in the vector index.
+- `scalar_kind: string` ➡ The scalar type used for each vector element.

{<h3 className="custom-header"> Usage: </h3>}

@@ -139,9 +141,28 @@ for the metric is `l2sq` (squared Euclidean distance).

### Scalar type

-Properties are stored as 64-bit values in the property store and as 32-bit values in the vector index.
-Scalar type define the data type of each vector element. Default type for the
-metric is `f32`.
+Properties are stored as 64-bit values in the property store. However, for efficiency, vector elements in the vector index are stored using 32-bit values by default.
+The `scalar_kind` setting determines the data type used for each vector element in the index. By default, the scalar type is set to `f32` (32-bit floating point),
+which provides a good balance between precision and memory usage. Alternative options, such as `f16` for lower memory usage or `f64` for higher precision, allow you to fine-tune this tradeoff based on your specific needs.
+
+| Scalar | Description |
+|-----------|------------------------------------------------------------|
+| `b1x8` | Binary format (1 bit per element, stored in 8-bit chunks). |
+| `u40` | Unsigned 40-bit integer. |
+| `uuid` | Universally unique identifier (UUID). |
+| `bf16` | 16-bit floating point (bfloat16). |
+| `f64` | 64-bit floating point (double). |
+| `f32` | 32-bit floating point (float). |
+| `f16` | 16-bit floating point. |
+| `f8` | 8-bit floating point. |
+| `u64` | 64-bit unsigned integer. |
+| `u32` | 32-bit unsigned integer. |
+| `u16` | 16-bit unsigned integer. |
+| `u8` | 8-bit unsigned integer. |
+| `i64` | 64-bit signed integer. |
+| `i32` | 32-bit signed integer. |
+| `i16` | 16-bit signed integer. |
+| `i8` | 8-bit signed integer. |
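
As a rough illustration of the memory and precision tradeoff described in the scalar type section, the following Python sketch estimates the raw payload of one vector for a few scalar kinds and round-trips a value through 64-, 32-, and 16-bit floats using the standard `struct` module. The bit widths come from the table; the helper names `raw_vector_bytes` and `round_trip` are hypothetical, and the estimate ignores any per-entry overhead the index adds, so it should be read as a lower bound and not as Memgraph's actual memory accounting.

```python
import struct

# Bits per element for a few scalar kinds, taken from the table in this section.
BITS_PER_ELEMENT = {"f64": 64, "f32": 32, "f16": 16, "bf16": 16, "i8": 8, "b1x8": 1}


def raw_vector_bytes(dimension: int, scalar_kind: str) -> int:
    """Raw payload size of one vector in bytes, ignoring index overhead."""
    bits = dimension * BITS_PER_ELEMENT[scalar_kind]
    return (bits + 7) // 8  # round up to whole bytes


def round_trip(value: float, fmt: str) -> float:
    """Pack a Python float into the given struct format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


if __name__ == "__main__":
    for kind in ("f64", "f32", "f16"):
        print(kind, raw_vector_bytes(1024, kind), "bytes per 1024-dimensional vector")
    # struct format codes: "d" = 64-bit, "f" = 32-bit, "e" = 16-bit IEEE 754 float
    for label, fmt in (("f64", "<d"), ("f32", "<f"), ("f16", "<e")):
        print(label, round_trip(0.1, fmt))
```

For a 1024-dimensional vector this gives 4096 bytes for `f32` and 2048 bytes for `f16`, while 0.1 stored as a 16-bit float reads back as roughly 0.09998.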

## Drop vector index

@@ -179,7 +200,7 @@ After Memgraph MAGE and Lab have been started, head over to the Query execution
tab in Memgraph Lab and run the following query to create vector index:

```cypher
-CREATE VECTOR INDEX index_name ON :Node(vector) WITH CONFIG {"dimension": 2, "capacity": 1000, "metric": "cos","resize_coefficient": 2};
+CREATE VECTOR INDEX index_name ON :Node(vector) WITH CONFIG {"dimension": 2, "capacity": 1000, "metric": "cos", "resize_coefficient": 2, "scalar_kind": "f16"};
```

Then, run the following query to inspect vector index:
@@ -196,11 +217,11 @@ SHOW VECTOR INDEX INFO;
The above query will result with:

```
-+--------------+--------------+--------------+--------------+--------------+--------------+
-| capacity | dimension | index_name | label | property | size |
-+--------------+--------------+--------------+--------------+--------------+--------------+
-| 2048 | 2 | "index_name" | "Node" | "vector" | 0 |
-+--------------+--------------+--------------+--------------+--------------+--------------+
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
+| capacity | dimension | index_name | label | property | size | scalar_kind |
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
+| 2048 | 2 | "index_name" | "Node" | "vector" | 0 | "f16" |
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
```

{<h3 className="custom-header"> Create a node </h3>}
@@ -221,11 +242,11 @@ CALL vector_search.show_index_info() YIELD * RETURN *;
The above query results in:

```
-+--------------+--------------+--------------+--------------+--------------+--------------+
-| capacity | dimension | index_name | label | property | size |
-+--------------+--------------+--------------+--------------+--------------+--------------+
-| 2048 | 2 | "index_name" | "Node" | "vector" | 1 |
-+--------------+--------------+--------------+--------------+--------------+--------------+
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
+| capacity | dimension | index_name | label | property | size | scalar_kind |
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
+| 2048 | 2 | "index_name" | "Node" | "vector" | 1 | "f16" |
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
```

We can see the size of the index changed, due to one new node.
@@ -269,11 +290,11 @@ CALL vector_search.show_index_info() YIELD * RETURN *;

The size is now 5, due to 4 additional nodes:
```
-+--------------+--------------+--------------+--------------+--------------+--------------+
-| capacity | dimension | index_name | label | property | size |
-+--------------+--------------+--------------+--------------+--------------+--------------+
-| 2048 | 2 | "index_name" | "Node" | "vector" | 5 |
-+--------------+--------------+--------------+--------------+--------------+--------------+
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
+| capacity | dimension | index_name | label | property | size | scalar_kind |
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
+| 2048 | 2 | "index_name" | "Node" | "vector" | 5 | "f16" |
++--------------+--------------+--------------+--------------+--------------+--------+--------------+
```

Let's again search for the top five similar nodes to the vector [2.0, 2.0] (to compare it to all nodes we have):