Commit 8564d39

CSHARP-1014 Add vector support (#611)
Co-authored-by: Siyao (Jane) He <siyaoh4@uci.edu>
1 parent 2977525 commit 8564d39

27 files changed, +2102 -235 lines changed

doc/features/datatypes/README.md

Lines changed: 2 additions & 1 deletion
@@ -2,7 +2,7 @@
 
 When retrieving the value of a column from a `Row` object, you use a getter based on the type of the column.
 
-CQL3 data type|C# type
+CQL data type|C# type
 ---|---
 ascii|string
 bigint|long
@@ -29,3 +29,4 @@ tinyint|sbyte
 uuid|Guid
 varchar|string
 varint|BigInteger
+vector|[CqlVector](../vectors)

doc/features/vectors/README.md

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
# Vector support

## Native CQL `vector` type

Introduced in Cassandra 5.0, DSE 6.9 and DataStax Astra, the CQL `vector` type is represented in the driver as a [CqlVector&lt;T&gt;][cqlvector-api].

The `vector` type is handled by the driver the same way as any other CQL type. You can use it in simple statements, prepared statements, LINQ and Mapper queries, as described in the sections below.

## The `CqlVector<T>` C# type

The [API documentation][cqlvector-api] for this class contains useful information. Here are some examples:

### Creating vectors

```csharp
// these 2 are equivalent
var vector1 = new CqlVector<int>(1, 2, 3);
var vector2 = CqlVector<int>.New(new int[] { 1, 2, 3 });

// CqlVector<int>.New requires an array; if you prefer using other types such as List
// you can call .ToArray() (an instance method on List, a LINQ extension method on
// other IEnumerable types) - note that it performs a copy
var vector3 = CqlVector<int>.New(new List<int> { 1, 2, 3 }.ToArray());

// create a vector with the specified number of dimensions (this is similar to creating an array - new int[dimensions])
var vector4 = CqlVector<int>.New(3);

// Converting an array to a CqlVector without copying
var vector5 = new int[] { 1, 2, 3 }.AsCqlVector();

// Converting an IEnumerable to a CqlVector (calls .ToArray() internally so it performs a copy)
var vector6 = new int[] { 1, 2, 3 }.ToCqlVector();
```
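
The comments above distinguish conversions that copy the source from ones that do not. The following is a minimal sketch of why that matters, using only plain .NET arrays and lists rather than the driver. The assumption (inferred from the "without copying" wording, not verified against the driver source) is that a vector created without a copy shares storage with the original array, so later writes to the array would be visible through the vector. The variable names are illustrative only.

```csharp
using System;
using System.Collections.Generic;

// Copying: List<T>.ToArray() allocates a new array, so later changes
// to the list do not affect the array that was produced from it.
var list = new List<int> { 1, 2, 3 };
int[] copied = list.ToArray();
list[0] = 42;

// No copy: a second reference to the same array observes every write.
// This stands in for wrapping an array without copying (an assumption
// about how a no-copy conversion behaves, not a driver API call).
int[] original = { 1, 2, 3 };
int[] sameStorage = original;
original[0] = 42;

Console.WriteLine($"{copied[0]} {sameStorage[0]}"); // prints "1 42"
```

If the source array may be modified after the vector is created, a copying conversion is the safer choice; the no-copy form avoids an allocation when the array is not reused.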

### Modifying vectors

```csharp
var vector = CqlVector<int>.New(3);

// you can use the index operator just as if you were dealing with an array or list
vector[0] = 1;
vector[1] = 2;
vector[2] = 3;
```

### Equality

`Equals()` is defined in the `CqlVector<T>` class, but keep in mind that it uses `SequenceEqual` on the underlying arrays internally. Elements are compared with their default equality comparer, so nested arrays/collections are compared by reference and `Equals()` will not work correctly for those cases.

```csharp
var vector1 = new CqlVector<int>(1, 2, 3);
var vector2 = new CqlVector<int>(1, 2, 3);
vector1.Equals(vector2); // this returns true
```
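
The nested-collection caveat can be illustrated without the driver. The sketch below applies LINQ's `Enumerable.SequenceEqual` to plain arrays, on the assumption that `CqlVector<T>.Equals()` behaves the same way for its elements, as the paragraph above describes. `NestedSequenceEqual` is a hypothetical helper written for this example and is not part of the driver.

```csharp
using System;
using System.Linq;

// Flat element types: SequenceEqual compares the values element by element.
int[] flatA = { 1, 2, 3 };
int[] flatB = { 1, 2, 3 };
bool flatEqual = flatA.SequenceEqual(flatB);

// Nested arrays: each inner array is compared with the default equality
// comparer, which for arrays means reference equality, so two structurally
// identical nested sequences are reported as different.
int[][] nestedA = { new[] { 1, 2 }, new[] { 3, 4 } };
int[][] nestedB = { new[] { 1, 2 }, new[] { 3, 4 } };
bool nestedEqual = nestedA.SequenceEqual(nestedB);

// Hypothetical helper (not a driver API): compares one level of nesting
// by applying SequenceEqual to each pair of inner arrays.
bool NestedSequenceEqual(int[][] left, int[][] right) =>
    left.Length == right.Length &&
    left.Zip(right, (l, r) => l.SequenceEqual(r)).All(equal => equal);

bool deepEqual = NestedSequenceEqual(nestedA, nestedB);

Console.WriteLine($"{flatEqual} {nestedEqual} {deepEqual}"); // prints "True False True"
```

For vectors whose subtype is itself a collection, a comparison along these lines may be needed in place of `Equals()`.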

## Writing vector data and performing vector search operations

The `vector` type is handled by the driver the same way as any other CQL type.

The following examples use this schema. In this case, `j` is a 3-dimensional `vector` column of `float` values. Both the vector subtype and the number of dimensions can be changed. Any CQL type is valid as a vector subtype.

```sql
CREATE TABLE IF NOT EXISTS table1 (
    i int PRIMARY KEY,
    j vector<float, 3>
);

/* Supported by C* 5.0, for vector search with the ANN operator */
CREATE CUSTOM INDEX IF NOT EXISTS ann_table1_index ON table1(j) USING 'StorageAttachedIndex';
```

### Simple Statements

```csharp
await session.ExecuteAsync(
    new SimpleStatement(
        "INSERT INTO table1 (i, j) VALUES (?, ?)",
        1,
        new CqlVector<float>(1.0f, 2.0f, 3.0f)));
var rowSet = await session.ExecuteAsync(
    new SimpleStatement(
        "SELECT * FROM table1 ORDER BY j ANN OF ? LIMIT ?",
        new CqlVector<float>(0.6f, 0.5f, 0.9f),
        1));
var row = rowSet.Single();
var i = row.GetValue<int>("i");
var j = row.GetValue<CqlVector<float>?>("j");
```

### Prepared Statements

```csharp
var psInsert = await session.PrepareAsync("INSERT INTO table1 (i, j) VALUES (?, ?)");
var psSelect = await session.PrepareAsync("SELECT * FROM table1 ORDER BY j ANN OF ? LIMIT ?");

var boundInsert = psInsert.Bind(2, new CqlVector<float>(5.0f, 6.0f, 7.0f));
await session.ExecuteAsync(boundInsert);

var boundSelect = psSelect.Bind(new CqlVector<float>(4.7f, 5.0f, 5.0f), 1);
var rowSet = await session.ExecuteAsync(boundSelect);

var row = rowSet.Single();
var i = row.GetValue<int>("i");
var j = row.GetValue<CqlVector<float>>("j");
```

### LINQ and Mapper

The LINQ component of the driver doesn't support the `ANN` operator, so it's best to avoid LINQ for vector search workloads. If a particular workload doesn't require the `ANN` operator, LINQ can be used without issues.

```csharp
// you can also provide a MappingConfiguration object to the Table/Mapper constructors
// (or use MappingConfiguration.Global) programmatically instead of these attributes
[Cassandra.Mapping.Attributes.Table("table1")]
public class Table1
{
    [Cassandra.Mapping.Attributes.PartitionKey]
    [Cassandra.Mapping.Attributes.Column("i")]
    public int I { get; set; }

    [Cassandra.Mapping.Attributes.Column("j")]
    public CqlVector<float>? J { get; set; }
}

// LINQ

var table = new Table<Table1>(session);
await table
    .Insert(new Table1 { I = 3, J = new CqlVector<float>(10.1f, 10.2f, 10.3f) })
    .ExecuteAsync();

// Using AllowFiltering is not recommended due to unpredictable performance.
// Here we use AllowFiltering because the example schema is meant to showcase vector search
// but the ANN operator is not supported in LINQ yet.
var linqEntity = (await table
    .Where(t => t.I == 3 && t.J == CqlVector<float>.New(new [] {10.1f, 10.2f, 10.3f}))
    .AllowFiltering()
    .ExecuteAsync()).SingleOrDefault();

// Alternative select using Query syntax instead of Method syntax
var queryEntity = (await (
    from t in table
    where t.J == CqlVector<float>.New(new [] {10.1f, 10.2f, 10.3f})
    select t
    ).AllowFiltering().ExecuteAsync()).SingleOrDefault();

// Mapper

var mapper = new Mapper(session);
await mapper.InsertAsync(
    new Table1 { I = 4, J = new CqlVector<float>(11.1f, 11.2f, 11.3f) });
var vectorSearchData = await mapper.FetchAsync<Table1>(
    "ORDER BY j ANN OF ? LIMIT ?",
    new CqlVector<float>(10.9f, 10.9f, 10.9f),
    1);
var mapperEntity = vectorSearchData.SingleOrDefault();
```

[cqlvector-api]: https://docs.datastax.com/en/drivers/csharp/latest/api/Cassandra.CqlVector-1.html
