.Net: Added implementation of SQLite connector for new memory design #9164

Merged 128 commits on Oct 17, 2024
Changes from 1 commit
Commits
d7287bb
.Net: ADR for Text Search Abstractions (#8307)
markwallace-microsoft Aug 22, 2024
590745f
.Net: Add TextSearchExtension methods to create KernelPlugins and Ker…
markwallace-microsoft Aug 27, 2024
3c09d91
.Net: Update Bing Search to new Text Search Design (#8343)
markwallace-microsoft Aug 29, 2024
187afca
Merge branch 'main' into feature-vector-search
markwallace-microsoft Aug 29, 2024
16f4fa1
.Net: Update Google Search to new Text Search Design (#8394)
markwallace-microsoft Aug 29, 2024
5af6540
.Net: Add text search concepts demonstrating RAG and Function Calling…
markwallace-microsoft Aug 30, 2024
6d13254
Merge branch 'main' into feature-vector-search
markwallace-microsoft Aug 30, 2024
f806fc2
.Net: Simplify how additional text search filter parameters are speci…
markwallace-microsoft Sep 2, 2024
890c0f8
Merge branch 'main' into feature-vector-search
markwallace-microsoft Sep 2, 2024
cfbac04
Merge branch 'main' into feature-vector-search
markwallace-microsoft Sep 3, 2024
b96d607
Merge branch 'main' into feature-vector-search
markwallace-microsoft Sep 3, 2024
4b8406e
.Net: Add Vector Search ADR document and abstractions (#8494)
westey-m Sep 4, 2024
d02005a
.Net: Add support for advanced search keywords (#8489)
markwallace-microsoft Sep 4, 2024
ac540db
.Net: Use interfaces for mapper abstractions (#8424)
markwallace-microsoft Sep 4, 2024
d745d57
.Net: Add vector search implementation for azure ai search (#8507)
westey-m Sep 5, 2024
7836721
.Net: Add qdrant vector search implementation. (#8508)
westey-m Sep 5, 2024
ec10424
.Net: Add Redis vector search implementation (#8510)
westey-m Sep 5, 2024
d43c5ff
.Net: Unify text and vector search and move all to data namespace. (#…
westey-m Sep 6, 2024
70a98ee
.Net: Add volatile vector search and enforcing collection type. (#8546)
westey-m Sep 6, 2024
ffaa93f
Update adr document with filter changes. (#8590)
westey-m Sep 9, 2024
639694b
Resolve merge conflicts
markwallace-microsoft Sep 11, 2024
c1cd203
Fix formatting
markwallace-microsoft Sep 11, 2024
7bdf519
Merge branch 'main' into feature-vector-search
westey-m Sep 12, 2024
6b92a18
Merge branch 'main' into feature-vector-search
markwallace-microsoft Sep 13, 2024
1fc7e91
.Net: Switch to using interfaces for search instead of query objects.…
westey-m Sep 13, 2024
9c8e438
Merge branch 'main' into feature-vector-search
markwallace-microsoft Sep 13, 2024
031a653
Merge branch 'main' into feature-vector-search
westey-m Sep 17, 2024
971afd6
.Net: Suppress moq warning after taking package updates. (#8838)
westey-m Sep 17, 2024
bf34a51
.Net: Basic implementation of ITextSearch for an IVectorStore impleme…
markwallace-microsoft Sep 17, 2024
434498a
.Net: Add float64 support for redis vector search. (#8847)
westey-m Sep 17, 2024
092a9ac
Merge branch 'main' into feature-vector-search
dmytrostruk Sep 18, 2024
f35d051
.Net: Enhance volatile memory connector to allow collection to be ser…
markwallace-microsoft Sep 18, 2024
8dc7dc3
.Net: Added vector search implementation for Azure CosmosDB for Mongo…
dmytrostruk Sep 18, 2024
fa461ce
.Net: Vector Search Bug fixes (#8890)
westey-m Sep 18, 2024
be531dd
.Net: [Feature branch] Added support for Filter and Offset parameters…
dmytrostruk Sep 19, 2024
36aa3f2
.Net: Vector search sample: Multi Vector and Paging (#8920)
westey-m Sep 20, 2024
8fa1e8e
.Net: Adding a multi-store vector search sample. (#8909)
westey-m Sep 20, 2024
b5c649f
.Net: [Feature Branch] Added vector search implementation for Azure C…
dmytrostruk Sep 20, 2024
952d679
Merge branch 'main' into feature-vector-search
markwallace-microsoft Sep 20, 2024
6c4aa5d
.Net: Add VectorStoreTextSearch concepts and unit tests (#8891)
markwallace-microsoft Sep 23, 2024
cb00332
Merge branch 'main' into feature-vector-search
westey-m Sep 23, 2024
21f8a27
.Net: [Feature Branch] Added vector search implementation for Weaviat…
dmytrostruk Sep 23, 2024
d6b6019
.Net: Add filtering support for VectoreStoreTextSearch (#8947)
markwallace-microsoft Sep 24, 2024
1c35c01
Merge branch 'main' into feature-vector-search
markwallace-microsoft Sep 24, 2024
6d041b3
.Net: Adding generic data type support to the volatile vector store. …
westey-m Sep 24, 2024
16f787e
.Net: Rename a number of properties as agreed and remove dangerous de…
westey-m Sep 25, 2024
12a85d5
Merge branch 'main' into feature-vector-search
dmytrostruk Sep 26, 2024
66d608c
.Net: Add delegates for the string and result mappers (#9000)
markwallace-microsoft Sep 26, 2024
d08eef3
.Net: Integration tests for Bing and Google text search (#8991)
markwallace-microsoft Sep 27, 2024
21e37a0
.Net: Add DI registration helpers for collections and search (#9007)
westey-m Sep 27, 2024
a6dc6b3
Merge branch 'main' into feature-vector-search
dmytrostruk Sep 30, 2024
2758ff7
Fix after merge
dmytrostruk Sep 30, 2024
da7291b
Implemented database manipulation methods
dmytrostruk Oct 1, 2024
667533e
Merge branch 'main' into feature-vector-search
markwallace-microsoft Oct 1, 2024
7975fed
Merge branch 'main' into feature-vector-search
markwallace-microsoft Oct 1, 2024
6931b79
Merge branch 'main' into feature-vector-search
markwallace-microsoft Oct 1, 2024
5bce499
.Net: Fix issues after vector split merge. (#9050)
westey-m Oct 1, 2024
1046bd3
Merge branch 'feature-vector-search' into sqlite-connector
dmytrostruk Oct 1, 2024
467191e
.Net: Support DI for Text Search Services (#9026)
markwallace-microsoft Oct 1, 2024
a608659
Fixed compilation errors after merge
dmytrostruk Oct 1, 2024
136fc0f
Added creation of virtual table for vectors
dmytrostruk Oct 1, 2024
d6ec1ee
Added deletion of virtual table for vectors
dmytrostruk Oct 1, 2024
2b94b08
.Net: Adapt In-Memory Connector to new Text Search Design (#9046)
markwallace-microsoft Oct 2, 2024
7710e36
Merge branch 'main' into feature-vector-search
markwallace-microsoft Oct 2, 2024
e2e22b6
.Net: Vector search recordreader manual merge (#9059)
westey-m Oct 2, 2024
6b05639
Implemented default mapper
dmytrostruk Oct 2, 2024
c4f3c66
Merge branch 'feature-vector-search' into sqlite-connector
dmytrostruk Oct 2, 2024
3b02888
Implemented Get and Upsert operations
dmytrostruk Oct 3, 2024
d8eb4b5
.Net: Adapt Azure AI Search Connector to new Text Search Design (#9061)
markwallace-microsoft Oct 3, 2024
d98e218
Implemented Delete record operations. Fixed batch operations.
dmytrostruk Oct 3, 2024
098ba75
Exposed virtual table name configuration
dmytrostruk Oct 3, 2024
e668013
Added test for string keys
dmytrostruk Oct 3, 2024
0179733
Added distance function mapping
dmytrostruk Oct 3, 2024
cc8fe45
.Net: Add response container type for vector search results. (#9082)
westey-m Oct 4, 2024
fba0aba
.Net: Use AsynEnumerable search interface for azure ai search. (#9090)
westey-m Oct 4, 2024
c198b09
Vector search and code refactoring
dmytrostruk Oct 4, 2024
d09e70f
Added offset logic
dmytrostruk Oct 5, 2024
03b89fe
Added filter usage
dmytrostruk Oct 5, 2024
69b9fc4
.Net: Update Qdrant Memory Connector to new Text Search Design (#9076)
markwallace-microsoft Oct 7, 2024
6d32d34
.Net: Add attributes to add to a model which can be converted to a Te…
markwallace-microsoft Oct 7, 2024
805febe
.Net: Change signature of the TextSearchResult constructor (#9153)
markwallace-microsoft Oct 8, 2024
add4461
Merge branch 'feature-vector-search' into sqlite-connector
dmytrostruk Oct 8, 2024
ea335bc
Merge from feature branch
dmytrostruk Oct 8, 2024
22e24a5
Merge branch 'main' into feature-vector-search
dmytrostruk Oct 8, 2024
db95435
Merge branch 'feature-vector-search' into sqlite-connector
dmytrostruk Oct 8, 2024
2d88cfb
Added support for generic data model
dmytrostruk Oct 8, 2024
bb2b457
Added vector store class
dmytrostruk Oct 9, 2024
806829d
Added extension methods
dmytrostruk Oct 9, 2024
4625afb
Small fix
dmytrostruk Oct 9, 2024
46ec215
Small fix
dmytrostruk Oct 9, 2024
b85b0e7
Fixed warnings
dmytrostruk Oct 9, 2024
a8af704
Fixed warning
dmytrostruk Oct 9, 2024
f29e110
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 9, 2024
0dec618
Fix after merge
dmytrostruk Oct 9, 2024
59e9bba
Fixed usings
dmytrostruk Oct 9, 2024
21cf1ac
Added unit tests
dmytrostruk Oct 9, 2024
4687d06
More unit tests
dmytrostruk Oct 9, 2024
52c8a4c
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 10, 2024
7da8523
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 11, 2024
22c9491
Fixes after merge
dmytrostruk Oct 11, 2024
270e9e1
Moved Sqlite unit test project
dmytrostruk Oct 11, 2024
f7cbaf2
Addressed PR feedback
dmytrostruk Oct 11, 2024
9094eaa
Added more comments
dmytrostruk Oct 11, 2024
aa4b4b6
More improvements and fixes
dmytrostruk Oct 11, 2024
79dac79
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 11, 2024
77fe3c2
Updated mapping and fixed tests
dmytrostruk Oct 14, 2024
4ecf78c
Fixed usings
dmytrostruk Oct 14, 2024
a34fc23
Added unit tests for command builder
dmytrostruk Oct 14, 2024
ecf6de9
Added unit tests for record collection class
dmytrostruk Oct 14, 2024
43900fd
Added integration test to get existing record
dmytrostruk Oct 14, 2024
7453e21
Fixed warnings
dmytrostruk Oct 14, 2024
3a797fb
Updated extension methods
dmytrostruk Oct 14, 2024
1b75ccd
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 15, 2024
44e366f
Fixes based on merge from main
dmytrostruk Oct 15, 2024
cbd2902
Added more tests
dmytrostruk Oct 16, 2024
5048f9f
Small fix
dmytrostruk Oct 16, 2024
4458478
Small refactoring
dmytrostruk Oct 16, 2024
c4df0c6
Addressed PR feedback
dmytrostruk Oct 16, 2024
c55e350
Added more tests for collection class
dmytrostruk Oct 16, 2024
d8e72b3
Added unit tests for default record mapper
dmytrostruk Oct 16, 2024
53e3c5a
Added more unit tests
dmytrostruk Oct 16, 2024
9884082
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 16, 2024
e10ccf7
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 17, 2024
a335a61
Addressed PR feedback
dmytrostruk Oct 17, 2024
6510a66
Updated SQLite connector package version suffix
dmytrostruk Oct 17, 2024
c79b8c6
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 17, 2024
5b5399e
Removed kernel builder extensions
dmytrostruk Oct 17, 2024
033ca59
Merge branch 'main' into sqlite-connector
dmytrostruk Oct 17, 2024
.Net: Added vector search implementation for Azure CosmosDB for MongoDB (#8887)

### Motivation and Context


Related: #6522

- Implemented the `VectorizedSearchAsync` method in the Azure CosmosDB for MongoDB connector.
- Added unit and integration tests.
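The unit tests in this commit (`VectorizedSearchVectorTypeData`) expect `VectorizedSearchAsync` to accept `ReadOnlyMemory<float>` and `ReadOnlyMemory<double>`, including their nullable forms, and to throw `NotSupportedException` for other types such as `List<float>`. A minimal sketch of that kind of check follows; `VectorTypeCheck` is a hypothetical helper written for illustration, not the connector's actual code.

```csharp
using System;

// Hypothetical helper, not the connector's code: accepts only the vector types
// that VectorizedSearchVectorTypeData marks as supported.
public static class VectorTypeCheck
{
    public static bool IsSupported(object vector) =>
        // A ReadOnlyMemory<T>? that has a value is boxed as a plain ReadOnlyMemory<T>,
        // so these two type patterns also cover the nullable cases.
        vector is ReadOnlyMemory<float> or ReadOnlyMemory<double>;

    public static void Verify(object vector)
    {
        if (!IsSupported(vector))
        {
            // Same exception type the unit test expects for List<float>.
            throw new NotSupportedException(
                $"The vector type '{vector?.GetType().FullName ?? "null"}' is not supported.");
        }
    }
}
```

Because a nullable struct with a value boxes to the underlying struct, the sketch needs no separate `ReadOnlyMemory<float>?` pattern; that is one plausible reason the nullable test cases behave the same as the non-nullable ones.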

### Contribution Checklist


- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄
dmytrostruk authored Sep 18, 2024
commit 8dc7dc34720e675335a397d685f1eed96954959a
@@ -546,6 +546,132 @@ public async Task GetWithCustomMapperWorksCorrectlyAsync()
        Assert.Equal("Name from mapper", result.HotelName);
    }

    [Theory]
    [MemberData(nameof(VectorizedSearchVectorTypeData))]
    public async Task VectorizedSearchThrowsExceptionWithInvalidVectorTypeAsync(object vector, bool exceptionExpected)
    {
        // Arrange
        this.MockCollectionForSearch();

        var sut = new AzureCosmosDBMongoDBVectorStoreRecordCollection<AzureCosmosDBMongoDBHotelModel>(
            this._mockMongoDatabase.Object,
            "collection");

        // Act & Assert
        if (exceptionExpected)
        {
            await Assert.ThrowsAsync<NotSupportedException>(async () => await sut.VectorizedSearchAsync(vector).ToListAsync());
        }
        else
        {
            var result = await sut.VectorizedSearchAsync(vector).FirstOrDefaultAsync();

            Assert.NotNull(result);
        }
    }

    [Theory]
    [InlineData(null, "TestEmbedding1", 1, 1)]
    [InlineData("", "TestEmbedding1", 2, 2)]
    [InlineData("TestEmbedding1", "TestEmbedding1", 3, 3)]
    [InlineData("TestEmbedding2", "test_embedding_2", 4, 4)]
    public async Task VectorizedSearchUsesValidQueryAsync(
        string? vectorPropertyName,
        string expectedVectorPropertyName,
        int actualLimit,
        int expectedLimit)
    {
        // Arrange
        var vector = new ReadOnlyMemory<float>([1f, 2f, 3f]);

        var expectedSearch = new BsonDocument
        {
            { "$search",
                new BsonDocument
                {
                    { "cosmosSearch",
                        new BsonDocument
                        {
                            { "vector", BsonArray.Create(vector.ToArray()) },
                            { "path", expectedVectorPropertyName },
                            { "k", expectedLimit },
                        }
                    },
                    { "returnStoredSource", true }
                }
            }
        };

        var expectedProjection = new BsonDocument
        {
            { "$project",
                new BsonDocument
                {
                    { "similarityScore", new BsonDocument { { "$meta", "searchScore" } } },
                    { "document", "$$ROOT" }
                }
            }
        };

        this.MockCollectionForSearch();

        var sut = new AzureCosmosDBMongoDBVectorStoreRecordCollection<VectorSearchModel>(
            this._mockMongoDatabase.Object,
            "collection");

        // Act
        var result = await sut.VectorizedSearchAsync(vector, new()
        {
            VectorFieldName = vectorPropertyName,
            Limit = actualLimit,
        }).FirstOrDefaultAsync();

        // Assert
        Assert.NotNull(result);

        this._mockMongoCollection.Verify(l => l.AggregateAsync(
            It.Is<PipelineDefinition<BsonDocument, BsonDocument>>(pipeline =>
                this.ComparePipeline(pipeline, expectedSearch, expectedProjection)),
            It.IsAny<AggregateOptions>(),
            It.IsAny<CancellationToken>()), Times.Once());
    }

    [Fact]
    public async Task VectorizedSearchThrowsExceptionWithNonExistentVectorPropertyNameAsync()
    {
        // Arrange
        this.MockCollectionForSearch();

        var sut = new AzureCosmosDBMongoDBVectorStoreRecordCollection<AzureCosmosDBMongoDBHotelModel>(
            this._mockMongoDatabase.Object,
            "collection");

        var options = new VectorSearchOptions { VectorFieldName = "non-existent-property" };

        // Act & Assert
        await Assert.ThrowsAsync<InvalidOperationException>(async () => await sut.VectorizedSearchAsync(new ReadOnlyMemory<float>([1f, 2f, 3f]), options).FirstOrDefaultAsync());
    }

    [Fact]
    public async Task VectorizedSearchReturnsRecordWithScoreAsync()
    {
        // Arrange
        this.MockCollectionForSearch();

        var sut = new AzureCosmosDBMongoDBVectorStoreRecordCollection<AzureCosmosDBMongoDBHotelModel>(
            this._mockMongoDatabase.Object,
            "collection");

        // Act
        var result = await sut.VectorizedSearchAsync(new ReadOnlyMemory<float>([1f, 2f, 3f])).FirstOrDefaultAsync();

        // Assert
        Assert.NotNull(result);
        Assert.Equal("key", result.Record.HotelId);
        Assert.Equal("Test Name", result.Record.HotelName);
        Assert.Equal(0.99f, result.Score);
    }

    public static TheoryData<List<string>, string, bool> CollectionExistsData => new()
    {
        { ["collection-2"], "collection-2", true },
@@ -558,8 +684,54 @@ public async Task GetWithCustomMapperWorksCorrectlyAsync()
        { [], 1 }
    };

    public static TheoryData<object, bool> VectorizedSearchVectorTypeData => new()
    {
        { new ReadOnlyMemory<float>([1f, 2f, 3f]), false },
        { new ReadOnlyMemory<double>([1f, 2f, 3f]), false },
        { new ReadOnlyMemory<float>?(new([1f, 2f, 3f])), false },
        { new ReadOnlyMemory<double>?(new([1f, 2f, 3f])), false },
        { new List<float>([1f, 2f, 3f]), true },
    };

    #region private

    private bool ComparePipeline(
        PipelineDefinition<BsonDocument, BsonDocument> actualPipeline,
        BsonDocument expectedSearch,
        BsonDocument expectedProjection)
    {
        var serializerRegistry = BsonSerializer.SerializerRegistry;
        var documentSerializer = serializerRegistry.GetSerializer<BsonDocument>();

        var documents = actualPipeline.Render(documentSerializer, serializerRegistry).Documents;

        return
            documents[0].ToJson() == expectedSearch.ToJson() &&
            documents[1].ToJson() == expectedProjection.ToJson();
    }

    private void MockCollectionForSearch()
    {
        var document = new BsonDocument { ["_id"] = "key", ["HotelName"] = "Test Name" };
        var searchResult = new BsonDocument { ["document"] = document, ["similarityScore"] = 0.99f };

        var mockCursor = new Mock<IAsyncCursor<BsonDocument>>();
        mockCursor
            .Setup(l => l.MoveNextAsync(It.IsAny<CancellationToken>()))
            .ReturnsAsync(true);

        mockCursor
            .Setup(l => l.Current)
            .Returns([searchResult]);

        this._mockMongoCollection
            .Setup(l => l.AggregateAsync(
                It.IsAny<PipelineDefinition<BsonDocument, BsonDocument>>(),
                It.IsAny<AggregateOptions>(),
                It.IsAny<CancellationToken>()))
            .ReturnsAsync(mockCursor.Object);
    }

    private async Task TestUpsertWithModelAsync<TDataModel>(
        TDataModel dataModel,
        string expectedPropertyName,
@@ -645,6 +817,23 @@ private sealed class BsonVectorStoreWithNameTestModel
        [VectorStoreRecordData(StoragePropertyName = "storage_hotel_name")]
        public string? HotelName { get; set; }
    }

    private sealed class VectorSearchModel
    {
        [BsonId]
        [VectorStoreRecordKey]
        public string? Id { get; set; }

        [VectorStoreRecordData]
        public string? HotelName { get; set; }

        [VectorStoreRecordVector(Dimensions: 4, IndexKind: IndexKind.IvfFlat, DistanceFunction: DistanceFunction.CosineDistance, StoragePropertyName = "test_embedding_1")]
        public ReadOnlyMemory<float> TestEmbedding1 { get; set; }

        [BsonElement("test_embedding_2")]
        [VectorStoreRecordVector(Dimensions: 4, IndexKind: IndexKind.IvfFlat, DistanceFunction: DistanceFunction.CosineDistance)]
        public ReadOnlyMemory<float> TestEmbedding2 { get; set; }
    }
#pragma warning restore CA1812

    #endregion
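`VectorizedSearchUsesValidQueryAsync` and `VectorizedSearchThrowsExceptionWithNonExistentVectorPropertyNameAsync` together pin down how the `path` of the `$search` stage is chosen: a null or empty `VectorFieldName` falls back to the first vector property, a known name maps to its stored element name, and an unknown name throws `InvalidOperationException`. The expected paths suggest the stored name comes from `[BsonElement]` rather than `StoragePropertyName`, since `TestEmbedding1` keeps its C# name. The sketch below reproduces only that lookup; `VectorPathResolver` and its hard-coded property table are hypothetical.

```csharp
#nullable enable
using System;

// Hypothetical sketch of resolving VectorSearchOptions.VectorFieldName to the
// "$search" path. The property table is hard-coded to mirror VectorSearchModel
// in the tests; the real connector derives it from the record definition.
public static class VectorPathResolver
{
    // (C# property name, stored element name), in declaration order.
    private static readonly (string PropertyName, string StorageName)[] s_vectorProperties =
    {
        ("TestEmbedding1", "TestEmbedding1"),   // no [BsonElement]: the C# name is stored
        ("TestEmbedding2", "test_embedding_2"), // [BsonElement("test_embedding_2")]
    };

    public static string Resolve(string? vectorFieldName)
    {
        // Null or empty: fall back to the first vector property.
        if (string.IsNullOrEmpty(vectorFieldName))
        {
            return s_vectorProperties[0].StorageName;
        }

        foreach (var (propertyName, storageName) in s_vectorProperties)
        {
            if (propertyName == vectorFieldName)
            {
                return storageName;
            }
        }

        // Unknown names fail fast, as the non-existent property test expects.
        throw new InvalidOperationException(
            $"The collection does not have a vector property named '{vectorFieldName}'.");
    }
}
```

Each `InlineData` row of `VectorizedSearchUsesValidQueryAsync` corresponds to one call of `Resolve`, which makes it easy to see why `null` and `""` both produce `TestEmbedding1`.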
@@ -0,0 +1,78 @@
// Copyright (c) Microsoft. All rights reserved.

using Microsoft.SemanticKernel.Data;
using MongoDB.Bson;

namespace Microsoft.SemanticKernel.Connectors.AzureCosmosDBMongoDB;

/// <summary>
/// Contains mapping helpers to use when searching for documents using Azure CosmosDB MongoDB.
/// </summary>
internal sealed class AzureCosmosDBMongoDBVectorStoreCollectionSearchMapping
{
    /// <summary>Returns search part of the search query for <see cref="IndexKind.Hnsw"/> index kind.</summary>
    public static BsonDocument GetSearchQueryForHnswIndex<TVector>(
        TVector vector,
        string vectorPropertyName,
        int limit,
        int efSearch)
    {
        return new BsonDocument
        {
            { "$search",
                new BsonDocument
                {
                    { "cosmosSearch",
                        new BsonDocument
                        {
                            { "vector", BsonArray.Create(vector) },
                            { "path", vectorPropertyName },
                            { "k", limit },
                            { "efSearch", efSearch }
                        }
                    }
                }
            }
        };
    }

    /// <summary>Returns search part of the search query for <see cref="IndexKind.IvfFlat"/> index kind.</summary>
    public static BsonDocument GetSearchQueryForIvfIndex<TVector>(
        TVector vector,
        string vectorPropertyName,
        int limit)
    {
        return new BsonDocument
        {
            { "$search",
                new BsonDocument
                {
                    { "cosmosSearch",
                        new BsonDocument
                        {
                            { "vector", BsonArray.Create(vector) },
                            { "path", vectorPropertyName },
                            { "k", limit },
                        }
                    },
                    { "returnStoredSource", true }
                }
            }
        };
    }

    /// <summary>Returns projection part of the search query to return similarity score together with document.</summary>
    public static BsonDocument GetProjectionQuery(string scorePropertyName, string documentPropertyName)
    {
        return new BsonDocument
        {
            { "$project",
                new BsonDocument
                {
                    { scorePropertyName, new BsonDocument { { "$meta", "searchScore" } } },
                    { documentPropertyName, "$$ROOT" }
                }
            }
        };
    }
}
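Taken together, the mapping helpers produce a two-stage aggregation pipeline: a `$search` stage built around `cosmosSearch`, then a `$project` stage that pairs the `searchScore` with the original document (`$$ROOT`). The sketch below rebuilds that shape with `System.Text.Json.Nodes` instead of `MongoDB.Bson`, so it runs without the MongoDB driver. `SearchIndexKind` and `SearchPipelineSketch` are hypothetical names; the field names are copied from the helpers above, and the projection names (`similarityScore`, `document`) are the ones the unit test expects. It only illustrates the document shape and is not a replacement for the BSON builders.

```csharp
#nullable enable
using System.Linq;
using System.Text.Json.Nodes;

// Hypothetical stand-in for the connector's IndexKind values used above.
public enum SearchIndexKind { Hnsw, IvfFlat }

// Sketch only: mirrors the shape of GetSearchQueryForHnswIndex /
// GetSearchQueryForIvfIndex followed by GetProjectionQuery, as JSON.
public static class SearchPipelineSketch
{
    public static JsonArray Build(
        float[] vector, string path, int limit, SearchIndexKind indexKind, int efSearch)
    {
        var cosmosSearch = new JsonObject
        {
            ["vector"] = new JsonArray(vector.Select(v => (JsonNode?)JsonValue.Create(v)).ToArray()),
            ["path"] = path,
            ["k"] = limit,
        };

        var search = new JsonObject { ["cosmosSearch"] = cosmosSearch };

        if (indexKind == SearchIndexKind.Hnsw)
        {
            // HNSW: efSearch sits inside cosmosSearch.
            cosmosSearch["efSearch"] = efSearch;
        }
        else
        {
            // IVF: returnStoredSource sits next to cosmosSearch; efSearch is ignored.
            search["returnStoredSource"] = true;
        }

        // Projection stage: keep the similarity score alongside the whole document.
        var project = new JsonObject
        {
            ["similarityScore"] = new JsonObject { ["$meta"] = "searchScore" },
            ["document"] = "$$ROOT",
        };

        return new JsonArray(
            new JsonObject { ["$search"] = search },
            new JsonObject { ["$project"] = project });
    }
}
```

Calling `ToJsonString()` on the result for an IVF index should show `returnStoredSource` beside `cosmosSearch`, while an HNSW index carries `efSearch` inside `cosmosSearch` instead, which is the one structural difference between the two search helpers.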