-
Notifications
You must be signed in to change notification settings - Fork 4.4k
.Net MEVD: LINQ-based filtering #10273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
43bc48e
283d581
1c893c6
64fd77b
d4dc969
161ee58
eb07d55
de13cff
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,11 @@ | ||
| // Copyright (c) Microsoft. All rights reserved. | ||
|
|
||
| // TODO: Commented out as part of implementing LINQ-based filtering, since MappingVectorStoreRecordCollection is no longer easy/feasible. | ||
roji marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| // TODO: The user provides an expression tree accepting a TPublicRecord, but we require an expression tree accepting a TInternalRecord. | ||
| // TODO: This is something that the user must provide, and is quite advanced. | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is the plan to get this checked into the feature branch so long, and then address this area after, or should we keep an eye on this during this PR?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @westey-m I'm not sure... Importantly, this isn't just "let's defer this for a bit" - the moment users start expressing filter in terms of model (and not storage) properties, any sort of transformation/mapping becomes problematic (as we're discussed before)... I can imagine an API which allows users to map from model properties (referenced in the expression tree) to arbitrary storage properties, which would plug into the filter translators. For flat record that's easy enough, but for hierarchical ones things could get a bit more complicated... My personal inclination here is to defer these advanced/custom mapping scenarios - as a whole - until after the initial release, and then take the time to add the proper API... I may be wrong, but IMHO these scenarios shouldn't be part of the everyday user scenario, and we should be able to add support later without breaking. But we should probably discuss this and prioritize.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm concerned that we are abandoning anyone who is already using advanced mapping until we solve this problem, especially if we remove the legacy filtering, which we said we would do before the initial release. Definitely something we should discuss more.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As discussed, let me dig into the use cases for custom mapping more. If as suspected it's really a workaround for the lack of hierarchical model support, we should not support custom mapping, and add hierarchical model support. |
||
|
|
||
| #if DISABLED | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should it be
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same as above |
||
|
|
||
| using System.Runtime.CompilerServices; | ||
| using Microsoft.Extensions.VectorData; | ||
|
|
||
|
|
@@ -132,3 +138,5 @@ public async Task<VectorSearchResults<TPublicRecord>> VectorizedSearchAsync<TVec | |
| }; | ||
| } | ||
| } | ||
|
|
||
| #endif | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -76,6 +76,7 @@ public IVectorStoreRecordCollection<TKey, TRecord> CreateVectorStoreRecordCollec | |
| return (collection as IVectorStoreRecordCollection<TKey, TRecord>)!; | ||
| } | ||
|
|
||
| #if DISABLED_FOR_NOW // TODO: See note on MappingVectorStoreRecordCollection | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should it not be
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well this basically is just a glorified comment (I did it with #if instead of /* */ mostly because the latter doesn't nest). If I negate, we'd have to define DISABLED_FOR_NOW in the csproj for this to be commented out... |
||
| // If the user asked for a string key, we can add a decorator which converts back and forth between string and guid. | ||
| // The string that the user provides will still need to contain a valid guid, since the Langchain created collection | ||
| // uses guid keys. | ||
|
|
@@ -92,6 +93,7 @@ public IVectorStoreRecordCollection<TKey, TRecord> CreateVectorStoreRecordCollec | |
|
|
||
| return (stringKeyCollection as IVectorStoreRecordCollection<TKey, TRecord>)!; | ||
| } | ||
| #endif | ||
|
|
||
| throw new NotSupportedException("This VectorStore is only usable with Guid keys and LangchainDocument<Guid> record types or string keys and LangchainDocument<string> record types"); | ||
| } | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am very new to this repo, so please take it with a grain of salt.
Currently we do have
srcandsamplesfolders insideroot/dotnet. Sincesrcdoes not cotain all the source code (samples), I believe we could similarly to other .NET repos add just atestfolder with all the test projects.Moreover, using a word
IntegrationTestssuggests that these are only integration tests. What if I want to add a single class for the unit test? Should I create a dedicated folder for it?Last, but not least, we could keep the Connectors folder structure in the test project:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I really like the idea of having dedicated test projects! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that moving all tests to a separate
testfolder outside ofsrcwould make sense and follow typical .NET project structure - but I'll leave that larger question out of this PR (/cc @markwallace-microsoft).There currently are also unit test projects for each connectors as well (see e.g. SemanticKernel.Connectors.AzureAISearch.UnitTests); I personally think unit tests for these vector database connectors are less valuable (see #10464), and that we should focus our efforts on good, reliable and reusable integration tests.
But regardless, I agree that if we do have unit tests, separating unit and integration tests to different projects should ideally not be needed. I'm assuming the reason for the split is that integration tests are considered flaky (whereas unit tests are always 100% reliable), so the split allows us to e.g. at least run unit tests in CI. But I believe we should work on making the integration tests reliable (it should be achievable), at which point we can just merge and have a single "tests" project for both integration and unit tests. Another possible reason is running speed (unit tests run faster since they don't access a real database), but I don't think that should be a concern here.
I propose we continue in the direction of reusable integration tests as in this PR (#10194); once that's done, if everything works well and reliably, we can raise the question of what to do with the tests in general (merge integration and unit tests, get rid of the unit tests altogether...).
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of extra historical reasons for the split of unit tests vs integration tests: