.Net: New Feature: VectorStore to provide supported key types #13141
Closed
Labels: .NET, msft.ext.vectordata, needs_port_to_python, stale
Status: Backlog
I am currently working on a Data Ingestion project, which is more or less an ETL pipeline: it uses a cloud service to parse the file, chunkers to split it into chunks, LLMs to enrich the chunks with extra info (keywords, summary, etc.), and MEVD to store the chunks in the Vector Store (generating embeddings with MEAI on the fly).
Since the users can specify any number of custom metadata enrichers (add summary, classify, extract keywords), I am using the "dynamic" collections:
https://github.com/adamsitnik/dataingestion/blob/23e97338e0e491598e5b8dc2f7b1690f16309799/src/Microsoft.Extensions.DataIngestion/VectorStoreWriter.cs#L87-L88
And building the definition on the fly:
https://github.com/adamsitnik/dataingestion/blob/23e97338e0e491598e5b8dc2f7b1690f16309799/src/Microsoft.Extensions.DataIngestion/VectorStoreWriter.cs#L118-L152
It works really nicely and is easy to use, but I have to ask the users to specify the `TKey` explicitly: https://github.com/adamsitnik/dataingestion/blob/23e97338e0e491598e5b8dc2f7b1690f16309799/src/Samples/Program.cs#L48
I would like to avoid that (so it's super easy to use and get started with), and then choose the following key generation strategy:
It would be great if `VectorStore` were capable of exposing information about its supported key types. Then I would not need to ask the users to specify one.

My API proposal is to extend `VectorStoreMetadata` with an `IReadOnlyList<Type> SupportedKeyTypes` property.

```diff
public class VectorStoreMetadata
{
    public string? VectorStoreSystemName { get; init; }
    public string? VectorStoreName { get; init; }
+   public IReadOnlyList<Type> SupportedKeyTypes { get; init; }
}
```

An `IReadOnlyList<Type>` backed by an array should be just enough to quickly check whether a given type is supported. A `HashSet` would be better if there were more types; in this case it could just slow down startup by compiling another type.

cc @roji @westey-m please let me know what you think, I am more than happy to send a PR.
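
To illustrate how a caller might consume the proposed property, here is a minimal sketch. It does not use the real MEVD types: `VectorStoreMetadataSketch` is a hypothetical stand-in that mirrors the proposed shape, `KeyTypeSelector` is a made-up helper, and the preference order (`Guid`, `string`, `long`, `int`) is only an assumption for illustration, not a strategy stated in this issue.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in mirroring the proposed VectorStoreMetadata shape.
// This is NOT the real Microsoft.Extensions.VectorData type.
public sealed class VectorStoreMetadataSketch
{
    public string? VectorStoreSystemName { get; init; }
    public string? VectorStoreName { get; init; }

    // The proposed property: the key types the store accepts.
    public IReadOnlyList<Type> SupportedKeyTypes { get; init; } = Array.Empty<Type>();
}

// Hypothetical helper that picks a key type so users do not have to specify TKey.
public static class KeyTypeSelector
{
    // Assumed preference order, chosen only for this example.
    private static readonly Type[] s_preferred =
    {
        typeof(Guid), typeof(string), typeof(long), typeof(int)
    };

    public static Type Pick(IReadOnlyList<Type> supported)
    {
        foreach (Type candidate in s_preferred)
        {
            // A linear scan is enough for a handful of entries,
            // which is why an array-backed list suffices here.
            for (int i = 0; i < supported.Count; i++)
            {
                if (supported[i] == candidate)
                {
                    return candidate;
                }
            }
        }

        throw new NotSupportedException(
            "The store supports none of the key types this sketch can generate.");
    }
}
```

With this sketch, a store that reports `new[] { typeof(string), typeof(ulong) }` as its supported key types would resolve to `typeof(string)`, and a store reporting only unsupported types would surface a `NotSupportedException` up front instead of failing on the first write.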