-
Notifications
You must be signed in to change notification settings - Fork 841
Introduce IngestionChunkWriter build on top of MEVD #6951
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces VectorStoreWriter<T>, a new component that writes ingestion chunks to vector stores using dynamic schema generation. The implementation leverages Microsoft Extensions Vector Data (MEVD) abstractions to support multiple vector store backends.
Key changes:
- Implements
VectorStoreWriter<T>with dynamic schema generation based on chunk metadata - Adds support for incremental ingestion (replace existing chunks for a document)
- Integrates MEVD abstractions with dependencies on Semantic Kernel connectors for testing
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs | Core writer implementation with dynamic schema and incremental ingestion support |
| src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriterOptions.cs | Configuration options for collection name, distance function, and incremental ingestion |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/VectorStoreWriterTests.cs | Test suite covering dynamic schema generation and incremental ingestion scenarios |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Utils/TestEmbeddingGenerator.cs | Test utility for mocking embedding generation |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Utils/IAsyncEnumerableExtensions.cs | Temporary async enumerable helpers until .NET 10 |
| src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj | Adds MEVD abstractions package reference |
| test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Microsoft.Extensions.DataIngestion.Tests.csproj | Adds Semantic Kernel connector references for testing |
| eng/packages/General.props | Registers MEVD abstractions package version |
| eng/packages/TestOnly.props | Registers Semantic Kernel connector package versions |
| eng/Versions.props | Defines version properties for new dependencies |
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriterOptions.cs
Outdated
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriterOptions.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Utils/IAsyncEnumerableExtensions.cs
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/VectorStoreWriterTests.cs
Show resolved
Hide resolved
roji
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Really nice to see this on top of MEVD. Here's a first review.
General note: this adds a reference to MEVD.Abstractions from MEDI, although at least in theory MEDI may have users which don't ingest into a vector database (after all, that's why VectorStoreWriter is an implementation of an abstraction). I don't know how fine-grainer we want to cut it - we could in theory have MEDI.VectorData for just the VectorData parts, so that non-vector users don't get the MEVD reference, but that makes our packaging quite fine-grained... Thoughts?
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Show resolved
Hide resolved
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriterOptions.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/VectorStoreWriterTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/VectorStoreWriterTests.cs
Outdated
Show resolved
Hide resolved
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/VectorStoreWriterTests.cs
Outdated
Show resolved
Hide resolved
|
Ah, I see that @stephentoub reviewed in parallel, sorry for any duplicate comments. |
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs
Show resolved
Hide resolved
We have |
OK. It's still a bit odd for (non-Abstractions) MEDI to bring in MEVD, unless we really consider MEDI as very tightly bound to vector databases (which it maybe is, I'm not sure). Possibly a question to keep in mind. |
Microsoft Reviewers: Open in CodeFlow