Skip to content

Conversation

@adamsitnik
Copy link
Member

@adamsitnik adamsitnik commented Oct 23, 2025

Microsoft Reviewers: Open in CodeFlow

@adamsitnik adamsitnik requested a review from roji October 23, 2025 12:00
@adamsitnik adamsitnik requested a review from a team as a code owner October 23, 2025 12:00
Copilot AI review requested due to automatic review settings October 23, 2025 12:00
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces VectorStoreWriter<T>, a new component that writes ingestion chunks to vector stores using dynamic schema generation. The implementation leverages Microsoft Extensions Vector Data (MEVD) abstractions to support multiple vector store backends.

Key changes:

  • Implements VectorStoreWriter<T> with dynamic schema generation based on chunk metadata
  • Adds support for incremental ingestion (replace existing chunks for a document)
  • Integrates MEVD abstractions with dependencies on Semantic Kernel connectors for testing

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriter.cs Core writer implementation with dynamic schema and incremental ingestion support
src/Libraries/Microsoft.Extensions.DataIngestion/Writers/VectorStoreWriterOptions.cs Configuration options for collection name, distance function, and incremental ingestion
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/VectorStoreWriterTests.cs Test suite covering dynamic schema generation and incremental ingestion scenarios
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Utils/TestEmbeddingGenerator.cs Test utility for mocking embedding generation
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Utils/IAsyncEnumerableExtensions.cs Temporary async enumerable helpers until .NET 10
src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj Adds MEVD abstractions package reference
test/Libraries/Microsoft.Extensions.DataIngestion.Tests/Microsoft.Extensions.DataIngestion.Tests.csproj Adds Semantic Kernel connector references for testing
eng/packages/General.props Registers MEVD abstractions package version
eng/packages/TestOnly.props Registers Semantic Kernel connector package versions
eng/Versions.props Defines version properties for new dependencies

Copy link
Member

@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice to see this on top of MEVD. Here's a first review.

General note: this adds a reference to MEVD.Abstractions from MEDI, although at least in theory MEDI may have users which don't ingest into a vector database (after all, that's why VectorStoreWriter is an implementation of an abstraction). I don't know how fine-grainer we want to cut it - we could in theory have MEDI.VectorData for just the VectorData parts, so that non-vector users don't get the MEVD reference, but that makes our packaging quite fine-grained... Thoughts?

@dotnet-policy-service dotnet-policy-service bot added the waiting-author-feedback 📭 The author of this issue needs to respond in order for us to continue investigating this issue. label Oct 23, 2025
@roji
Copy link
Member

roji commented Oct 23, 2025

Ah, I see that @stephentoub reviewed in parallel, sorry for any duplicate comments.

@dotnet-policy-service dotnet-policy-service bot removed the waiting-author-feedback 📭 The author of this issue needs to respond in order for us to continue investigating this issue. label Oct 23, 2025
@adamsitnik adamsitnik requested a review from roji October 23, 2025 17:04
@adamsitnik
Copy link
Member Author

General note: this adds a reference to MEVD.Abstractions from MEDI, although at least in theory MEDI may have users which don't ingest into a vector database (after all, that's why VectorStoreWriter is an implementation of an abstraction). I don't know how fine-grainer we want to cut it - we could in theory have MEDI.VectorData for just the VectorData parts, so that non-vector users don't get the MEVD reference, but that makes our packaging quite fine-grained... Thoughts?

We have MEDI.Abstractions that define an abstract writer class and has no dependencies and MEDI that provides MEVD-based implementation and comes with some dependencies.

@roji
Copy link
Member

roji commented Oct 23, 2025

We have MEDI.Abstractions that define an abstract writer class and has no dependencies and MEDI that provides MEVD-based implementation and comes with some dependencies

OK. It's still a bit odd for (non-Abstractions) MEDI to bring in MEVD, unless we really consider MEDI as very tightly bound to vector databases (which it maybe is, I'm not sure). Possibly a question to keep in mind.

This was referenced Nov 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants