Skip to content

Conversation

@adamsitnik
Copy link
Member

@adamsitnik adamsitnik commented Oct 31, 2025

The Pipeline itself is rather simple. The tricky part is exception handling: should the pipeline fail when a processing of a single file has failed? Initially I wanted to introduce something like ContinueOnError, but when I search the repo and found MaximumConsecutiveErrorsPerRequest and got inspired with that idea (for better or worse).

Microsoft Reviewers: Open in CodeFlow

@adamsitnik adamsitnik self-assigned this Oct 31, 2025
Copilot AI review requested due to automatic review settings October 31, 2025 17:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds telemetry, logging, and error handling capabilities to the data ingestion pipeline. The key enhancement is a new IngestionPipeline class that orchestrates the document reading, chunking, and writing processes with comprehensive observability support.

  • Introduces IngestionPipeline<T> with Activity tracing and logging support
  • Adds configurable error handling through IngestionPipelineOptions
  • Reorganizes test utilities and namespaces for better organization

Reviewed Changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
IngestionPipeline.cs New pipeline orchestrator with telemetry and error handling
IngestionPipelineOptions.cs Configuration options for pipeline error handling and activity source naming
Log.cs Logging message definitions using source generator pattern
DiagnosticsConstants.cs Activity names and tag constants for telemetry
Microsoft.Extensions.DataIngestion.csproj Adds logging and diagnostics dependencies
IngestionPipelineTests.cs Comprehensive tests for pipeline processing and error scenarios
TestReader.cs New test helper for simulating custom reader behavior
TestEmbeddingGenerator.cs Moved to base namespace for reuse across test namespaces
VectorStoreWriterTests.cs Namespace changed to align with folder structure
SqliteVectorStoreWriterTests.cs Namespace changed to align with folder structure
InMemoryVectorStoreWriterTests.cs Namespace changed to align with folder structure
Microsoft.Extensions.DataIngestion.Tests.csproj Adds OpenTelemetry dependency for telemetry testing
Comments suppressed due to low confidence (1)

test/Libraries/Microsoft.Extensions.DataIngestion.Tests/IngestionPipelineTests.cs:1

  • The condition uses >= but should use > when comparing with MaximumErrorsPerProcessing. With the current logic, if MaximumErrorsPerProcessing is set to 3, processing will stop after 3 errors (indices 0,1,2), but the documentation states 'maximum number of ingestions that are allowed to fail', which should allow 3 failures before throwing on the 4th.
// Licensed to the .NET Foundation under one or more agreements.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

# Conflicts:
#	src/Libraries/Microsoft.Extensions.DataIngestion/Microsoft.Extensions.DataIngestion.csproj
…g a single DocumentProcessors

make Pipeline return a collection of the results, so the users can handle everything as they want to
@adamsitnik adamsitnik merged commit 0eb69ee into dotnet:main Nov 5, 2025
6 checks passed
@adamsitnik adamsitnik deleted the pipelines branch November 5, 2025 11:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants