Use Microsoft.Extensions.DataIngestion in AI Chat Web template
#7023
This reverts commit e1d066034962c9686bf8150984b6adf0e25846c8.
This reverts commit a369be9.
Marking as ready for review to get some eyes on this. Note that there are still pending improvements:
Pull Request Overview
This PR modernizes the AI chat web templates by replacing the custom PDF ingestion pipeline (using PdfPig) with the new Microsoft.Extensions.DataIngestion library suite. The changes enable support for both Markdown and PDF document formats while simplifying the ingestion architecture.
Key Changes
- Replaced custom PDFDirectorySource and IIngestionSource with the standardized Microsoft.Extensions.DataIngestion APIs
- Removed IngestedDocument tracking class as document versioning is now handled by the ingestion pipeline
- Added Markdown viewer support (viewer.html and viewer.mjs) for rendering .md files
- Updated citation format to remove page numbers, now supporting document-level citations
- Changed ingestion trigger from startup to lazy initialization on first search request
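The last item, lazy initialization on first search, can be illustrated with a minimal sketch. This is not the template's actual SemanticSearch code: the class name LazySearchService, the ingestAsync delegate, and the SemaphoreSlim guard are all assumptions made for illustration. Only the idea of an _initialized flag checked on the first search comes from the PR summary.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a search service that ingests documents lazily.
public sealed class LazySearchService
{
    // Assumed delegate that performs the one-time ingestion work.
    private readonly Func<CancellationToken, Task> _ingestAsync;

    // Serializes the first-time initialization across concurrent callers.
    private readonly SemaphoreSlim _gate = new(1, 1);

    private volatile bool _initialized;

    public LazySearchService(Func<CancellationToken, Task> ingestAsync)
        => _ingestAsync = ingestAsync;

    public async Task<IReadOnlyList<string>> SearchAsync(
        string query, CancellationToken ct = default)
    {
        // Ingestion runs on the first search instead of at application startup.
        await EnsureInitializedAsync(ct);

        // Placeholder result; a real service would query a vector store here.
        return new[] { $"result for: {query}" };
    }

    private async Task EnsureInitializedAsync(CancellationToken ct)
    {
        if (_initialized)
        {
            return; // Fast path once ingestion has completed.
        }

        await _gate.WaitAsync(ct);
        try
        {
            // Re-check inside the lock so only one caller runs ingestion.
            if (!_initialized)
            {
                await _ingestAsync(ct);
                _initialized = true;
            }
        }
        finally
        {
            _gate.Release();
        }
    }
}
```

In this sketch the flag is set only after ingestion succeeds, so a failed first attempt is retried by the next search. Whether the template behaves the same way is not stated in the PR summary.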
Reviewed Changes
Copilot reviewed 94 out of 100 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| VectorStoreWriter.cs | Added workaround for QdrantVectorStore key type incompatibility using string name check |
| SemanticSearch.cs (all variants) | Added lazy ingestion on first search with _initialized flag |
| DocumentReader.cs | New custom reader supporting both Markdown and PDF via MarkdownReader and MarkItDownReader |
| DataIngestor.cs (all variants) | Simplified to use IngestionPipeline with SemanticSimilarityChunker |
| IngestedChunk.cs (all variants) | Changed Key type to Guid, made constants public, added JSON serialization attributes |
| ChatCitation.razor | Added Markdown viewer support alongside existing PDF viewer |
| ChatMessageItem.razor | Removed page number from citation regex and data structure |
| Program.cs variants | Removed startup ingestion, added vector store registrations, changed DataIngestor to singleton |
| *.csproj.in | Replaced PdfPig with DataIngestion packages and ML.Tokenizers |
| THIRD-PARTY-NOTICES.TXT | Removed PdfPig license notice |
| GeneratedContent.targets | Updated package version variables |
jeffhandley
left a comment
Looks great, @MackinnonBuck!
Big thanks for a great contribution and detailed testing, @MackinnonBuck!
Please enable tracing. This can be done by modifying line 79 in c3e0c73:
.AddSource("Experimental.Microsoft.Extensions.AI");
with:
.AddSource("Experimental.Microsoft.Extensions.DataIngestion");
@MackinnonBuck FYI - I added some commits, including one that shows a message about documents being loaded. All tests are passing and I did a lot of end-to-end functional validation too. I've marked it to auto-merge when CI is green after my latest push.
adamsitnik
left a comment
We are almost there, we just need to disable the incremental ingestion and remove the SK dependency from the PDF reader.
adamsitnik
left a comment
@MackinnonBuck To save you time, I've addressed my feedback by pushing to your branch. Please perform the manual verification before merging (I don't know how to do it).
…x code formatting.
This PR makes the following changes to the chat template:
Example_GPS_Watch.pdf with its markdown equivalent PdfPig markitdown