Skip to content

Conversation

Copy link
Contributor

Copilot AI commented Dec 16, 2025

  • Create MediaTypeMap polyfill in LegacySupport directory with System.Net.Mime namespace
  • Add static LoadFromAsync methods to DataContent class:
    • Stream overload that accepts a Stream (infers name and mime type from FileStream's path if applicable)
    • String path overload that infers name and mime type from the file path
    • Pre-initialize MemoryStream capacity when source stream is seekable for performance
  • Add instance SaveToAsync method to DataContent class:
    • Handles directory paths by inferring filename from Name property or using random name
    • Always infers extension from media type if path has none
    • Returns the actual path where data was saved
    • Fix path traversal security issue by using Path.GetFileName(Name)
    • Use FileMode.CreateNew to prevent accidentally overwriting existing files
    • Document IOException when file already exists
  • Remove [Experimental] attribute and add methods to JSON API baseline as stable
  • Use GetBuffer() instead of ToArray() for MemoryStream efficiency
  • Add WriteAllBytesAsync(ReadOnlyMemory<byte>) polyfill to remove #ifdef from SaveToAsync
  • Change return types from Task<T> to ValueTask<T> for all new public async methods
  • Add C# 14 extension member polyfills for File.ReadAllBytesAsync and File.WriteAllBytesAsync in LegacySupport directory
  • Remove name parameter from LoadFromAsync(Stream) - Name can be assigned after the call
  • Add comprehensive unit tests for the new functionality
    • Add test for SaveToAsync with existing file to verify IOException is thrown and original content is preserved
  • Fix net462 build: Add InjectFilePolyfillsOnLegacy to test project
  • Add test for SaveToAsync with empty string path
  • Simplify extension inference by removing unnecessary null check
  • Update MarkItDownMcpReader.cs to use DataContent.LoadFromAsync with optional media type parameter
  • Add Microsoft.Extensions.AI.Abstractions project reference to MarkItDown project for LoadFromAsync access
  • Sync MediaTypeMap with latest .NET runtime source (added missing .cgm/image/cgm mapping)
  • Run code review and security checks
  • Switch components to use MediaTypeMap for extension/MIME type mappings:
    • ImageGeneratorExtensions.cs - Replace _extensionToMimeType dictionary with MediaTypeMap.GetMediaType
    • OpenAIClientExtensions.cs - Replace ImageUriToMediaType with MediaTypeMap.GetMediaType
    • OpenAIImageGenerator.cs - Replace _mimeTypeToExtension dictionary with MediaTypeMap.GetExtension
    • IngestionDocumentReader.cs - Remove custom lookup, only use MediaTypeMap.GetMediaType
Original prompt

This section details on the original issue you should resolve

<issue_title>We should have helper methods / extensions for DataContent</issue_title>
<issue_description>At a minimum we should have Load and Save.

Load should read from a path and preserve the file name loaded and try to decode a mime-type from that.

Save should save to a path, with an optional filename, using Name if a directory is provided in the path.

Perhaps we could consider a filename without extension to be provided and we infer the extension from the mime-type and return the saved path.

We should also have Stream based helpers.</issue_description>

<agent_instructions>Add static LoadFromAsync and instance SaveToAsync methods to DataContent. There should be a string path overload for loading that infers the name and mime type from the path (it should add a polyfill to common for https://github.com/dotnet/runtime/blob/main/src/libraries/System.Net.Mail/src/System/Net/Mime/MediaTypeMap.cs and use that to infer the mime type). There should also be a stream based one that should accept a Stream. If it's actually a FileStream it should default to inferring name and mime type from the filestream's path. It should fall back to octet stream if unknown. SaveAsync should accept a required path to save to; it should accept an optional bool parameter defaulting to true for inferExtension that will add an extension to the provided path if it doesn't have one, based on the mime type and MediaTypemap. It should return the path to which the data was saved. </agent_instructions>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

Microsoft Reviewers: Open in CodeFlow

Copilot AI changed the title [WIP] Add helper methods for DataContent with Load and Save Add LoadFromAsync and SaveToAsync helper methods to DataContent Dec 16, 2025
Copilot AI requested a review from stephentoub December 16, 2025 16:20
Copy link
Member

@stephentoub stephentoub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Copilot AI commented Dec 16, 2025

@copilot

All review comments addressed in 840f61c. Changes include: MediaTypeMap moved to LegacySupport with full mappings, methods are now stable (not experimental), DefaultMediaType moved to DataContent as private const, using GetBuffer() for MemoryStream, proper null handling for Name, SaveToAsync handles directory paths and always infers extensions, and NET9_0_OR_GREATER conditional for ReadOnlyMemory overload.

Copilot AI requested a review from stephentoub December 16, 2025 17:15
@stephentoub stephentoub marked this pull request as ready for review December 16, 2025 17:16
@stephentoub stephentoub requested review from a team as code owners December 16, 2025 17:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds helper methods to DataContent for loading from and saving to file system paths and streams, with automatic media type (MIME type) inference based on file extensions.

Key Changes

  • Adds static LoadFromAsync methods (path and stream overloads) that infer media types and names from file extensions
  • Adds instance SaveToAsync method that handles directory paths and extension inference
  • Implements MediaTypeMap polyfill for .NET 9 and earlier to enable file extension-to-MIME type mapping
  • Marks the new APIs as stable (non-experimental) in the JSON baseline

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
src/Libraries/Microsoft.Extensions.AI.Abstractions/Contents/DataContent.cs Implements the new LoadFromAsync and SaveToAsync methods with media type inference logic
src/LegacySupport/MediaTypeMap/MediaTypeMap.cs Adds polyfill implementation of media type mapping for pre-.NET 10 frameworks
src/LegacySupport/MediaTypeMap/README.md Documents the MediaTypeMap polyfill purpose and origin
eng/MSBuild/LegacySupport.props Configures MSBuild to inject MediaTypeMap polyfill for legacy target frameworks
src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.csproj Enables MediaTypeMap injection for the abstractions project
src/Libraries/Microsoft.Extensions.AI.Abstractions/Microsoft.Extensions.AI.Abstractions.json Adds the new methods to the stable API baseline
test/Libraries/Microsoft.Extensions.AI.Abstractions.Tests/Contents/DataContentTests.cs Adds comprehensive unit tests covering all new functionality and edge cases

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
…o corruption

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Copilot AI requested a review from stephentoub January 21, 2026 17:26
@stephentoub stephentoub enabled auto-merge (squash) January 21, 2026 17:30
{
private static readonly Dictionary<string, string> s_extensionToMediaType = new(StringComparer.OrdinalIgnoreCase)
{
[".3g2"] = "video/3gpp2",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot, please double check that these mappings are in sync with the current state of https://raw.githubusercontent.com/dotnet/runtime/refs/heads/main/src/libraries/System.Net.Mail/src/System/Net/Mime/MediaTypeMap.cs . It's possible that changed since this PR was created. This should have exactly the same set of entries for both directions of mapping.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 3bf025c. Found one missing entry: .cgm/image/cgm - the runtime has 404 extension mappings and 295 media type mappings, but our polyfill had 403 and 294. Now synced with the current runtime source.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
auto-merge was automatically disabled January 21, 2026 17:38

Head branch was pushed to by a user without write access

Copilot AI requested a review from stephentoub January 21, 2026 17:39
/// </param>
/// <param name="cancellationToken">The <see cref="CancellationToken"/> to monitor for cancellation requests.</param>
/// <returns>The actual path where the data was saved, which may include an inferred file name and/or extension.</returns>
/// <exception cref="ArgumentNullException"><paramref name="path"/> is <see langword="null"/>.</exception>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worthwhile to document that an IOException will be thrown in the file already exists.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot, please update the XML docs accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 71efd51. Added <exception cref="IOException">A file already exists at the specified path.</exception> to the XML documentation.

// If path is a directory, infer the file name from the Name property if available,
// or use a random name along with an extension inferred from the media type.
// If the path is empty, treat it as the current directory.
if (path.Length == 0 || Directory.Exists(path))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there might also be a case of the path representing a directory by having a trailing slash, but that directory doesn't yet exist. We should also handle that here (if you accept that we should create the directory).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about lack of precedent. Do we do that anywhere else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like most of our lower level APIs do not create the directory, so we can leave it. Let's make sure to test the case where someone calls this API specifying a path to a directory that does not exist, with a parent directory that does exist. Make sure that we don't incidentally trim a trailing slash and create a file with the directory name.

Co-authored-by: stephentoub <2642209+stephentoub@users.noreply.github.com>
Comment on lines +565 to +578
public async Task SaveToAsync_WithEmptyPath_UsesCurrentDirectory()
{
// Empty string path is valid - it uses the current directory with inferred filename.
// We use a unique name to avoid conflicts with concurrent tests.
string tempDir = Path.Combine(Path.GetTempPath(), $"test_empty_path_{Guid.NewGuid()}");
Directory.CreateDirectory(tempDir);
string uniqueName = $"test_{Guid.NewGuid()}.json";

try
{
// Save with empty path from within the temp directory context
// Since we can't change current directory, we test the directory path behavior instead
DataContent content = new(new byte[] { 1, 2, 3 }, "application/json") { Name = uniqueName };
string savedPath = await content.SaveToAsync(tempDir);
Copy link
Member

@ericstj ericstj Jan 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't appear to do what the method name and comment says. It claims to be testing with an empty path / empty-string path, but instead it passes a directory.

I think this test meant to pass in String.Empty to SaveToAsync. It should do that, and then also test passing just a file-name. Both should write to current dir. The test can use the returned path for cleanup.

}

[Fact]
public async Task LoadFromAsync_AbsolutePath_LoadsCorrectly()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test looks redundant to me.

}

[Fact]
public async Task LoadFromAsync_RelativePath_LoadsCorrectly()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test doesn't do what it claims. It's using an absolute temp path.

}

[Fact]
public async Task LoadFromAsync_RelativePathWithSubdirectory_LoadsCorrectly()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also doesn't do what it claims

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-ai Microsoft.Extensions.AI libraries

Projects

None yet

Development

Successfully merging this pull request may close these issues.

We should have helper methods / extensions for DataContent

5 participants