-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Preserve file encoding in file-based app source files #52055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
- Modified SourceFile.Load() to detect UTF-8 BOM in files - Modified SourceFile.Save() to preserve original BOM state - Added HasUtf8Bom property to SourceFile struct - Updated InternalAPI.Unshipped.txt with new API members - Added tests to verify BOM preservation (no-BOM and with-BOM cases) Fixes issue where dotnet add package command was adding BOM to files that originally didn't have one, breaking shebang scripts. Co-authored-by: DamianEdwards <249088+DamianEdwards@users.noreply.github.com>
src/Cli/Microsoft.DotNet.FileBasedPrograms/FileLevelDirectiveHelpers.cs
Outdated
Show resolved
Hide resolved
jjonescz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Cli/Microsoft.DotNet.FileBasedPrograms/FileLevelDirectiveHelpers.cs
Outdated
Show resolved
Hide resolved
test/dotnet.Tests/CommandTests/Run/FileBasedAppSourceEditorTests.cs
Outdated
Show resolved
Hide resolved
- Use SourceText's Encoding property instead of custom HasUtf8Bom tracking - Use stackalloc for 3-byte buffer on .NET Core (with fallback for netstandard2.0) - Update tests to use _testAssetsManager.CreateTestDirectory() infrastructure - Add issue reference (#52054) to test docs - Remove unnecessary API surface from InternalAPI.Unshipped.txt Co-authored-by: jjonescz <3669664+jjonescz@users.noreply.github.com>
jjonescz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/Cli/Microsoft.DotNet.FileBasedPrograms/FileLevelDirectiveHelpers.cs
Outdated
Show resolved
Hide resolved
- Remove custom DetectEncoding() method and use SourceText.From(stream, encoding: null) - SourceText automatically detects encoding including BOM when encoding is null - Move <see href/> tags inside <summary> elements for better documentation - Simplifies implementation while maintaining full functionality Co-authored-by: jjonescz <3669664+jjonescz@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a critical bug where dotnet add file.cs package <package> was inadvertently adding a UTF-8 BOM to file-based app source files, breaking shebang (#!) scripts on Unix-like systems. The fix preserves the original encoding state (with or without BOM) by leveraging Roslyn's SourceText encoding detection and preservation capabilities.
Key Changes:
- Modified
SourceFile.Load()to auto-detect encoding by passingencoding: nulltoSourceText.From() - Updated
SourceFile.Save()to use the detected encoding fromSourceText.Encodingproperty - Added comprehensive tests verifying BOM preservation for both cases (with and without BOM)
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/Cli/Microsoft.DotNet.FileBasedPrograms/FileLevelDirectiveHelpers.cs | Updated SourceFile.Load() to auto-detect encoding and SourceFile.Save() to preserve it using SourceText.Encoding |
| test/dotnet.Tests/CommandTests/Run/FileBasedAppSourceEditorTests.cs | Added two test cases: PreservesNoBomEncoding() for files without BOM and PreservesBomEncoding() for files with BOM |
| var bytes = File.ReadAllBytes(tempFile); | ||
| Assert.True(bytes.Length >= 3 && bytes[0] == 0xEF && bytes[1] == 0xBB && bytes[2] == 0xBF, | ||
| "File should have UTF-8 BOM"); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to verify the behavior when original file uses some other encoding besides UTF-8.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in 643569f. The new PreservesNonUtf8Encoding test verifies that UTF-16 LE encoding (with BOM 0xFF 0xFE) is preserved correctly. SourceText's auto-detection handles various encodings, not just UTF-8.
- Add PreservesNonUtf8Encoding test to verify UTF-16 encoding is preserved - Tests that files with UTF-16 LE encoding (with BOM 0xFF 0xFE) maintain their encoding - Ensures SourceText's auto-detection works for encodings beyond UTF-8 - All encoding preservation tests now pass (UTF-8 no BOM, UTF-8 with BOM, UTF-16) Co-authored-by: jjonescz <3669664+jjonescz@users.noreply.github.com>
333fred
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments on the tests.
test/dotnet.Tests/CommandTests/Run/FileBasedAppSourceEditorTests.cs
Outdated
Show resolved
Hide resolved
test/dotnet.Tests/CommandTests/Run/FileBasedAppSourceEditorTests.cs
Outdated
Show resolved
Hide resolved
test/dotnet.Tests/CommandTests/Run/FileBasedAppSourceEditorTests.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Fred Silberberg <frsilb@microsoft.com>
dotnet add file.cs packagewas adding UTF-8 BOM to files that originally lacked one, breaking shebang scripts on Unix systems.SourceFile.Save()usedEncoding.UTF8which unconditionally emits BOM.Changes Made
SourceFile.Load(): UsesSourceText.From(stream, encoding: null)to auto-detect the original file's encoding (including BOM detection)SourceFile.Save(): Preserves the detected encoding fromSourceText.Encodingproperty when writing filesPreservesNoBomEncoding(): Verifies files without UTF-8 BOM don't get one added (critical for shebang scripts)PreservesBomEncoding(): Verifies files with UTF-8 BOM preserve itPreservesNonUtf8Encoding(): Verifies non-UTF-8 encodings like UTF-16 are preservedImplementation
The solution leverages Roslyn's
SourceTextAPI for encoding detection and preservation:Example
Before: shebang script becomes unusable after adding package
After: BOM state and encoding preserved
Testing
Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.