Skip to content

Utf16.IsValid and Utf8/16.IndexOfInvalidSubsequence#120326

Open
lilinus wants to merge 17 commits intodotnet:mainfrom
lilinus:utf8-utf16-validation
Open

Utf16.IsValid and Utf8/16.IndexOfInvalidSubsequence#120326
lilinus wants to merge 17 commits intodotnet:mainfrom
lilinus:utf8-utf16-validation

Conversation

@lilinus
Copy link
Contributor

@lilinus lilinus commented Oct 2, 2025

Fixes #118018
Fixes #118113

Copilot AI review requested due to automatic review settings October 2, 2025 14:24
@github-actions github-actions bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Oct 2, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 2, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds new UTF-16 validation APIs to System.Text.Unicode, introducing Utf16.IsValid and Utf16.IndexOfInvalidSubsequence methods alongside similar functionality for UTF-8. It also refactors existing code to use these new standardized validation APIs instead of custom validation logic.

  • Adds new Utf16 class with validation methods IsValid and IndexOfInvalidSubsequence
  • Adds Utf8.IndexOfInvalidSubsequence method to complement existing Utf8.IsValid
  • Refactors multiple codebases to use the new standardized validation APIs

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
System.Private.CoreLib.Shared.projitems Adds the new Utf16.cs file to the build
Utf16.cs New class implementing UTF-16 validation methods
Utf8.cs Adds IndexOfInvalidSubsequence method and updates class documentation
System.Runtime.cs Adds public API surface for new UTF-16 and UTF-8 validation methods
HttpEncoder.cs Refactors surrogate validation to use new Utf16.IndexOfInvalidSubsequence
Normalization.Icu.cs Replaces custom validation logic with Utf16.IsValid
StringSearchValues.cs Simplifies surrogate validation using Utf16.IsValid
Utf8UtilityTests.ValidateBytes.cs Adds test coverage for new Utf8.IndexOfInvalidSubsequence method
Utf16UtilityTests.ValidateChars.cs Adds test coverage for new Utf16 validation methods

@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

@jeffhandley jeffhandley removed the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Jan 14, 2026
Copilot AI review requested due to automatic review settings February 1, 2026 18:58
@jeffhandley
Copy link
Member

@lilinus Thanks for your patience on this; we are planning to review it this month. I've updated the branch and CI will re-run, but there was an ApiCompat error in the build previously indicating Utf8.IndexOfInvalidSubsequence wasn't available in the net8.0 target and it wasn't obvious to me why that was the case.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.

Update XML doc comments and remove unneeded using statements.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 1, 2026 19:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

…/Text/Unicode/Utf8UtilityTests.ValidateBytes.cs

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 4, 2026 11:26
Remove unnecessary blank line at the beginning of Utf16.cs
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 16, 2026 04:29
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Copy link
Member

@jozkee jozkee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than the unsafe comment, this LGTM.

Is worth mentioning that

private static bool HasInvalidUnicodeSequence(ReadOnlySpan<char> s)
from #118018 can't adopt the new APIs because it would make it more expensive by requiring two passes instead of one.

Copilot AI review requested due to automatic review settings February 27, 2026 09:32
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 27, 2026 09:45
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 9 out of 9 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-System.Runtime community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[API Proposal]: Indexed version of Utf8.IsValid [API Proposal]: UTF-16 version of Utf8.IsValid

6 participants