Skip to content

Added Utf8.IsValid(bytes)#88004

Merged
stephentoub merged 6 commits intodotnet:mainfrom
AlexRadch:Utf8.IsValid
Jun 28, 2023
Merged

Added Utf8.IsValid(bytes)#88004
stephentoub merged 6 commits intodotnet:mainfrom
AlexRadch:Utf8.IsValid

Conversation

@AlexRadch
Copy link
Contributor

Close #502 issue.

@ghost ghost added needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners new-api-needs-documentation labels Jun 24, 2023
@ghost
Copy link

ghost commented Jun 24, 2023

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Jun 24, 2023
public static System.Buffers.OperationStatus ToUtf16(System.ReadOnlySpan<byte> source, System.Span<char> destination, out int bytesRead, out int charsWritten, bool replaceInvalidSequences = true, bool isFinalBlock = true) { throw null; }
public static bool TryWrite(System.Span<byte> destination, [System.Runtime.CompilerServices.InterpolatedStringHandlerArgumentAttribute("destination")] ref System.Text.Unicode.Utf8.TryWriteInterpolatedStringHandler handler, out int bytesWritten) { throw null; }
public static bool TryWrite(System.Span<byte> destination, IFormatProvider? provider, [System.Runtime.CompilerServices.InterpolatedStringHandlerArgumentAttribute("destination", "provider")] ref System.Text.Unicode.Utf8.TryWriteInterpolatedStringHandler handler, out int bytesWritten) { throw null; }
public static bool IsValid(System.ReadOnlySpan<byte> value) { throw null; }
Copy link
Member

@stephentoub stephentoub Jun 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also search around dotnet/runtime for places this can be used? For example, System.Text.Json currently does this with a public static unsafe bool IsValidUtf8String(ReadOnlySpan<byte> bytes) method that can be entirely replaced when targeting .NET 8+. (Actually, looks like it even has two different helpers for the same thing, with another one public static void ValidateUtf8(ReadOnlySpan<byte> utf8Buffer)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One useful pattern to look for is for call sites that invoke UTF8Encoding.GetCharCount and throw away the return value. Those call sites are probably calling GetCharCount just for its side effect of throwing an exception when it sees invalid data.

Another might be to look for places which catch EncoderFallbackException and to see if they're actually trying to convert bytes <-> chars or if they're just performing validation.

@jeffhandley jeffhandley added area-System.Runtime and removed needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners labels Jun 27, 2023
@ghost
Copy link

ghost commented Jun 27, 2023

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Close #502 issue.

Author: AlexRadch
Assignees: -
Labels:

area-System.Runtime, new-api-needs-documentation, community-contribution

Milestone: -

@GrabYourPitchforks
Copy link
Member

Thanks @AlexRadch! The logic LGTM, but it would also be interesting to check what Steve mentioned at #88004 (comment). It would probably be better to get any such changes into this PR rather than into a cleanup PR.

@AlexRadch
Copy link
Contributor Author

Thanks @AlexRadch! The logic LGTM, but it would also be interesting to check what Steve mentioned at #88004 (comment). It would probably be better to get any such changes into this PR rather than into a cleanup PR.

I'll be busy for about a week and then I'll finish.

@stephentoub
Copy link
Member

I'll be busy for about a week and then I'll finish.

Thanks for your efforts here. In that case, I'll go ahead and merge this and will handle fixing the most obvious consumption sites. If you have time to look for more after that, great.

@stephentoub stephentoub merged commit 336e9d6 into dotnet:main Jun 28, 2023
@AlexRadch AlexRadch deleted the Utf8.IsValid branch July 8, 2023 21:30
@ghost ghost locked as resolved and limited conversation to collaborators Aug 8, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Runtime community-contribution Indicates that the PR has been added by a community member new-api-needs-documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API proposal: Utf8.IsValid(bytes)

4 participants