Skip to content

Remove custom Utf8 encoder implementation #77400

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Oct 24, 2022

Second attempt at replacing JsonWriterHelper.Transcoding.cs

Note that this PR changes Utf8JsonWriter.WriteCommentValue handling of inputs that result in invalid UTF-8 outputs, so technically this introduces a breaking change cc @GrabYourPitchforks @bartonjs

Fix #75779

Change is recording hefty performance improvements in the string writing benchmarks:

Method Job Toolchain Formatted SkipValidation Escaped Mean Error StdDev Median Min Max Ratio MannWhitney(3%) RatioSD Allocated Alloc Ratio
WriteStringsUtf8 Job-NTULTU PR False False AllEscaped 56.326 ms 1.5872 ms 1.6983 ms 56.237 ms 53.078 ms 59.281 ms 0.82 Faster 0.11 288 B 0.36
WriteStringsUtf8 Job-CQHRFM main False False AllEscaped 69.793 ms 9.3918 ms 10.4389 ms 65.374 ms 58.983 ms 91.153 ms 1.00 Base 0.00 792 B 1.00
WriteStringsUtf16 Job-NTULTU PR False False AllEscaped 58.697 ms 0.9348 ms 1.0002 ms 58.612 ms 57.264 ms 61.068 ms 0.87 Faster 0.03 204 B 1.00
WriteStringsUtf16 Job-CQHRFM main False False AllEscaped 67.537 ms 1.6138 ms 1.8585 ms 67.023 ms 65.310 ms 71.386 ms 1.00 Base 0.00 204 B 1.00
WriteStringsUtf8 Job-NTULTU PR False False OneEscaped 9.542 ms 0.2926 ms 0.3370 ms 9.533 ms 8.804 ms 10.207 ms 1.00 Same 0.06 146 B 1.01
WriteStringsUtf8 Job-CQHRFM main False False OneEscaped 9.586 ms 0.3599 ms 0.3851 ms 9.478 ms 8.997 ms 10.450 ms 1.00 Base 0.00 145 B 1.00
WriteStringsUtf16 Job-NTULTU PR False False OneEscaped 11.683 ms 0.5056 ms 0.5822 ms 11.635 ms 10.947 ms 12.640 ms 0.81 Faster 0.04 135 B 0.66
WriteStringsUtf16 Job-CQHRFM main False False OneEscaped 14.558 ms 0.1864 ms 0.1455 ms 14.599 ms 14.240 ms 14.742 ms 1.00 Base 0.00 204 B 1.00
WriteStringsUtf8 Job-NTULTU PR False False NoneEscaped 5.724 ms 0.1774 ms 0.2042 ms 5.652 ms 5.357 ms 6.142 ms 0.98 Same 0.05 128 B 1.00
WriteStringsUtf8 Job-CQHRFM main False False NoneEscaped 5.837 ms 0.1783 ms 0.1981 ms 5.779 ms 5.534 ms 6.296 ms 1.00 Base 0.00 128 B 1.00
WriteStringsUtf16 Job-NTULTU PR False False NoneEscaped 6.957 ms 0.2194 ms 0.2439 ms 6.951 ms 6.506 ms 7.452 ms 0.80 Faster 0.05 130 B 0.83
WriteStringsUtf16 Job-CQHRFM main False False NoneEscaped 8.759 ms 0.4588 ms 0.4506 ms 8.873 ms 7.921 ms 9.276 ms 1.00 Base 0.00 157 B 1.00
WriteStringsUtf8 Job-NTULTU PR False True AllEscaped 56.525 ms 1.6862 ms 1.8742 ms 57.179 ms 53.534 ms 59.618 ms 1.05 Same 0.03 288 B 1.00
WriteStringsUtf8 Job-CQHRFM main False True AllEscaped 54.444 ms 0.2451 ms 0.2292 ms 54.343 ms 54.198 ms 54.964 ms 1.00 Base 0.00 288 B 1.00
WriteStringsUtf16 Job-NTULTU PR False True AllEscaped 56.347 ms 1.5188 ms 1.6881 ms 56.354 ms 54.185 ms 60.202 ms 0.87 Faster 0.03 204 B 1.00
WriteStringsUtf16 Job-CQHRFM main False True AllEscaped 64.721 ms 0.3498 ms 0.2921 ms 64.585 ms 64.372 ms 65.278 ms 1.00 Base 0.00 204 B 1.00
WriteStringsUtf8 Job-NTULTU PR False True OneEscaped 9.455 ms 0.0964 ms 0.0805 ms 9.439 ms 9.377 ms 9.671 ms 1.00 Same 0.04 146 B 1.01
WriteStringsUtf8 Job-CQHRFM main False True OneEscaped 9.542 ms 0.2813 ms 0.3239 ms 9.525 ms 8.935 ms 10.076 ms 1.00 Base 0.00 145 B 1.00
WriteStringsUtf16 Job-NTULTU PR False True OneEscaped 11.219 ms 0.1790 ms 0.1495 ms 11.175 ms 11.020 ms 11.589 ms 0.74 Faster 0.10 209 B 1.02
WriteStringsUtf16 Job-CQHRFM main False True OneEscaped 15.261 ms 1.7612 ms 1.8845 ms 14.522 ms 11.866 ms 19.545 ms 1.00 Base 0.00 204 B 1.00
WriteStringsUtf8 Job-NTULTU PR False True NoneEscaped 5.660 ms 0.1538 ms 0.1771 ms 5.623 ms 5.394 ms 6.076 ms 0.98 Same 0.05 128 B 1.00
WriteStringsUtf8 Job-CQHRFM main False True NoneEscaped 5.769 ms 0.1748 ms 0.2013 ms 5.762 ms 5.423 ms 6.203 ms 1.00 Base 0.00 128 B 1.00
WriteStringsUtf16 Job-NTULTU PR False True NoneEscaped 6.986 ms 0.2216 ms 0.2552 ms 6.894 ms 6.660 ms 7.488 ms 0.89 Faster 0.04 129 B 0.99
WriteStringsUtf16 Job-CQHRFM main False True NoneEscaped 7.814 ms 0.1734 ms 0.1997 ms 7.763 ms 7.475 ms 8.223 ms 1.00 Base 0.00 130 B 1.00
WriteStringsUtf8 Job-NTULTU PR True False AllEscaped 70.863 ms 9.9943 ms 11.1086 ms 66.118 ms 55.295 ms 87.093 ms 1.12 Same 0.18 792 B 1.00
WriteStringsUtf8 Job-CQHRFM main True False AllEscaped 64.374 ms 4.1746 ms 4.1001 ms 65.589 ms 55.910 ms 72.994 ms 1.00 Base 0.00 792 B 1.00
WriteStringsUtf16 Job-NTULTU PR True False AllEscaped 58.900 ms 1.6459 ms 1.8294 ms 58.194 ms 57.120 ms 63.192 ms 0.89 Faster 0.03 204 B 1.00
WriteStringsUtf16 Job-CQHRFM main True False AllEscaped 66.430 ms 1.2985 ms 1.0843 ms 66.532 ms 64.182 ms 68.164 ms 1.00 Base 0.00 204 B 1.00
WriteStringsUtf8 Job-NTULTU PR True False OneEscaped 9.945 ms 0.2418 ms 0.2785 ms 9.880 ms 9.518 ms 10.587 ms 1.00 Same 0.04 147 B 1.00
WriteStringsUtf8 Job-CQHRFM main True False OneEscaped 9.942 ms 0.2857 ms 0.3176 ms 9.955 ms 9.362 ms 10.567 ms 1.00 Base 0.00 147 B 1.00
WriteStringsUtf16 Job-NTULTU PR True False OneEscaped 12.914 ms 0.8791 ms 0.9028 ms 13.012 ms 11.257 ms 15.212 ms 1.00 Same 0.08 176 B 1.28
WriteStringsUtf16 Job-CQHRFM main True False OneEscaped 12.977 ms 0.4030 ms 0.4480 ms 12.784 ms 12.320 ms 14.022 ms 1.00 Base 0.00 138 B 1.00
WriteStringsUtf8 Job-NTULTU PR True False NoneEscaped 6.220 ms 0.2375 ms 0.2735 ms 6.174 ms 5.790 ms 6.724 ms 1.01 Same 0.05 128 B 1.00
WriteStringsUtf8 Job-CQHRFM main True False NoneEscaped 6.155 ms 0.1964 ms 0.2262 ms 6.116 ms 5.827 ms 6.634 ms 1.00 Base 0.00 128 B 1.00
WriteStringsUtf16 Job-NTULTU PR True False NoneEscaped 7.666 ms 0.2076 ms 0.2308 ms 7.588 ms 7.288 ms 8.153 ms 0.82 Faster 0.04 130 B 0.80
WriteStringsUtf16 Job-CQHRFM main True False NoneEscaped 9.361 ms 0.3644 ms 0.3899 ms 9.518 ms 8.474 ms 9.692 ms 1.00 Base 0.00 162 B 1.00
WriteStringsUtf8 Job-NTULTU PR True True AllEscaped 60.778 ms 1.8106 ms 2.0851 ms 61.021 ms 57.021 ms 64.734 ms 0.88 Faster 0.11 288 B 0.36
WriteStringsUtf8 Job-CQHRFM main True True AllEscaped 69.838 ms 8.5386 ms 8.7685 ms 65.883 ms 57.595 ms 91.566 ms 1.00 Base 0.00 792 B 1.00
WriteStringsUtf16 Job-NTULTU PR True True AllEscaped 57.080 ms 1.1841 ms 1.3161 ms 57.177 ms 53.863 ms 59.877 ms 0.84 Faster 0.03 204 B 1.00
WriteStringsUtf16 Job-CQHRFM main True True AllEscaped 67.795 ms 1.8355 ms 1.9640 ms 67.560 ms 64.798 ms 71.460 ms 1.00 Base 0.00 204 B 1.00
WriteStringsUtf8 Job-NTULTU PR True True OneEscaped 10.071 ms 0.2033 ms 0.2259 ms 10.009 ms 9.589 ms 10.456 ms 1.01 Same 0.04 151 B 1.03
WriteStringsUtf8 Job-CQHRFM main True True OneEscaped 10.006 ms 0.3025 ms 0.3483 ms 9.964 ms 9.404 ms 10.673 ms 1.00 Base 0.00 147 B 1.00
WriteStringsUtf16 Job-NTULTU PR True True OneEscaped 13.920 ms 1.3133 ms 1.4597 ms 13.555 ms 11.274 ms 17.488 ms 0.84 Faster 0.23 176 B 0.86
WriteStringsUtf16 Job-CQHRFM main True True OneEscaped 17.684 ms 4.3538 ms 5.0138 ms 15.025 ms 12.717 ms 27.852 ms 1.00 Base 0.00 204 B 1.00
WriteStringsUtf8 Job-NTULTU PR True True NoneEscaped 6.026 ms 0.1207 ms 0.1291 ms 6.029 ms 5.797 ms 6.217 ms 0.99 Same 0.03 128 B 0.99
WriteStringsUtf8 Job-CQHRFM main True True NoneEscaped 6.065 ms 0.1315 ms 0.1462 ms 6.062 ms 5.722 ms 6.276 ms 1.00 Base 0.00 129 B 1.00
WriteStringsUtf16 Job-NTULTU PR True True NoneEscaped 7.621 ms 0.2102 ms 0.2421 ms 7.578 ms 7.255 ms 8.076 ms 0.79 Faster 0.10 130 B 0.80
WriteStringsUtf16 Job-CQHRFM main True True NoneEscaped 9.773 ms 1.2464 ms 1.3853 ms 9.446 ms 8.147 ms 13.602 ms 1.00 Base 0.00 162 B 1.00

@ghost
Copy link

ghost commented Oct 24, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Second attempt at replacing JsonWriterHelper.Transcoding.cs

Fix #75779

Author: eiriktsarpalis
Assignees: eiriktsarpalis
Labels:

area-System.Text.Json

Milestone: -

@eiriktsarpalis eiriktsarpalis added this to the 8.0.0 milestone Oct 24, 2022
@eiriktsarpalis
Copy link
Member Author

cc @kasperk81

@eiriktsarpalis eiriktsarpalis added the breaking-change Issue or PR that represents a breaking API or functional change over a prerelease. label Oct 24, 2022
@ghost ghost added the needs-breaking-change-doc-created Breaking changes need an issue opened with https://github.com/dotnet/docs/issues/new?template=dotnet label Oct 24, 2022
@ghost
Copy link

ghost commented Oct 24, 2022

Added needs-breaking-change-doc-created label because this PR has the breaking-change label.

When you commit this breaking change:

  1. Create and link to this PR and the issue a matching issue in the dotnet/docs repo using the breaking change documentation template, then remove this needs-breaking-change-doc-created label.
  2. Ask a committer to mail the .NET Breaking Change Notification DL.

Tagging @dotnet/compat for awareness of the breaking change.

try
{
#if NETCOREAPP
s_utf8Encoding.GetCharCount(bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@GrabYourPitchforks, I thought we had a Utf8.IsValid API, but I can't find it now in either source or an issue. Am I making it up? Is that something we plan to expose?

Copy link
Member

@am11 am11 Oct 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like roslyn is also using similar method for CS9026 diagnostics in ""u8 strings lowering: https://github.com/dotnet/roslyn/blob/ab7dd82/src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_Conversion.cs#L113

IsValid(ReadOnly<byte>) is a good candidate API for System.Text.Unicode.Utf8 type. :)

@eiriktsarpalis
Copy link
Member Author

I don't believe this warrants a breaking change document, the implementation been changed to reject invalid UTF-8 (invalid surrogate pairs) which were previously being accepted.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JsonWriterHelper.Transcoding.cs replacement
3 participants