-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Improve JSON validation perf #111332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve JSON validation perf #111332
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteValues.Helpers.cs:71
- The validation check for
_tokenType == StringSegmentSentinel
should be consistent with other checks. Ensure that the logic is clear and maintainable.
private void OnValidateWritingValueFailed()
Nice! Would the hypothetical inclusion of new token types in the future render this optimization obsolete? Would it be possible to simplify this technique if we moved |
Yep, that is much simpler, I updated PR description with the changes and the perf (which didn't change much).
The new version still depends on |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 5 out of 6 changed files in this pull request and generated no comments.
Files not reviewed (1)
- src/libraries/System.Text.Json/src/System/Text/Json/BitStack.cs: Evaluated as low risk
Comments suppressed due to low confidence (1)
src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteValues.Helpers.cs:51
- The condition should be checked for null or invalid state before proceeding.
if (_enclosingContainer == EnclosingContainerType.PartialValue)
I would recommend merging #111041 before continuing work on this PR. Functional completeness first, then we can make it faster. |
…lidation-perfopt
…EnclosingContainerType
} | ||
return inObject; | ||
} | ||
|
||
[MethodImpl(MethodImplOptions.NoInlining)] | ||
private bool PopFromArray() | ||
private readonly bool PeekInArray() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are we peeking at in this case? It seems we're just returning a boolean so perhaps this could be expressed as a property, i.e. IsInArray
or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks at the top of the stack assuming the stack is backed by an array. IsInArray
sounds like it's checking containment. I also like it being a method since it's symmetric with Push/PushToArray and Peek/PeekInArray. I'm open to better naming but I left it as is with more comments.
src/libraries/System.Text.Json/src/System/Text/Json/BitStack.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.Json/src/System/Text/Json/ThrowHelper.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteValues.Helpers.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteValues.Helpers.cs
Outdated
Show resolved
Hide resolved
...libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.WriteValues.Helpers.cs
Show resolved
Hide resolved
src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs
Show resolved
Hide resolved
src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. Do you have updated benchmark results to share after including the most recent changes?
I would also recommend comparing the full benchmark suite from dotnet/performance
in case a particular workflow has slipped.
Thanks!
This comment was marked as resolved.
This comment was marked as resolved.
@EgorBot -amd -arm -profiler --envvars DOTNET_JitDisasm:System.Text.Json.JsonDocument:WriteString(byref,System.Text.Json.Utf8JsonWriter) using BenchmarkDotNet.Attributes;
using System.Buffers;
using System.Text.Json;
using System.Text;
using BenchmarkDotNet.Running;
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
public class Perf_ParseThenWrite
{
const string jsonString =
"""
{
"tags": [
"culpa",
"laboris",
"nulla",
"exercitation",
"do",
"pariatur",
"occaecat",
"aliquip",
"est",
"et",
"officia",
"minim",
"quis",
"est",
"ullamco",
"consequat",
"id",
"esse",
"irure",
"reprehenderit",
"proident",
"labore",
"et",
"in",
"consequat",
"officia",
"exercitation",
"aute",
"do",
"consequat",
"laborum",
"officia",
"consectetur",
"deserunt",
"ex",
"do",
"nostrud",
"deserunt",
"consectetur",
"adipisicing",
"et",
"non",
"veniam",
"dolor",
"Lorem",
"sunt",
"amet",
"cillum",
"excepteur",
"mollit"
]
}
""";
private byte[] _dataUtf8;
private Utf8JsonWriter _writer;
public bool IsDataIndented = true;
[GlobalSetup]
public void Setup()
{
_dataUtf8 = Encoding.UTF8.GetBytes(jsonString);
var abw = new ArrayBufferWriter<byte>();
_writer = new Utf8JsonWriter(abw, new JsonWriterOptions { Indented = IsDataIndented });
}
[GlobalCleanup]
public void CleanUp()
{
_writer.Dispose();
}
[Benchmark]
public void ParseThenWrite()
{
_writer.Reset();
using (JsonDocument document = JsonDocument.Parse(_dataUtf8))
{
document.WriteTo(_writer);
}
}
} |
Updated results for latest commit. Looks basically the same as before.
Only |
src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Text.Json/src/System/Text/Json/Writer/Utf8JsonWriter.cs
Outdated
Show resolved
Hide resolved
/// so <see cref="_bitStack"/> does not need to store its state. | ||
/// <see cref="IsWritingPartialString"/> relies on the value of the partial string members being the largest values of this enum. | ||
/// </summary> | ||
Base64StringSequence = 0b011_0_0000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remind me, why do the bits of this value overlap with the previous two container types? Couldn't it just be an independent value like 1 << 7
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That made it seem like no more values could be added to the enum, but actually an equivalently simple encoding that would still work is 1<<4, 2<<4, 3<<4 and 4<<4 for these last 4 members.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Presumably you mean 1 <<4
, 1 << 5
, 1 << 6
, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I meant they don't need to be flags-style as long as they're >= 16 (avoiding bit overlap with json token type).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to hex for readability and renumbered to 0x10, 0x20, 0x30 and 0x40.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
* main: (89 commits) Add Dispose for X509Chain instance (dotnet#110740) Fix XML comment on regex split enumerator (dotnet#111572) JIT: tolerate missing InitClass map in SPMI (dotnet#111555) Build ilasm/ildasm packages for the host machine (dotnet#111512) Unicode 16.0 Support (dotnet#111469) Improve performance of interface method resolution in ILC (dotnet#103066) Fix building the host-targeting components and packing ILC (dotnet#111552) Improve JSON validation perf (dotnet#111332) Update github-merge-flow.jsonc to autoflow 9.0 to 9.0-staging (dotnet#111549) Include GPL-3 licence text in the notice (dotnet#111528) Remove explicit __compact_unwind entries from x64 assembler (dotnet#111530) Add MemoryExtensions overloads with comparer (dotnet#110197) Avoid capturing the ExecutionContext for the whole HTTP connection lifetime (dotnet#111475) Forward DefaultArtifactVisibility down from the VMR orchestrator (dotnet#111513) Fix relocs errors on riscv64 (dotnet#111317) Added JITDUMP_USE_ARCH_TIMESTAMP support. (dotnet#111359) add rcl/rcr tp and latency info (dotnet#111442) Fix stack overflow in compiler-generated state (dotnet#109207) Produce a package with the host-running ILC for repos in the VMR (dotnet#111443) Delete dead code in ilasm PE writer (dotnet#111218) ...
} | ||
} | ||
|
||
private void ValidateNotWithinUnfinalizedString() | ||
[DoesNotReturn] | ||
[MethodImpl(MethodImplOptions.NoInlining)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NoInlining
should not be applied to throw helpers, otherwise RyuJIt will not detect the callsite as cold code.
These helpers should probably follow the pattern like throw GetInvalidOperationException();
to make them small, but with no return (thus not inlined), so the callsites can be optimized properly.
Fixes #111039
With the addition of string value segment APIs to the JSON writer (#101356), validation in UTF8JsonWriter needs to also validate that values are not written after uncompleted string segments. This extra check for every write impacts perf, especially when writing values in tight loops. The linked bug above shows some such scenarios (e.g. serializing collections). This PR improves performance for validation by (1) refactoring exception throwing call sites and (2) changing the validation check when writing JSON values.
Perf
The commits measured are main (bc2ac2f), pre-StringValueSegment (before the regression was introduced, e5878e9), and this PR. All 4 test cases from #111039 are shown here.
WriteJson<HashSet>.SerializeToWriter
WriteJson<ImmutableDictionary<String, String>>.SerializeToWriter
WriteJson<LoginViewModel>.SerializeToWriter
Perf_Basic.WriteBasicUtf8
Changes in this PR
Exceptions
Exceptions are now using the throw helper pattern with minimal arguments. This reduces the code size in the validation methods and defers the performance penalty to only the exceptional scenarios. With this change the JIT also seems more willing to inline business logic which gives perf improvements in tight loop source generated methods like this:
Validation for values
The validation check for writing values is now a single check instead of multiple checks based on enclosing container type. For this,
_isObject
is now changed to_enclosingContainer
, which can be either object, array, root/none, or partial value. This removes the need to checkCurrentDepth
separately to determine whether the writer is at the root level. In addition, the actual enum values are:These were chosen in such a way that the value validation can be reduced to:
Writing values in arrays is always valid and writing values in partial values is never valid. Otherwise, in objects and at the root level, the only valid previous tokens are property and none respectively. This can be checked with equality since we picked the enum values for
EnclosingContainerType
for this purpose.