Improves throughput by ~50% by avoiding AsSpan().CopyTo and instead writing out the characters one by one. I experimented with the same tricks as Utf8Formatter, storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.
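For illustration only, here is a rough C analogue of the approach described above: a single length check, then the UTF-16 code units written one at a time instead of being copied from a string. The function name `try_format_bool` and its signature are hypothetical. The actual change is C# in Boolean.cs. C performs no bounds checks, so this sketch shows only the shape of the code, not what the JIT has to prove.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C analogue of bool.TryFormat: writes "True" or "False" as
 * UTF-16 code units into dest if there is room, reporting the count. */
static bool try_format_bool(bool value, uint16_t *dest, size_t dest_len,
                            size_t *chars_written)
{
    assert(chars_written != NULL);

    if (value) {
        if (dest_len >= 4) {
            /* Individual stores instead of copying from a literal. */
            dest[0] = 'T';
            dest[1] = 'r';
            dest[2] = 'u';
            dest[3] = 'e';
            *chars_written = 4;
            return true;
        }
    } else {
        if (dest_len >= 5) {
            dest[0] = 'F';
            dest[1] = 'a';
            dest[2] = 'l';
            dest[3] = 's';
            dest[4] = 'e';
            *chars_written = 5;
            return true;
        }
    }

    /* Destination too small: nothing written. */
    *chars_written = 0;
    return false;
}
```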
I looked at the disassembly. Do I see that correctly, that the (uint) cast in the length check results in

```asm
00007ffc`8ed32ac7 83fa04 cmp edx,4
00007ffc`8ed32aca 762f jbe 00007ffc`8ed32afb
00007ffc`8ed32acc 6641c7014600 mov word ptr [r9],46h
00007ffc`8ed32ad2 6641c741026100 mov word ptr [r9+2],61h
00007ffc`8ed32ad9 6641c741046c00 mov word ptr [r9+4],6Ch
00007ffc`8ed32ae0 6641c741067300 mov word ptr [r9+6],73h
00007ffc`8ed32ae7 6641c741086500 mov word ptr [r9+8],65h
00007ffc`8ed32aee 41c70005000000 mov dword ptr [r8],5
00007ffc`8ed32af5 b801000000 mov eax,1
00007ffc`8ed32afa c3 ret
```

instead of

```asm
00007ffc`8ed32adc 83fa04 cmp edx,4
00007ffc`8ed32adf 7e4c jle 00007ffc`8ed32b2d
00007ffc`8ed32ae1 83fa00 cmp edx,0
00007ffc`8ed32ae4 7651 jbe 00007ffc`8ed32b37
00007ffc`8ed32ae6 6641c7014600 mov word ptr [r9],46h
00007ffc`8ed32aec 83fa01 cmp edx,1
00007ffc`8ed32aef 7646 jbe 00007ffc`8ed32b37
00007ffc`8ed32af1 6641c741026100 mov word ptr [r9+2],61h
00007ffc`8ed32af8 83fa02 cmp edx,2
00007ffc`8ed32afb 763a jbe 00007ffc`8ed32b37
00007ffc`8ed32afd 6641c741046c00 mov word ptr [r9+4],6Ch
00007ffc`8ed32b04 83fa03 cmp edx,3
00007ffc`8ed32b07 762e jbe 00007ffc`8ed32b37
00007ffc`8ed32b09 6641c741067300 mov word ptr [r9+6],73h
00007ffc`8ed32b10 83fa04 cmp edx,4
00007ffc`8ed32b13 7622 jbe 00007ffc`8ed32b37
00007ffc`8ed32b15 6641c741086500 mov word ptr [r9+8],65h
00007ffc`8ed32b1c 41c70005000000 mov dword ptr [r8],5
00007ffc`8ed32b23 b801000000 mov eax,1
```

May I suggest either adding a comment to ensure that this is not removed, or additionally inverting the order of the assignments, i.e. assigning 'e', 's', 'l', 'a', 'F' instead of 'F', 'a', 'l', 's', 'e', which results in the same asm? (Relying on the JIT to remove the bounds checks after accessing the last index sounds a bit more stable than relying on the (uint) cast.)
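Why can one unsigned compare replace all five checks? The JIT's bounds check for index k is itself an unsigned compare: the `jbe` after each `cmp edx,k` in the second listing. A guard written in the same unsigned form, `(uint)length > 4`, gives k <= 4 < length for k = 0..4 by transitivity. The C model below is a hypothetical sketch of that reasoning. The helper names are made up, and the model says nothing about which patterns a particular JIT version actually recognizes.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the JIT's bounds check for element k: one unsigned compare
 * (cf. cmp edx,k / jbe in the second listing). */
static bool bounds_check_passes(int k, int length)
{
    return (unsigned)k < (unsigned)length;
}

/* True if the guard (unsigned)length > 4 makes the checks for indices 0..4
 * redundant: either the guard fails (nothing is stored) or every check passes. */
static bool uint_guard_covers_indices_0_to_4(int length)
{
    if (!((unsigned)length > 4))
        return true;
    for (int k = 0; k <= 4; k++) {
        if (!bounds_check_passes(k, length))
            return false;
    }
    return true;
}

/* Checks the claim for every length in [lo, hi]; long long avoids overflow
 * when hi is INT_MAX. */
static void verify_guard_for_range(int lo, int hi)
{
    for (long long len = lo; len <= hi; len++)
        assert(uint_guard_covers_indices_0_to_4((int)len));
}
```

One consequence of the model: a negative length would also pass the unsigned guard, because it converts to a huge unsigned value. That is harmless for a span, whose Length is never negative. The model agrees, since every per-index check would then pass as well.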
Yes. See https://github.com/dotnet/coreclr/issues/18688.
That would still incur the bounds check on the first assignment.
The same cast is used elsewhere in coreclr as well, e.g. in string.IsNullOrEmpty (coreclr/src/System.Private.CoreLib/shared/System/String.cs, lines 439 to 448 in 13e7606).
Thank you for your explanation.
(Regardless, I added a comment. Thanks.)
* Improve bool.TryFormat perf

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one. I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.

* Update Boolean.cs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
* Improve bool.TryFormat perf

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one. I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.

* Update Boolean.cs

Commit migrated from dotnet/coreclr@c209f0b
Improves throughput by ~50% by avoiding AsSpan().CopyTo and instead writing out the characters one by one. I experimented with the same tricks Utf8Formatter uses, storing the data in a ulong and blitting it out to the destination span of chars reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.
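The rejected alternative (pack the characters into a ulong and blit it to the destination) might look roughly like this in C. This is an assumption about the general technique, not the code that was actually tried. `pack4` and `try_format_bool_blit` are hypothetical names. Note that the packing goes through memcpy: a hard-coded 64-bit literal would bake in one byte order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Packs four UTF-16 code units into one uint64_t. memcpy keeps each 16-bit
 * lane in native byte order, so this works on little- and big-endian hosts. */
static uint64_t pack4(uint16_t a, uint16_t b, uint16_t c, uint16_t d)
{
    const uint16_t units[4] = { a, b, c, d };
    uint64_t packed;
    memcpy(&packed, units, sizeof packed);
    return packed;
}

/* Hypothetical blitting variant: the first four characters go out as one
 * 8-byte copy; "False" needs one extra store for the trailing 'e'. */
static bool try_format_bool_blit(bool value, uint16_t *dest, size_t dest_len,
                                 size_t *chars_written)
{
    assert(chars_written != NULL);

    size_t needed = value ? 4 : 5;
    if (dest_len < needed) {
        *chars_written = 0;
        return false;
    }

    uint64_t first4 = value ? pack4('T', 'r', 'u', 'e')
                            : pack4('F', 'a', 'l', 's');
    memcpy(dest, &first4, sizeof first4);
    if (!value)
        dest[4] = 'e';

    *chars_written = needed;
    return true;
}
```

Optimizing C compilers usually lower fixed-size memcpy calls to plain loads and stores, so the copy of `first4` typically becomes a single 8-byte write. The extra helper and the byte-order caveat are the kind of added complexity that the description weighs against an improvement that was not measurable.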
I also tried porting the TryParse changes from Utf8Parser, but depending on the benchmark input, the new version(s) weren't necessarily faster. For example, they helped a bit when the input was all caps but actually hurt when it was all lowercase. The code was also significantly more complex, especially once endianness was factored in. In the end, I decided to leave it as is.
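Here is a hedged C sketch that makes the endianness remark concrete. It shows a packed, case-insensitive comparison in the spirit of the Utf8Parser tricks, but it is not Utf8Parser's code. Four UTF-16 code units are loaded into a uint64_t, bit 0x20 is set in each 16-bit lane to fold ASCII upper case to lower case, and the result is compared with a packed lowercase constant. `parse_bool_packed` is hypothetical, and unlike bool.TryParse it does not trim surrounding whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Loads four UTF-16 code units into a uint64_t in native lane order. */
static uint64_t load4(const uint16_t *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Sets bit 0x20 in each 16-bit lane, mapping 'A'..'Z' onto 'a'..'z'. */
#define LOWER_MASK UINT64_C(0x0020002000200020)

/* Returns 1 for "true", 0 for "false" (any ASCII letter case), -1 otherwise. */
static int parse_bool_packed(const uint16_t *s, size_t len)
{
    static const uint16_t true_lc[4] = { 't', 'r', 'u', 'e' };
    static const uint16_t fals_lc[4] = { 'f', 'a', 'l', 's' };

    assert(s != NULL || len == 0);

    /* The length test short-circuits, so s is only read when long enough. */
    if (len == 4 && (load4(s) | LOWER_MASK) == load4(true_lc))
        return 1;
    if (len == 5 && (load4(s) | LOWER_MASK) == load4(fals_lc) &&
        (s[4] | 0x20) == 'e')
        return 0;
    return -1;
}
```

The mask only sets bit 0x20, and every target character is a lowercase letter, so a lane matches only that exact letter or its upper-case form. Building the constants with the same memcpy as the input keeps the lanes aligned on either byte order. A hard-coded 64-bit literal would be correct on only one of them.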
Contributes to https://github.com/dotnet/corefx/issues/30612
cc: @jkotas, @ahsonkhan