Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Improve bool.TryFormat perf #18711

Merged
merged 2 commits into from
Jun 29, 2018
Merged

Improve bool.TryFormat perf #18711

merged 2 commits into from
Jun 29, 2018

Conversation

stephentoub
Copy link
Member

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one. I experimented with playing the same tricks as Utf8Formatter around storing the data in a ulong and blitting it out to the destination span of chars reinterpreted, but it didn't make a measurable improvement and increased the code complexity.

I also tried porting the TryParse changes from Utf8Parser, but depending on the input to a benchmark, the new version(s) wasn't necessarily faster. For example, it helped a bit when the input was all caps, but it actually hurt when the input was all lowercase. The code was also significantly more complex, especially once endianness was factored in. In the end, I decided to leave it as is.

Contributes to https://github.com/dotnet/corefx/issues/30612
cc: @jkotas, @ahsonkhan

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one.  I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.
@Tornhoof
Copy link

I looked at the disassembly, do I see that correctly, that the (uint) in
https://github.com/stephentoub/coreclr/blob/0f8fa5b7350d65a5052700a8e5c9dc5f9957da41/src/System.Private.CoreLib/shared/System/Boolean.cs#L102
makes sure that the array length checks are removed?
So it looks like:

00007ffc`8ed32ac7 83fa04          cmp     edx,4
00007ffc`8ed32aca 762f            jbe     00007ffc`8ed32afb
00007ffc`8ed32acc 6641c7014600    mov     word ptr [r9],46h
00007ffc`8ed32ad2 6641c741026100  mov     word ptr [r9+2],61h
00007ffc`8ed32ad9 6641c741046c00  mov     word ptr [r9+4],6Ch
00007ffc`8ed32ae0 6641c741067300  mov     word ptr [r9+6],73h
00007ffc`8ed32ae7 6641c741086500  mov     word ptr [r9+8],65h
00007ffc`8ed32aee 41c70005000000  mov     dword ptr [r8],5
00007ffc`8ed32af5 b801000000      mov     eax,1
00007ffc`8ed32afa c3              ret

instead of

00007ffc`8ed32adc 83fa04          cmp     edx,4
00007ffc`8ed32adf 7e4c            jle     00007ffc`8ed32b2d
00007ffc`8ed32ae1 83fa00          cmp     edx,0
00007ffc`8ed32ae4 7651            jbe     00007ffc`8ed32b37
00007ffc`8ed32ae6 6641c7014600    mov     word ptr [r9],46h
00007ffc`8ed32aec 83fa01          cmp     edx,1
00007ffc`8ed32aef 7646            jbe     00007ffc`8ed32b37
00007ffc`8ed32af1 6641c741026100  mov     word ptr [r9+2],61h
00007ffc`8ed32af8 83fa02          cmp     edx,2
00007ffc`8ed32afb 763a            jbe     00007ffc`8ed32b37
00007ffc`8ed32afd 6641c741046c00  mov     word ptr [r9+4],6Ch
00007ffc`8ed32b04 83fa03          cmp     edx,3
00007ffc`8ed32b07 762e            jbe     00007ffc`8ed32b37
00007ffc`8ed32b09 6641c741067300  mov     word ptr [r9+6],73h
00007ffc`8ed32b10 83fa04          cmp     edx,4
00007ffc`8ed32b13 7622            jbe     00007ffc`8ed32b37
00007ffc`8ed32b15 6641c741086500  mov     word ptr [r9+8],65h
00007ffc`8ed32b1c 41c70005000000  mov     dword ptr [r8],5
00007ffc`8ed32b23 b801000000      mov     eax,1

May I suggest to add either a comment to ensure that this is not removed or additionally to invert the array order, i.e. assign 'e','s','l', a','F' instead of 'F','a','l','s','e', as results in the same asm? (relying on the jitter to remove the array checks after accessing the last index sounds a bit more stable than relying on (uint) cast)

@stephentoub
Copy link
Member Author

stephentoub commented Jun 29, 2018

do I see that correctly

Yes. See https://github.com/dotnet/coreclr/issues/18688.

invert the array order, i.e. assign 'e','s','l', a','F'

That would still incur the bounds check on the first assignment.

add either a comment to ensure that this is not removed

The same cast happens in other places in coreclr as well, e.g. in string.IsNullOrEmpty:

public static bool IsNullOrEmpty(string value)
{
// Using 0u >= (uint)value.Length rather than
// value.Length == 0 as it will elide the bounds check to
// the first char: value[0] if that is performed following the test
// for the same test cost.
// Ternary operator returning true/false prevents redundant asm generation:
// https://github.com/dotnet/coreclr/issues/914
return (value == null || 0u >= (uint)value.Length) ? true : false;
}

@stephentoub stephentoub merged commit c209f0b into dotnet:master Jun 29, 2018
@stephentoub stephentoub deleted the boolperf branch June 29, 2018 17:31
@Tornhoof
Copy link

Thank you for your explanation.

@stephentoub
Copy link
Member Author

(Regardless, I added a comment. Thanks.)

dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corefx that referenced this pull request Jun 29, 2018
* Improve bool.TryFormat perf

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one.  I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.

* Update Boolean.cs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
dotnet-maestro-bot pushed a commit to dotnet-maestro-bot/corert that referenced this pull request Jun 29, 2018
* Improve bool.TryFormat perf

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one.  I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.

* Update Boolean.cs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas pushed a commit to dotnet/corefx that referenced this pull request Jun 30, 2018
* Improve bool.TryFormat perf

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one.  I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.

* Update Boolean.cs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Anipik pushed a commit to dotnet/corert that referenced this pull request Jun 30, 2018
* Improve bool.TryFormat perf

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one.  I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.

* Update Boolean.cs

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* Improve bool.TryFormat perf

Improves throughput by ~50%, by avoiding AsSpan().CopyTo and just writing out the characters one-by-one.  I experimented with playing the same tricks around storing the data in a ulong and blitting it out to the destination reinterpreted as a span of ulongs, but it didn't make a measurable improvement and increased the code complexity.

* Update Boolean.cs


Commit migrated from dotnet/coreclr@c209f0b
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants