-
Couldn't load subscription status.
- Fork 5.2k
Optimize System.HexConverter.IsHexChar on 64 bits #52470
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Add a branchless fast path on 64 bit systems that doesn't do memory accesses either
|
I couldn't figure out the best area label to add to this PR. If you have write-permissions please help me learn by adding exactly one area label. |
| public static bool IsHexChar(int c) | ||
| public static unsafe bool IsHexChar(int c) | ||
| { | ||
| if (sizeof(IntPtr) == 8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since corelib is always arch specific I guess it's better (e.g. for JIT) to use #if TARGET_64BIT here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is in the Common folder and is pulled in by a bunch of different libs, some of which only build AnyCPU. If we wanted an ifdef for the target platform, we'd also need to ifdef CORELIB. (See ValueStringBuilder.cs for some examples.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GrabYourPitchforks ah, good point!
| shift = unchecked((int)((35465847073801215UL >> i) & 1)), | ||
| and = shift & mask; | ||
| byte result = unchecked((byte)and); | ||
| bool valid = *(bool*)&result; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We considered similar tricks to this a while back and those tricks were rejected. See dotnet/coreclr#16138 for one example.
The tl;dr was "somebody's probably branching on the result of this method, so it's better if the final statement is an actual comparison that can be folded into the caller's branch condition."
The implication would be that these lines become return (and & 1) != 0;, or return (and & 1) != 0 ? true : false; if we're trying to work around #4207.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, I was thinking the last return expression could be changed to that due to callers branching 🙂
Question though: given the input is guaranteed to be either 1 or 0 here due to the previous logic, we can just test and != 0 here, right? As in, we can skip that & 1 since that wouldn't affect the actual semantics in this case, no?
So at the end of the day:
return and != 0 ? true : false;Should work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that should work!
75adce8 to
db8484c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: should read ['0', '0' + 64).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused, aren't we checking that c is also >= 0 here?
From your original snippet on Discord:
ulong mask = i - 64; // c is negative IFF '0' <= c < '0' + 64; else c is non-negativeAm I mixing things up here? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If c is within [0, '0'), bit 31 will be set, but bit 63 (the bit we care about for masking purposes) will not be set.
Bit 63 only gets set if c is within ['0', '0' + 64).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, yeah. Fixed, thanks! 😄
db8484c to
8690f42
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove unsafe keyword; no longer needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment below, done in 78bc559532402d2d4dd1a2e9a85b9e9efd4240a5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer IntPtr.Size rather than sizeof(IntPtr) here. illink can see this as a proper method call to IntPtr.get_Size and will perform substitution. See, e.g., https://github.com/dotnet/runtime/blob/74cafe71de68fac46cd4956da98387bce9e1043c/src/libraries/System.Private.CoreLib/src/ILLink/ILLink.Substitutions.64bit.xml.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, TIL. Done in 78bc559532402d2d4dd1a2e9a85b9e9efd4240a5 🙂
78bc559 to
cad3a08
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Appreciate the perf numbers in the issue as well.
|
Tagging subscribers to this area: @tannergooding Issue DetailsOverviewThis PR adds a fast path to
The change is a specialized version of what I used in Codegen diffBefore (click to expand):; Method IsHexCharFast.HexConverter2:IsHexChar_OG(int):bool
G_M3768_IG01:
sub rsp, 40
;; bbWeight=1 PerfScore 0.25
G_M3768_IG02:
cmp ecx, 256
jge SHORT G_M3768_IG04
;; bbWeight=1 PerfScore 1.25
G_M3768_IG03:
cmp ecx, 256
jae SHORT G_M3768_IG07
movsxd rax, ecx
mov rdx, 0xD1FFAB1E
movzx rax, byte ptr [rax+rdx]
jmp SHORT G_M3768_IG05
;; bbWeight=0.50 PerfScore 2.88
G_M3768_IG04:
mov eax, 255
;; bbWeight=0.50 PerfScore 0.12
G_M3768_IG05:
cmp eax, 255
setne al
movzx rax, al
;; bbWeight=1 PerfScore 1.50
G_M3768_IG06:
add rsp, 40
ret
;; bbWeight=1 PerfScore 1.25
G_M3768_IG07:
call CORINFO_HELP_RNGCHKFAIL
int3
;; bbWeight=0 PerfScore 0.00
; Total bytes of code: 66After (click to expand):; Method IsHexCharFast.HexConverter2:IsHexChar(int):bool
G_M2063_IG01:
;; bbWeight=1 PerfScore 0.00
G_M2063_IG02:
add ecx, -48
lea rax, [rcx-64]
mov rdx, 0xD1FFAB1E
shl rdx, cl
and rax, rdx
jl SHORT G_M2063_IG05
;; bbWeight=1 PerfScore 4.25
G_M2063_IG03:
xor eax, eax
;; bbWeight=0.50 PerfScore 0.12
G_M2063_IG04:
ret
;; bbWeight=0.50 PerfScore 0.50
G_M2063_IG05:
mov eax, 1
;; bbWeight=0.50 PerfScore 0.12
G_M2063_IG06:
ret
;; bbWeight=0.50 PerfScore 0.50
; Total bytes of code: 34Additionally, the new version can also be JITted to just a constant, if the input is a constant. Before (click to expand):; Method IsHexCharFast.HexConverter2:Check_Constant_OG():bool
G_M36347_IG01:
;; bbWeight=1 PerfScore 0.00
G_M36347_IG02:
mov rax, 0xD1FFAB1E
movzx rax, byte ptr [rax]
cmp eax, 255
setne al
movzx rax, al
;; bbWeight=1 PerfScore 3.75
G_M36347_IG03:
ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code: 25After (click to expand):; Method IsHexCharFast.HexConverter2:Check_Constant_NEW():bool
G_M50831_IG01:
;; bbWeight=1 PerfScore 0.00
G_M50831_IG02:
mov eax, 1
;; bbWeight=1 PerfScore 0.25
G_M50831_IG03:
ret
;; bbWeight=1 PerfScore 1.00
; Total bytes of code: 6BenchmarkI've put together a small test benchmark which you can find here.
The new version is about 2x faster over random data in this test. When the input is always valid and the branch predictor can be more effective in the original implementation, this test shows the new version is still on par, but a couple notes:
JIT diffCurrently work in progress, I'm not having luck with the runtime tooling today... 😄
|
cad3a08 to
db25ea4
Compare
|
@EgorBo @tannergooding any other thoughts on this? Thinking of merging it in Tuesday, which gives a little while longer for comments. |
|
Nice contribution @Sergio0694 thank you. |
Overview
This PR adds a fast path to
System.HexConverter.IsHexChar(int)on 64 bit systems, which has:The change is a specialized version of what I used in
BitHelper.HasLookupFlagin theMicrosoft.Toolkit.HighPerformancepackage, and just uses bit trickery to make the code branchless and read the lookup value from a constant value and not from memory.Codegen diff
Before (click to expand):
After (click to expand):
Additionally, the new version can also be JITted to just a constant, if the input is a constant.
That is, if you call
IsHexCharwith an input constant like'A', you get this JIT diff:Before (click to expand):
After (click to expand):
Benchmark
I've put together a small test benchmark which you can find here.
The new version is about 2x faster over random data in this test. When the input is always valid and the branch predictor can be more effective in the original implementation, this test shows the new version is still on par, but a couple notes:
JIT diff
Currently work in progress, I'm not having luck with the runtime tooling today... 😄
Opened the PR in the meantime to have the CI run on it at least.