-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Reduce branches in padding adjustment in FromBase64_ComputeResultLength #115022
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ReadOnlySpan<byte> map = [0, 2, 1]; | ||
padding = map[padding]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This trades branches vs. memory access -- I don't know which approach will be faster and in benchmarks it will be hard to measure, because branch predictor will do a good job, and for the memory access the data will be in the cpu caches, so fast retrieval.
Or put differently: it trades possible branch mispredicts vs. potential cache eviction.
Maybe it's better to write the code as idiomatic as possible and let the compilers optimize it out. Actually with a switch
it looks quite good and produces code similar to what native compilers produce.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, writing the code in a more idiomatic style is preferable. However, at the moment, the JIT doesn't seem to generate a value-to-result mapping (like C/C++ does) for simple cases like this. Even for a switch, it just falls back to a jump table, which still involves a memory access.
I agree - it's better to either leave it as-is or rewrite it using a switch, and help the JIT recognize and optimize such patterns. I think I even saw an issue about this a while ago, but couldn't find it right away.
Just for reference, here's what C/C++ produces:
mov edi, edi
mov eax, DWORD PTR CSWTCH.1[0+rdi*4]
CSWTCH.1:
.long 0
.long 2
.long 1
As we can see the C/C++ compiler prefers a memory access here, rather than relying on the branch predictor.
And here's what the JIT produces in my case - basically the same thing as the C++ version:
mov rdi, 0x72BEDC82CB80 ; static handle
movzx rax, byte ptr [rax+rdi]
And here's what a typical switch ends up looking like:
lea rdi, [reloc @RWD00]
mov edi, dword ptr [rdi+4*rax]
lea rcx, G_M26941_IG02
add rdi, rcx
jmp rdi
mov eax, 1
jmp SHORT G_M26941_IG07
mov eax, 2
jmp SHORT G_M26941_IG07
xor eax, eax
G_M26941_IG07: ;; offset=0x0035
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ve found the issue related to this: #114041
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think it would be still good to change to the switch
-approach here.
Once the JIT got improved, then the benefit here comes for free.
Tagging subscribers to this area: @dotnet/area-system-runtime |
1 => 2, | ||
2 => 1, | ||
_ => throw new FormatException(SR.Format_BadBase64Char) | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The title of the PR is "reduce branches", but this doesn't actually reduce branches today, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about this: "Improve readability of Base64 padding adjustment logic"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the change seems too minor to have a meaningful impact. It might be more worthwhile to focus on removing "unsafe" in FromBase64_ComputeResultLength
and related methods. Here's the commit for that: rameel@fe9f28b
Unfortunately due to the current JIT's inability to eliminate unnecessary bounds checks in cases like this (see #115091):
while (chars.Length != 0)
{
if (chars[^1] is not (' ' or '\n' or '\r' or '\t'))
break;
chars = chars.Slice(0, chars.Length - 1); // Unnecessary bounds checks
}
and a performance regression compared to NET9 (#115090) I decided to postpone creating PR for this until later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, anyway
No description provided.