-
Notifications
You must be signed in to change notification settings - Fork 5.1k
Vectorize codegen for left shift operator on byte vector #108336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Vectorize codegen for left shift operator on byte vector #108336
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
…vington/runtime into vector128-byte-left-shift
{ | ||
// We don't have actual instructions for shifting bytes, so we'll emulate them | ||
// by shifting 32-bit values and masking off the bits that should be zeroed. | ||
|
||
assert(varTypeIsByte(simdBaseType)); | ||
|
||
intrinsic = | ||
GenTreeHWIntrinsic::GetHWIntrinsicIdForBinOp(this, op, op1, op2ForLookup, TYP_INT, simdSize, false); | ||
GenTreeHWIntrinsic::GetHWIntrinsicIdForBinOp(this, op, op1, op2ForLookup, | ||
op == GT_RSZ ? TYP_INT : TYP_SHORT, simdSize, false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be TYP_SHORT
for left shift?
pslld
is also available for SSE2 and should achieve the same thing, correct? We typically prefer int
where it is possible since instructions operating on 32-bit elements
tend to have better overall throughput and optimization opportunities.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're also right, it doesn't need to be TYP_SHORT
. I hadn't originally considered that it could be TYP_INT
as well.
I updated the type here to be TYP_INT
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one question about whether we actually need to use TYP_SHORT.
Thanks for getting this up and completed!
CC. @dotnet/jit-contrib for secondary review. Simple PR from our friends at AMD |
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
Fuzzlyn failures are unrelated. |
PFX failure is also unrelated, was fixed by f059869 |
Thanks for the contribution @alexcovington |
The left shift operator for
Vector128<byte>/Vector256<byte>/Vector512<byte>
does not vectorize. This PR vectorizes the left shift operator for these types on x64.For example, this C# code will now vectorize:
Codegen
Before
After
I see performance improvements in some of the
System.Numerics.Tensors
microbenchmarks on my x64 machine:Baseline
Diff