JIT ARM64: Don't emit mov for zero case in jump tables for shift intrinsics#107322
Conversation
mov for zero case in jump tables for shift intrinsicsmov for zero case in jump tables for left-shift intrinsics
|
I initially neglected that we can do something like cc @dotnet/jit-contrib, @tannergooding PTAL. Thanks! |
|
ping @tannergooding @dotnet/jit-contrib |
| if ((shiftAmount == 0) && (intrin.category == HW_Category_ShiftRightByImmediate)) | ||
| { | ||
| // TODO: Use emitIns_Mov instead. | ||
| // We do not use it currently because it will still elide the 'mov' | ||
| // even if 'canSkip' is false. We cannot elide the 'mov' here. | ||
| GetEmitter()->emitIns_R_R_R(INS_mov, emitTypeSize(node), targetReg, reg, reg); | ||
| // For operations like `Vector128<byte> >> 8`, over-shifting behavior will mask | ||
| // the immediate so that it's zero, and the shift will be imported as a GT_HWINTRINSIC. | ||
| // If this isn't const-folded away, it is possible to try to emit a right-shift by zero, | ||
| // but only if the immediate is known (we won't emit a zero case for the jump table). | ||
| // Since we cannot encode a right-shift by 0 on ARM64, emit a mov instead. | ||
| assert(op->isContainedIntOrIImmed()); | ||
| GetEmitter()->emitIns_Mov(INS_mov, emitTypeSize(node), targetReg, reg, /* canSkip */ true); | ||
| } |
There was a problem hiding this comment.
Why wouldn't we emit a case for the jump table?
If the value is unknown and it doesn't throw, then won't we hit the case in the jump table and need handling emitted for it?
There was a problem hiding this comment.
You're correct that for unknown valid immediates, we will emit the jump table. In the table generation logic, we get the table's bounds from HWIntrinsicInfo::lookupImmBounds, which sets the lower bound at 1 for right-shift intrinsics here. So if shiftAmount == 0 for a right-shift intrinsic, we should not be generating a jump table.
There was a problem hiding this comment.
So what are we doing if the value is unknown and 0? Because we're not throwing for the constant case here.
There was a problem hiding this comment.
Good question. I tried the following program:
public class Test
{
public byte m;
}
public class Program
{
static Test t = new Test();
public static void Main()
{
var v = Vector128<Byte>.Zero;
var result = v >> t.m;
Console.WriteLine(result);
}
}The JIT imports the shift as ShiftLogical, hence why we don't throw. I can update the comment to cover this case, if you'd like.
There was a problem hiding this comment.
Hmmm. I wonder if this is the "right" fix.
We basically have:
- Import the intrinsic
- If the shift amount is non-constant, import as
ShiftLogical - If the shift amount is constant, then mask, which could become zero
So I think we're maybe missing a check that ensures that if its zero, its still treated as "out of range" and gets imported as ShiftLogical instead
There was a problem hiding this comment.
I think your proposed fix is better. I initially wanted the fallback implementations we've been adding to kick in for cases like this, but the logic for checking immediates and adding range checks/etc. in Compiler::impHWIntrinsic runs before we check if the intrinsic is table-driven or needs special handling. I guess in GenTreeHWIntrinsic::GetHWIntrinsicIdForBinOp, instead of just checking if the immediate is const, we need to also check if it's out-of-range and use ShiftLogical accordingly -- I'll give that a try.
mov for zero case in jump tables for left-shift intrinsicsmov for zero case in jump tables for shift intrinsics
|
Diffs are from the I'm guessing these were previously getting folded away. |
src/coreclr/jit/gentree.cpp
Outdated
| // and return the correct fallback intrinsic. | ||
| if ((op != GT_LSH) && (op2->AsIntCon()->IconValue() == 0)) | ||
| { | ||
| op2 = gtNewSimdCreateBroadcastNode(type, op2, simdBaseJitType, simdSize); |
There was a problem hiding this comment.
nit, we prefer the direct zero constant helper in cases like this:
| op2 = gtNewSimdCreateBroadcastNode(type, op2, simdBaseJitType, simdSize); | |
| op2 = gtNewZeroConNode(type); |
tannergooding
left a comment
There was a problem hiding this comment.
I'm guessing these were previously getting folded away
We can get this handled in .NET 10 and ensure that ShiftLogical(x, zero) is elided. It shouldn't be important to preserve for .NET 9
|
/backport to release/9.0 |
|
Started backporting to release/9.0: https://github.com/dotnet/runtime/actions/runs/10745195972 |
Fixes #107173. ARM64 right-shift intrinsics don't support encoding shift amounts of zero -- I believe this was the initial motivation for emitting a
movin lieu of the intrinsic when the shift amount is zero. However,HWIntrinsicImmOpHelpershould never try to emit a shift amount of zero for right-shift intrinsics, so we shouldn't need themovcase. Furthermore, emitting amovinstead of the intrinsic drops any side effects other than the shift, as demonstrated by the usage ofShiftLeftLogicalWideningLowerin the test without this fix.By removing the
movcase, left-shift intrinsics should work correctly with all immediates now, and for cases where we try to right-shift by zero, we either switch over to a fallback implementation during importation if one is available, or we throw anArgumentOutOfRangeExceptionat runtime.