-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Update the table-driven framework to support x86 imm-intrinsics #16183
Conversation
I will submit test cases later. |
src/jit/emitxarch.cpp
Outdated
@@ -170,6 +170,14 @@ bool emitter::IsDstDstSrcAVXInstruction(instruction ins) | |||
case INS_psubusb: | |||
case INS_psubusw: | |||
case INS_psubw: | |||
case INS_psllw: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it would help if we would keep alphabetical order of instructions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will fix.
src/jit/emitxarch.cpp
Outdated
@@ -5324,6 +5334,41 @@ void emitter::emitIns_SIMD_R_R_R(instruction ins, emitAttr attr, regNumber reg, | |||
} | |||
} | |||
|
|||
void emitter::emitIns_SIMD_R_R_I(instruction ins, emitAttr attr, regNumber reg, regNumber reg1, int ival) | |||
{ | |||
bool isSSEShift = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this should be in a separate "isSseShift" function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I will refactor here.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
{ | ||
if (op3->IsCnsIntOrI()) | ||
{ | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extra empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
{ | ||
if (op2->IsCnsIntOrI()) | ||
{ | ||
ssize_t ival = op2->AsIntConCommon()->IconValue(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have used IsCnsIntOrI
using AsIntCon
would make more sense. AsIntConCommon
also accepts GT_CNS_LNG
which is not needed here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thank you!
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
assert(op2Reg != REG_NA); | ||
NamedIntrinsic intrinsicID = node->gtHWIntrinsicId; | ||
var_types baseType = node->gtSIMDBaseType; | ||
emitAttr attr = (emitAttr)node->gtSIMDSize; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes a significant difference but in many places the EA_ATTR
macro is used for this.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
|
||
// Emit the jump table | ||
|
||
JITDUMP("\n J_M%03u_DS%02u LABEL DWORD\n", Compiler::s_compMethodsCount, jmpTableBase); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is really useful. It looks like it was copied from genJumpTable
but then it didn't belong there either. This kind of dump messages, if really needed, should be in emitBBTableDataGenBeg
and emitDataGenData
.
Or better, the data section should be included in the disassembly produced by the JIT. But that's a different story.
src/jit/hwintrinsicxarch.cpp
Outdated
// Return Value: | ||
// the max imm-value of non-full-range IMM intrinsic | ||
// | ||
int Compiler::immUpperBoundOfHWIntrinsic(NamedIntrinsic intrinsic, var_types baseType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just don't understand why people insist in adding more stuff to Compiler
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm.. these functions are needed by multiple passes (e.g., importer, codegen, lowering, etc). Is there other better approach to organize these functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reasonable option seems to be for these functions to be static members of HWIntrinsicInfo
. Haven't really checked if this is actually possible or not. Another option might be global functions, similar to varType
functions for example. Yet another option would be a HWIntrinsicTable
class that basically acts as an abstraction for hwIntrinsicInfoArray
.
That said, I'm not sure if it's worth bothering with this ATM.
src/jit/hwintrinsicxarch.cpp
Outdated
case NI_AVX2_ShiftLeftLogical: | ||
case NI_AVX2_ShiftRightArithmetic: | ||
case NI_AVX2_ShiftRightLogical: | ||
return genTypeSize(baseType) * 8 - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something seems fishy here - the proper upper bound of shift is likely 255, not a number depending on the element type size.
src/jit/hwintrinsicxarch.cpp
Outdated
GenTree* Compiler::impOutOfRangeFallback(NamedIntrinsic intrinsic, var_types simdType, var_types baseType) | ||
{ | ||
assert((flagsOfHWIntrinsic(intrinsic) & HW_Flag_OutOfRangeFallbackIMM) != 0); | ||
assert(simdType == TYP_SIMD32 || simdType == TYP_SIMD16); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the formatter doesn't flag these but the usual practice is to parenthesize this kind of expressions: assert((simdType == TYP_SIMD32) || (simdType == TYP_SIMD16));
. Happens in multiple places.
src/jit/hwintrinsicxarch.cpp
Outdated
// - AND, the imm-intrinsic does not have out-of-range fallback | ||
|
||
if (!mustExpand && lastOp->IsCnsIntOrI() && | ||
lastOp->AsIntConCommon()->IconValue() > immUpperBoundOfHWIntrinsic(intrinsic, baseType)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AsIntCon
src/jit/codegenlinear.h
Outdated
@@ -134,6 +134,8 @@ void genFMAIntrinsic(GenTreeHWIntrinsic* node); | |||
void genLZCNTIntrinsic(GenTreeHWIntrinsic* node); | |||
void genPCLMULQDQIntrinsic(GenTreeHWIntrinsic* node); | |||
void genPOPCNTIntrinsic(GenTreeHWIntrinsic* node); | |||
void genJumpTableFallback( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @sdmaclea had a good approach to this in one of his PRs.
Specifically, the codegen is essentially the same except for the actual emitted instruction and could be handled via a lambda.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/jit/hwintrinsicxarch.cpp
Outdated
case NI_SSE2_ShiftRightLogical128BitLane: | ||
impPopStack(); | ||
impPopStack(); | ||
return gtNewSimdHWIntrinsicNode(simdType, NI_SSE2_SetZeroVector128, baseType, genTypeSize(simdType)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be combined with AVX2 with if / else clause for choosing NI_SSE2_SetZeroVector128
or NI_AVX_SetZeroVector256
intrinsic parameter
1ae1ee1
to
b847cb0
Compare
src/jit/hwintrinsiclistxarch.h
Outdated
@@ -144,6 +144,7 @@ HARDWARE_INTRINSIC(SSE2_CompareNotLessThan, "CompareNot | |||
HARDWARE_INTRINSIC(SSE2_CompareNotLessThanOrEqual, "CompareNotLessThanOrEqual", SSE2, 6, 16, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cmppd}, HW_Category_SimpleSIMD, HW_Flag_NoFlag) | |||
HARDWARE_INTRINSIC(SSE2_CompareOrdered, "CompareOrdered", SSE2, 7, 16, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cmppd}, HW_Category_SimpleSIMD, HW_Flag_NoFlag) | |||
HARDWARE_INTRINSIC(SSE2_CompareUnordered, "CompareUnordered", SSE2, 3, 16, 2, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_cmppd}, HW_Category_SimpleSIMD, HW_Flag_NoFlag) | |||
HARDWARE_INTRINSIC(SSE2_ConvertScalarToVector128Int32, "ConvertScalarToVector128Int32", SSE2, -1, 16, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_mov_i2xmm, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_NoFlag) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is implemented in #16237 as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@4creators No worries, I just use this intrinsic as an internal helper. Once you implement it, I will remove here.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
{ | ||
emit->emitIns_R_I(INS_cmp, EA_4BYTE, nonConstImmReg, maxImmValue); | ||
emitJumpKind jge = genJumpKindForOper(GT_GE, CK_UNSIGNED); | ||
genJumpToThrowHlpBlk(jge, SCK_ARG_RNG_EXCPN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The runtime intermittently gets "failed to find exception throw block". @CarolEidt @tannergooding @mikedn @jkotas Do you know why?
19:06:15 CLRJIT! CHECK::Trigger + 0x30E (0x7354b789)
19:06:15 CLRJIT! CodeGen::genJumpToThrowHlpBlk + 0x13C (0x73435bd7)
19:06:15 CLRJIT! CodeGen::genJumpTableFallback<<lambda_8c168ac1d3dcdd43425f08b8cdfa80f0> > + 0xF7 (0x7353e539)
19:06:15 CLRJIT! CodeGen::genHWIntrinsic + 0x2AE (0x7353ec79)
19:06:15 CLRJIT! CodeGen::genCodeForTreeNode + 0x43D (0x73516b55)
19:06:15 CLRJIT! CodeGen::genCodeForBBlist + 0x80E (0x7343936e)
19:06:15 CLRJIT! CodeGen::genGenerateCode + 0x283 (0x73434f23)
19:06:15 CLRJIT! Compiler::compCompile + 0x72D (0x73440571)
19:06:15 CLRJIT! Compiler::compCompileHelper + 0x589 (0x73441767)
19:06:15 CLRJIT! Compiler::compCompile + 0x622 (0x73440c53)
19:06:15 File: d:\j\workspace\x86_checked_w---b1af1919\src\jit\codegencommon.cpp Line: 2826
19:06:15 Image: D:\j\workspace\x86_checked_w---b1af1919\bin\tests\Windows_NT.x86.Checked\Tests\Core_Root\CoreRun.exe
{
/* Find the helper-block which raises the exception. */
Compiler::AddCodeDsc* add =
compiler->fgFindExcptnTarget(codeKind, compiler->bbThrowIndex(compiler->compCurBB));
PREFIX_ASSUME_MSG((add != nullptr), ("ERROR: failed to find exception throw block"));
tgtBlk = add->acdDstBlk;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Range checks must be exposed in the importer, so that they can be appropriately modeled in the control flow. Check out the places in simd.cpp where it generates GT_SIMD_CHK
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt Thanks for pointing out. IMO, I still do not like the current solution to non-const (that hardcodes a jump-table fallback in the compiler), which
- needs more IR to represent the range-check because the out-of-range constant argument will become a variable in the recursive function-body.
- the codegen always needs to consider keeping consistent semantics with the front-end.
So I think the software switch-table fallback can significantly simplify the compiler implementation that only deals with constant imm-argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we determined that, we could also do this as:
result Intrinsic(op1, op2)
{
RangeCheck();
return Intrinsic(op1, op2);
}
Which only leaves the switch table up to the JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do it all in the JIT however, it still seems like a one time cost in implementing a helper function to do this, and it only needs to happen in the fallback case, which is already on its own codepath.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the software switch-table is not a problem now. In the future, if the compiler determines to optimize away the switch, we can use an attribute to disable the optimization on these intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, if the compiler determines to optimize away the switch, we can use an attribute to disable the optimization on these intrinsics.
There are multiple different compilers, and some (such as the C# compilers) don't have an option to disable optimizations for a given method. This would only work for an AOT or JIT compiler that was performing the optimization, and only if it respected the flag.
I think it would be good to better understand the specific pushback here. We already discussed this at some length in at least one of the other threads and landed on one of these two approaches:
- do the range check in managed and emit the switch table in the JIT
- emit the range check and the switch table in the JIT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I'm concerned, it is a non-starter to consider introducing a throw in codegen that has not been introduced into the IR earlier. In addition to the technical infeasibility, it would render it impossible to apply range-check optimizations to eliminate it.
I am mildly in favor of emitting both the range check and the switch table in the JIT, primarily because it doesn't rely on the IL implementation and the JIT to both (and separately) specify the appropriate range values.
b157c86
to
f84095e
Compare
Rebased and resolved a conflict from stack-level-setter. |
src/jit/codegenxarch.cpp
Outdated
@@ -3756,7 +3759,10 @@ void CodeGen::genCodeForCmpXchg(GenTreeCmpXchg* tree) | |||
// generate code for BoundsCheck nodes | |||
void CodeGen::genRangeCheck(GenTree* oper) | |||
{ | |||
#ifdef FEATURE_SIMD | |||
#if defined(FEATURE_HW_INTRINSICS) && defined(FEATURE_SIMD) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the standalone GT_HWIntrinsic_CHK
case also be handled (at the very least, based on the ifdefs
elsewhere)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So far, all the imm-intrinsics are SIMD. Do you think it is necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that all #ifdef
s concerning the HWIntrinsic Check should be consistent. So they either all need to be under #if defined(FEATURE_HW_INTRINSICS) && defined(FEATURE_SIMD)
or they should all be under #ifdef FEATURE_HW_INTRINSICS
. Having a mix of both (when there isn't two actual distinctions of where the code applies) just makes it confusing (and prone to breakage) later down the road.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, I will add a standalone #ifdef GT_HWIntrinsic_CHK
.
src/jit/emitxarch.cpp
Outdated
@@ -5466,6 +5479,43 @@ void emitter::emitIns_SIMD_R_R_R(instruction ins, emitAttr attr, regNumber reg, | |||
} | |||
} | |||
|
|||
static bool isSSEShift(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Sse
the appropriate term here, since it also looks to cover AVX instructions, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These special encoding shift instructions were introduced by SSE2, and AVX just gives them the VEX-encoding version. There are some AVX/AVX2-only shift instructions that don't belong to this group.
|
||
void emitter::emitIns_SIMD_R_R_I(instruction ins, emitAttr attr, regNumber reg, regNumber reg1, int ival) | ||
{ | ||
// TODO-XARCH refactoring emitIns_R_R_I to handle SSE2/AVX2 shift as well as emitIns_R_I |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this isn't working today?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SSE2 shift instructions require special encoding, we need to improve emitIns_R_R_I
to handle https://github.com/dotnet/coreclr/blob/master/src/jit/emitxarch.cpp#L3688-L3700 as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we log an actual bug, as well as this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt Could you please take a look when you get a chance? This PR is blocking certain subsequent work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am going to approve this PR so that it can be merged and unlbock other work (though I will defer to @tannergooding to do the merge. I hope that you will consider doing a quick follow-up PR to address my comment requests, and respond with clarification on whether there is a known case where you would have only one of HW_Flag_NoJmpTableIMM
and HW_Flag_MaybeIMM
set.
I filed #16568 to consider merging of the range check operators.
|
||
// NoJmpTable IMM | ||
// the imm intrinsic does not need jumptable fallback when it gets non-const argument | ||
HW_Flag_NoJmpTableIMM = 0x2000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why both of these are required. In what case would we require a jump table if there was a non-immediate form available, and conversely, in what case would there be a non-immediate form available but we still require a jump table? Are there cases where the values covered by an immediate are somehow broader or otherwise disjoint from the values covered by a non-immediate form? I note that the existing intrinsics you added always have neither or both of these flags set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still don't understand why both of these are required.
@CarolEidt The new-style intrinsic is based on function names rather than signatures, and some intrinsics have "imm" and simple-SIMD overloads both. For example
public static Vector256<sbyte> Shuffle(Vector256<sbyte> value, Vector256<sbyte> mask);
public static Vector256<int> Shuffle(Vector256<int> value, byte control) ;
so a category HW_Category_IMM
is not enough to indicate a intrinsic requiring imm8
or not (the above intrinsic should have flag HW_Flag_MaybeIMM
). Consequently, we have to use the function bool Compiler::isImmHWIntrinsic(NamedIntrinsic intrinsic, GenTree* lastOp)
.
In this kind of imm-intrinsic (HW_Flag_MaybeIMM
), certain intrinsics are special, which have encoding that can carry the imm8
operand in XMM registers, for example
public static Vector256<ulong> ShiftRightLogical(Vector256<ulong> value, byte count);
public static Vector256<ulong> ShiftRightLogical(Vector256<ulong> value, Vector128<ulong> count);
^^^^^^
So, this intrinsic can have flag HW_Flag_NoJmpTableIMM
to use XMM version as the non-constant fallback (rather than a jump-table).
I note that the existing intrinsics you added always have neither or both of these flags set.
Consequently, Avx2.Shuffle
will need HW_Flag_MaybeIMM
but no HW_Flag_NoJmpTableIMM
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, this intrinsic can have flag HW_Flag_NoJmpTableIMM to use XMM version as the non-constant fallback (rather than a jump-table).
I believe that the ISA design (carry the imm8 operand in XMM registers) is intentional for this situation, and ARM also has similar instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks for the explanation.
return false; | ||
} | ||
|
||
if ((flagsOfHWIntrinsic(intrinsic) & HW_Flag_MaybeIMM) != 0 && genActualType(lastOp->TypeGet()) != TYP_INT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not obvious why this is correct - it needs comments. This relies on the observation that all the HW_Flag_MaybeIMM
cases have either an immmediate integer argument, or a non-integer argument, but not a non-constant integer argument.
Perhaps your "Return Value" comment should say:
Return true iff the intrinsics is an imm-intrinsic overload.
Note that some intrinsics, with HW_Flag_MaybeIMM set, have both imm (integer immediate) and vector (i.e. non-TYP_INT) overloads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! Will do.
src/jit/hwintrinsicxarch.cpp
Outdated
return nullptr; | ||
} | ||
|
||
// TODO-XARCH-CQ repleace imm-intrinsic by certain lower-latency intrinsic based-on imm-values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would urge you to just remove this comment.
@@ -97,6 +97,10 @@ GTNODE(SIMD_CHK , GenTreeBoundsChk ,0,GTK_SPECIAL|GTK_NOVALUE)// Compa | |||
// does the compare, so that it can be more easily optimized. But that involves generating qmarks at import time... | |||
#endif // FEATURE_SIMD | |||
|
|||
#ifdef FEATURE_HW_INTRINSICS | |||
GTNODE(HW_INTRINSIC_CHK , GenTreeBoundsChk ,0,GTK_SPECIAL|GTK_NOVALUE)// Compare whether an imm8 argument is in the valid range, and throw ArgumentOutOfRangeException if not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I concede your point. These should probably be merged, though I am not 100% sure that is without risk, and I really don't want to hold this up for something that might lead to complications.
@fiigii, we should run the HWIntrinsic tests before merging. NOTE: These jobs will currently fail due to an unrelated issue: https://github.com/dotnet/coreclr/issues/16566. I've held off on kicking these jobs myself until after the PR resolving that is merged: #16567 |
@fiigii, I would also like to see bugs logged for the 2-3 pending issues before merging (they all correspond to |
src/jit/emitxarch.cpp
Outdated
emitIns_R_R(INS_movaps, attr, reg, reg1); | ||
} | ||
|
||
if ((ins == INS_psrldq || ins == INS_pslldq) && ival > 15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why just psrldq
and pslldq
instead of isSseShift
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, will fix.
|
||
const unsigned maxByte = (unsigned)Compiler::immUpperBoundOfHWIntrinsic(intrinsic) + 1; | ||
assert(maxByte <= 256); | ||
BasicBlock* jmpTable[256]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: BasicBlock* jmpTable[maxByte]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maxByte
is not a compile-time constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, missed that 😄
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
for (unsigned i = 0; i < maxByte; i++) | ||
{ | ||
jmpTable[i] = genCreateTempLabel(); | ||
JITDUMP(" DD L_M%03u_BB%02u\n", Compiler::s_compMethodsCount, jmpTable[i]->bbNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: You removed JITDUMP("\n J_M%03u_DS%02u LABEL DWORD\n", Compiler::s_compMethodsCount, jmpTableBase);
from above, but not this corresponding line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedn suggested removing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I suggested removing both :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I was indicating that if one line is removed, the other should be as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikedn @tannergooding Got it! Thanks.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
for (unsigned i = 0; i < maxByte; i++) | ||
{ | ||
genDefineTempLabel(jmpTable[i]); | ||
emitSwCase((int)i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not keep this as a byte
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The underlying functions all accept int
, so it may not be necessary to introduce one more type to mess here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But the loop variable is still unsigned :)
src/jit/codegenxarch.cpp
Outdated
noway_assert(oper->OperGet() == GT_ARR_BOUNDS_CHECK); | ||
#endif // !FEATURE_SIMD | ||
|
||
assert(oper->OperIsBoundsCheck()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be kept as a noway_assert
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, but will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, noway_assert
is pointless here. All other code gen functions use assert
.
src/jit/hwintrinsicxarch.cpp
Outdated
if (!lastOp->IsCnsIntOrI() && !mustExpand) | ||
// The imm-HWintrinsics that do not accept all imm8 values may throw exception or transform to other | ||
// intrinsics when the imm argument is not in the valid range | ||
if ((flags & HW_Flag_FullRangeIMM) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you log a bug tracking us adding a helper for checking if a flag is set or unset?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, and this comment should be updated, thanks.
And add a new range-check IR for x86 imm-intrinsics.
for (unsigned i = 0; i < maxByte; i++) | ||
{ | ||
genDefineTempLabel(jmpTable[i]); | ||
emitSwCase(i); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding @mikedn Changed emitSwCase
to accept unsigned
, so that this function only uses unsigned
variables.
test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic test OSX10.12 x64 Checked jitincompletehwintrinsic |
@CarolEidt @tannergooding @mikedn Thank you so much for your reviwe. All the feedback should already get fixed. Logged the remaining works at https://github.com/dotnet/coreclr/issues/16594 https://github.com/dotnet/coreclr/issues/16575 https://github.com/dotnet/coreclr/issues/16543 https://github.com/dotnet/coreclr/issues/16568 |
@fiigii thanks for responding to the feedback so far. I'll get this merged after all the tests complete (and pass). |
Thanks for the changes & responses. I'm happy with where we are on this. |
@4creators This PR is going to be merged. Would you like to complete the SSE2 intrinsic implementation? As I know, the code freeze date of .NET Core 2.1 may be coming soon, we probably prefer to enable all the SSE/SSE2/SSE3/SSSE3/SSE4.1/AVX intrinsics in this version of runtime. |
@fiigii Thanks for information. I will try to implement remaining SSE2 intrinsics asap. What is a plan on AVX2? Do we intend to finish them for 2.1 release? |
I will try my best to implement as many as AVX2 intrinsics (depends on the deadline). |
@4creators, if you could send me an e-mail (tagoo at microsoft dot com) it might be a bit easier to sync up on this without polluting the thread. |
@tannergooding Just did it |
OSX tests get build timed out. Other tests all passed. |
This PR updates the table-driven framework to support x86 imm-intrinsics. After this update, implementing imm-intrinsics just needs one-line definition in
hwintrinsiclistxarch.h
.This PR implements certain imm-intrinsics which have special semanctics
Avx.Compare\CompareScalar
just accepts imm-value in 0-31. Out of range argument will triggerArgumentOutOfRangeException
.Avx2\Sse2.ShiftLeftLogical\ShiftRightLogical\ShiftRightArithmetic
never throw any exceptionshift* ymm, ymm, xmm
(orshift* xmm, xmm
)form when the imm-arg is not a constant. For example,Avx2.ShiftLeftLogical(v, i)
would generatevmovd xmm1, edi; vpsllw ymm0, ymm0, xmm1
instead of the jumptable fallback.ShiftLeftLogical\ShiftRightLogical
will be converted toSetZeroVector256\128
on out of range imm-arg.ShiftRightArithmetic
's imm-arg will be normalized to the maximum valid value on out of range imm-arg.Avx2\Sse2.ShiftLeftLogical128BitLane
will be converted toSetZeroVector256\128
on out of range imm-arg.