-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Conversation
@@ -628,12 +628,12 @@ public abstract class Avx2 : Avx | |||
public static unsafe Vector256<double> GatherVector256(double* baseAddress, Vector128<int> index, byte scale) { throw new PlatformNotSupportedException(); } | |||
/// <summary> | |||
/// __m128i _mm256_i64gather_epi32 (int const* base_addr, __m256i vindex, const int scale) | |||
/// VPGATHERQD ymm, vm64y, ymm | |||
/// VPGATHERQD xmm, vm64y, xmm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed some doc mistake of Gather intrinsic
case 8: | ||
return GatherVector128(baseAddress, index, 8); | ||
default: | ||
throw new ArgumentOutOfRangeException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scale
parameter has discrete valid values, so the RyuJIT bound-check does not work. Use managed non-const fallback to handle Gather intrinsic, which we do not need to worry about compiler optimization on the switch table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add comments to that effect in the function header for this method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment "The scale parameter should be 1, 2, 4 or 8, otherwise, ArgumentOutOfRangeException will be thrown." for each function header.
src/jit/gentree.h
Outdated
@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic | |||
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic | |||
{ | |||
NamedIntrinsic gtHWIntrinsicId; | |||
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gather intrinsics have complex overloads that need additional information (the base-type of index vector) for codegen, so adding a field in IR. But that let GenTreeHWIntrinsic
become a large node. @CarolEidt do you think it ok?
#endif | ||
|
||
return 3; | ||
return numArgs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GatherMaskVector*
have 5 parameters that breaks our previous numArgs
assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we actually have to carry all 5 args through? They eventually get folded down to a target register
, vsib encoding
, and mask register
, so I would think we we could do some of this folding in the importer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so I would think we we could do some of this folding in the importer.
That seems to require a new IR to contain the folded VSIB.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I guess we could only fold it to 3 if we knew the register for index
before-hand....
It just seems bad to make this many changes for a single family of instructions, since it will impact the 90% case (which is the other HWIntrinsics)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this impact should be okay, as we have had this code (extracting numArgs
from IR).
many changes for a single family of instructions
We may have more intrinsics with >3 arguments, like SetVector* that we need to bring its IR through for all-const optimization.
Yes, I have addressed all the special encoding requests. But some fault issues (like the last one in the above list) are difficult to test... |
@@ -546,243 +546,963 @@ public abstract class Avx2 : Avx | |||
/// __m128i _mm_i32gather_epi32 (int const* base_addr, __m128i vindex, const int scale) | |||
/// VPGATHERDD xmm, vm32x, xmm | |||
/// </summary> | |||
public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale); | |||
public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale) | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we should apply AggressiveInlining
attribute to all managed implementations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the jump-table fallback should always be a call-node.
regNumber index, | ||
int scale, | ||
int offs) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding convention: method needs header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
src/jit/emitxarch.cpp
Outdated
@@ -4326,6 +4330,53 @@ void emitter::emitIns_R_R_AR(instruction ins, emitAttr attr, regNumber reg1, reg | |||
emitCurIGsize += sz; | |||
} | |||
|
|||
bool IsAVX2GatherInstruction(instruction ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Coding convention: method needs header
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic | |||
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic | |||
{ | |||
NamedIntrinsic gtHWIntrinsicId; | |||
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gather intrinsics have complex overloads that need additional information (the base-type of index vector) for codegen, so adding a field in IR. But that let GenTreeHWIntrinsic become a large node. @CarolEidt do you think it ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably not a big issue for intrinsics, though for methods with heavy intrinsic usage it could be an impact. Did you consider deriving from GenTreeHWIntrinsic
to make a specialized node?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be worth considering whether to define something like GenTreeHWIntrinsicBig
, to isolate both the additional fields as well as the extra operands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps make NamedIntrinsic
unsigned short
? 64k intrinsics ought to be enough for anybody.
There are also spare bytes in the GenTree
class but I don't know an elegant way to use those in derived classes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps make NamedIntrinsic unsigned short? 64k intrinsics ought to be enough for anybody.
@mikedn Good point! Yes, 64k is definitely enough for the foreseeable future (AVX-512 based ISAs).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make NamedIntrinsic unsigned short
Made this change, now GenTreeHWIntrinsic
is still a small node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the gtIndexBaseType
actually needed for? Can it not be inferred from the rest of the signature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't review the implementation in System.Private.CoreLib.dll
, and I will be out of the office until next Thursday. But I had a few comments.
src/jit/emitxarch.cpp
Outdated
@@ -4326,6 +4330,67 @@ void emitter::emitIns_R_R_AR(instruction ins, emitAttr attr, regNumber reg1, reg | |||
emitCurIGsize += sz; | |||
} | |||
|
|||
// return true if ins is an AVX2 Gather instruction |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is new, you should add a header here as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do, thanks.
regNumber base, | ||
regNumber index, | ||
int scale, | ||
int offs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if this was formatted this way by jit-format, but I'd prefer to see it declared in a format more similar to those around it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this was given by jit-format, let me try to unify the format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This format is given by clang-format and cannot manually change...
src/jit/gentree.cpp
Outdated
* isn't handling self-assignment of struct variables correctly. This issue may not | ||
* surface if struct promotion is ON (which is the case on x86/arm). But still the | ||
* fundamental issue exists that needs to be addressed. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this changed?
src/jit/gentree.h
Outdated
INDEBUG(gtCostsInitialized = | ||
tree->gtCostsInitialized;) // If the 'tree' costs aren't initialized, we'll hit an assert below. | ||
INDEBUG(gtCostsInitialized = tree->gtCostsInitialized;) // If the 'tree' costs aren't initialized, we'll hit an | ||
// assert below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, not sure why this changed - but if it's going to change, the comment should be moved above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, on my computer, jit-format + vs code is not stable on a few jit source files, which can output different format with each run, especially for some comments or whitespace.
Will fix, sorry for the mess.
@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic | |||
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic | |||
{ | |||
NamedIntrinsic gtHWIntrinsicId; | |||
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably not a big issue for intrinsics, though for methods with heavy intrinsic usage it could be an impact. Did you consider deriving from GenTreeHWIntrinsic
to make a specialized node?
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
addrIndexReg = op3Reg; | ||
indexOp = op3; | ||
|
||
// the mask register will be cleaned by gather instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will be cleared by ...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, will fix.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
@@ -1177,6 +1177,8 @@ void CodeGen::genHWIntrinsicJumpTableFallback(NamedIntrinsic intrinsi | |||
HWIntrinsicSwitchCaseBody emitSwCase) | |||
{ | |||
assert(nonConstImmReg != REG_NA); | |||
// AVX2 Gather intrinsics use managed non-const fallback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This deserves more explanation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will add more comments.
src/jit/lowerxarch.cpp
Outdated
MakeSrcContained(node, lastOp); | ||
} | ||
} | ||
|
||
if (!HWIntrinsicInfo::SupportsContainment(intrinsicId)) | ||
{ | ||
// Exit early if containment isn't supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're no longer really exiting early - was there some reason you moved the immediate containment above this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this PR, IMM intrinsics do not have NoContainment ones. But AVX2 Gather*
are IMM intrinsic AND memory-load intrinsic that cannot work with containment (so they have NoContainment flag).
So we need to moving the immediate containment above the early exiting, which marks the IMM argument of Gather before return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then I think this comment needs to be deleted.
@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic | |||
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic | |||
{ | |||
NamedIntrinsic gtHWIntrinsicId; | |||
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also be worth considering whether to define something like GenTreeHWIntrinsicBig
, to isolate both the additional fields as well as the extra operands.
@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic @dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic @dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic |
Addressed feedback. @tannergooding @CarolEidt @eerhardt could you please take a look? Some applications dotnet/machinelearning#691 need the Gather intrinsic. |
I will be able to review Wednesday (or possibly late tomorrow), after I get back from vacation. |
@tannergooding @CarolEidt ping? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay in reviewing.
I have a number of questions, and suggestions for additional commenting.
src/jit/gentree.cpp
Outdated
@@ -362,7 +362,7 @@ void GenTree::InitNodeSize() | |||
#endif // FEATURE_SIMD | |||
|
|||
#ifdef FEATURE_HW_INTRINSICS | |||
static_assert_no_msg(sizeof(GenTreeHWIntrinsic) <= TREE_NODE_SZ_SMALL); | |||
static_assert_no_msg(sizeof(GenTreeHWIntrinsic) <= TREE_NODE_SZ_SMALL); // *** large node |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears that the 'large node' comment is no longer needed.
@@ -532,7 +592,8 @@ GenTree* Compiler::addRangeCheckIfNeeded(NamedIntrinsic intrinsic, GenTree* last | |||
assert(lastOp != nullptr); | |||
// Full-range imm-intrinsics do not need the range-check | |||
// because the imm-parameter of the intrinsic method is a byte. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment should mention the gather case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
case 8: | ||
return GatherVector128(baseAddress, index, 8); | ||
default: | ||
throw new ArgumentOutOfRangeException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should add comments to that effect in the function header for this method.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
addrIndexReg = op3Reg; | ||
indexOp = op3; | ||
|
||
// the mask register will be cleared by gather instructions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the point of this comment. The maskReg
is a tempreg, so I don't see that it matters whether it will be cleared. Perhaps you mean that we need to copy op4 to a temp reg to use for the mask because it will be over-written?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you mean that we need to copy op4 to a temp reg to use for the mask because it will be over-written?
Yes, will update the comment.
if (targetReg != op1Reg) | ||
{ | ||
// copy source vector to the target register for masking merge | ||
emit->emitIns_R_R(INS_movaps, attr, targetReg, op1Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if op2Reg == targetReg
? I don't think it's been set as delayFree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Op2Reg is a GPR (base register of the address) and cannot be same as the vector target register.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right - thanks for clarifying.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment clarifying this would be nice
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
} | ||
|
||
regNumber lastOpReg = lastOp->gtRegNum; | ||
genConsumeRegs(lastOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of having this here, rather than always consuming op3 before the if-stmt above, and then taking care of the other sources in that if stmt?
src/jit/lowerxarch.cpp
Outdated
MakeSrcContained(node, lastOp); | ||
} | ||
} | ||
|
||
if (!HWIntrinsicInfo::SupportsContainment(intrinsicId)) | ||
{ | ||
// Exit early if containment isn't supported |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So then I think this comment needs to be deleted.
case NI_AVX2_GatherMaskVector256: | ||
{ | ||
assert(numArgs == 5); | ||
// Any pair of the index, mask, or destination registers should be different |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment mentions index, mask and destination; it needs to map those to the opN names. Also, what about the address? Doesn't that also need to be distinct from the target, as the target will be overwritten?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will update the comment.
Also, what about the address? Doesn't that also need to be distinct from the target, as the target will be overwritten?
Don't you mean the address base register? If yes, that is a GPR and cannot be same as the vector target register.
@CarolEidt Thank you for the review and suggestions, I have addressed all the feedback. |
@tannergooding - did you want to review this before merging? |
@tannergooding ping? |
case 8: | ||
return GatherVector128(baseAddress, index, 8); | ||
default: | ||
throw new ArgumentOutOfRangeException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Should be throw new ArgumentOutOfRangeException(nameof(scale))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(same elsewhere as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other intrinsic throw the exception without the argument name. Shall we keep consistent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, we should fix the other intrinsics to be correct instead 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, logged at https://github.com/dotnet/coreclr/issues/19689
// scale - the scale number of VSIB | ||
// offs - the offset added to the memory address from base | ||
// | ||
void emitter::emitIns_R_AR_R(instruction ins, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we shouldn't be encoding the scale/offs in an Indir
node and letting emitHandleMemOp
handle it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scale/offs would be on a GenTreeAddrMode
- presumably a GT_LEA
. It would make sense for that to be a child of the gather node, which would not only consolidate the scale/offs handling, but also would reduce the operand count of the gather.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@CarolEidt Thanks for the comments. Do you suggest to make the change in this PR? Or I can improve it in a new PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be fine (and probably a bit cleaner) to do it as a separate PR.
src/jit/gentree.cpp
Outdated
@@ -17522,6 +17522,11 @@ bool GenTreeHWIntrinsic::OperIsMemoryLoad() | |||
{ | |||
// Some AVX instructions here also have MemoryLoad sematics | |||
|
|||
if (HWIntrinsicInfo::isAVX2GatherIntrinsic(gtHWIntrinsicId)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this inserted here, rather than as part of the rest of the logic flow below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because Gather has 3-op and 5-op overloads, this early check simplified the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The check would have remained correct and simple by changing the code as such:
- // Do we have 3 operands?
- if (HWIntrinsicInfo::lookupNumArgs(this) != 3)
+ // Do we have less than 3 operands?
+ if (HWIntrinsicInfo::lookupNumArgs(this) < 3)
This also ensures that we aren't adding a "fast-path" for checks that don't need it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, will do.
@@ -479,8 +479,8 @@ struct GenTree | |||
// happening. | |||
void CopyCosts(const GenTree* const tree) | |||
{ | |||
INDEBUG(gtCostsInitialized = | |||
tree->gtCostsInitialized;) // If the 'tree' costs aren't initialized, we'll hit an assert below. | |||
// If the 'tree' costs aren't initialized, we'll hit an assert below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are all these random comment re-alignments popping up? They seem unrelated to the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clang-format is confused by some comments, so I fixed here to avoid more unrelated changes from clang-format.
case NI_AVX2_GatherMaskVector128: | ||
case NI_AVX2_GatherMaskVector256: | ||
{ | ||
GenTreeArgList* list = op1->AsArgList(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It is helpful to break up the argList blocks, for readability: Ex:
coreclr/src/jit/hwintrinsiccodegenxarch.cpp
Line 217 in 7f36c93
GenTreeArgList* argList = op1->AsArgList(); |
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
genConsumeRegs(op2); | ||
genConsumeRegs(op3); | ||
|
||
if (intrinsicId == NI_AVX2_GatherMaskVector128 || intrinsicId == NI_AVX2_GatherMaskVector256) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just differentiate on the argCount?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought intrinsic id is more readable, as other developers do not need to check API files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would think a comment or an assert would be sufficient for that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, will do.
src/jit/hwintrinsicxarch.cpp
Outdated
if (mustExpand && !HWIntrinsicInfo::HasFullRangeImm(intrinsic) && HWIntrinsicInfo::isImmOp(intrinsic, lastOp)) | ||
// AVX2 Gather intrinsics no not need the range-check | ||
// because their imm-parameter have discrete valid values that are handle by managed code | ||
if (mustExpand && !HWIntrinsicInfo::isAVX2GatherIntrinsic(intrinsic) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like this should be the last check in the &&
chain, since the other two are more likely to exit the chain first
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, will fix
src/jit/lowerxarch.cpp
Outdated
@@ -2576,9 +2576,19 @@ void Lowering::ContainCheckHWIntrinsic(GenTreeHWIntrinsic* node) | |||
GenTree* op2 = node->gtGetOp2(); | |||
GenTree* op3 = nullptr; | |||
|
|||
if (HWIntrinsicInfo::lookupCategory(intrinsicId) == HW_Category_IMM) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this moved up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before this PR, IMM intrinsics do not have NoContainment ones. But AVX2 Gather* are IMM intrinsic AND memory-load intrinsic that cannot work with containment (so they have NoContainment flag).
So we need to move the immediate containment above the early exiting, which marks the IMM argument of Gather before return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still not sure I see the problem.
There are multiple intrinsics which can be sometimes contained, or partially contained (just the immediate node). You don't mark those as NoContainment
, but exit below (as a special case) instead.
@@ -0,0 +1,34 @@ | |||
<?xml version="1.0" encoding="utf-8"?> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add these as templated tests, since we are trying to reduce the number of manually managed tests that we have for the intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and when doing so, please ensure that the new templates follow the same general pattern as previous templates, to ensure that all the special-cases are covered (such as indirect invocation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gather intrinsic have very special overloads and semantics that do not work with the current test template framework.
Let me move the test cases to template later in a new PR, some applications are waiting for these intrinsic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then, at the very least, you need to ensure all the special cases are still covered in the manual code.
The current templates cover 14 or so scenarios that are meant to test the various code patterns that must be supported and that might influence the JIT (including things like indirect invocation
, various patterns that may impact containment
, etc).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm also not sure I agree with "does not work with the test template framework".
The framework is very generic and just works with a template and named string values. You should be able to create a template that fits your needs (and that follows the existing pattern) and have it just work.
@tannergooding @CarolEidt Thank you for the review. Addressed all the feedback. Will move the tests to template and investigate the indir solution later. |
I would still like to see test coverage for the indirect invocation case, to validate that it is working as intended. |
09f75c8
to
74236e5
Compare
@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic @dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic @dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic |
@tannergooding I have added indirect invocation cases, can we merge this PR? |
I'm fine with merging. I would appreciate it if you could log a bug tracking the test code cleanup (porting it to a template, etc). @CarolEidt, any other feedback on your end? |
@tannergooding thanks, will do. |
No additional feedback - I'm happy with merging |
Will merge after the last test finishes (it had timed out previously, so I restarted it). |
@tannergooding @CarolEidt Thank you so much for the review! |
@tannergooding could you please merge this PR? |
Done. Thanks for the work @fiigii |
Contribute to #16653