Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Implement AVX2 Gather intrinsic #19392

Merged
merged 3 commits into from
Sep 5, 2018
Merged

Implement AVX2 Gather intrinsic #19392

merged 3 commits into from
Sep 5, 2018

Conversation

fiigii
Copy link

@fiigii fiigii commented Aug 9, 2018

Contribute to #16653

@@ -628,12 +628,12 @@ public abstract class Avx2 : Avx
public static unsafe Vector256<double> GatherVector256(double* baseAddress, Vector128<int> index, byte scale) { throw new PlatformNotSupportedException(); }
/// <summary>
/// __m128i _mm256_i64gather_epi32 (int const* base_addr, __m256i vindex, const int scale)
/// VPGATHERQD ymm, vm64y, ymm
/// VPGATHERQD xmm, vm64y, xmm
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed some doc mistake of Gather intrinsic

case 8:
return GatherVector128(baseAddress, index, 8);
default:
throw new ArgumentOutOfRangeException();
Copy link
Author

@fiigii fiigii Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scale parameter has discrete valid values, so the RyuJIT bound-check does not work. Use managed non-const fallback to handle Gather intrinsic, which we do not need to worry about compiler optimization on the switch table.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add comments to that effect in the function header for this method.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a comment "The scale parameter should be 1, 2, 4 or 8, otherwise, ArgumentOutOfRangeException will be thrown." for each function header.

@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic
{
NamedIntrinsic gtHWIntrinsicId;
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gather intrinsics have complex overloads that need additional information (the base-type of index vector) for codegen, so adding a field in IR. But that let GenTreeHWIntrinsic become a large node. @CarolEidt do you think it ok?

#endif

return 3;
return numArgs;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GatherMaskVector* have 5 parameters that breaks our previous numArgs assumption.

Copy link
Member

@tannergooding tannergooding Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually have to carry all 5 args through? They eventually get folded down to a target register, vsib encoding, and mask register, so I would think we we could do some of this folding in the importer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so I would think we we could do some of this folding in the importer.

That seems to require a new IR to contain the folded VSIB.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I guess we could only fold it to 3 if we knew the register for index before-hand....

It just seems bad to make this many changes for a single family of instructions, since it will impact the 90% case (which is the other HWIntrinsics)...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this impact should be okay, as we have had this code (extracting numArgs from IR).

many changes for a single family of instructions

We may have more intrinsics with >3 arguments, like SetVector* that we need to bring its IR through for all-const optimization.

@tannergooding
Copy link
Member

This section is important and we should ensure it doesn't impact any existing assumptions:
image

@fiigii
Copy link
Author

fiigii commented Aug 9, 2018

This section is important and we should ensure it doesn't impact any existing assumptions:

Yes, I have addressed all the special encoding requests. But some fault issues (like the last one in the above list) are difficult to test...

@@ -546,243 +546,963 @@ public abstract class Avx2 : Avx
/// __m128i _mm_i32gather_epi32 (int const* base_addr, __m128i vindex, const int scale)
/// VPGATHERDD xmm, vm32x, xmm
/// </summary>
public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale) => GatherVector128(baseAddress, index, scale);
public static unsafe Vector128<int> GatherVector128(int* baseAddress, Vector128<int> index, byte scale)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should apply AggressiveInlining attribute to all managed implementations

Copy link
Author

@fiigii fiigii Aug 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, the jump-table fallback should always be a call-node.

regNumber index,
int scale,
int offs)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding convention: method needs header

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -4326,6 +4330,53 @@ void emitter::emitIns_R_R_AR(instruction ins, emitAttr attr, regNumber reg1, reg
emitCurIGsize += sz;
}

bool IsAVX2GatherInstruction(instruction ins)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Coding convention: method needs header

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic
{
NamedIntrinsic gtHWIntrinsicId;
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gather intrinsics have complex overloads that need additional information (the base-type of index vector) for codegen, so adding a field in IR. But that let GenTreeHWIntrinsic become a large node. @CarolEidt do you think it ok?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably not a big issue for intrinsics, though for methods with heavy intrinsic usage it could be an impact. Did you consider deriving from GenTreeHWIntrinsic to make a specialized node?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be worth considering whether to define something like GenTreeHWIntrinsicBig, to isolate both the additional fields as well as the extra operands.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps make NamedIntrinsic unsigned short? 64k intrinsics ought to be enough for anybody.

There are also spare bytes in the GenTree class but I don't know an elegant way to use those in derived classes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps make NamedIntrinsic unsigned short? 64k intrinsics ought to be enough for anybody.

@mikedn Good point! Yes, 64k is definitely enough for the foreseeable future (AVX-512 based ISAs).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make NamedIntrinsic unsigned short

Made this change, now GenTreeHWIntrinsic is still a small node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the gtIndexBaseType actually needed for? Can it not be inferred from the rest of the signature?

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't review the implementation in System.Private.CoreLib.dll, and I will be out of the office until next Thursday. But I had a few comments.

@@ -4326,6 +4330,67 @@ void emitter::emitIns_R_R_AR(instruction ins, emitAttr attr, regNumber reg1, reg
emitCurIGsize += sz;
}

// return true if ins is an AVX2 Gather instruction

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is new, you should add a header here as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, thanks.

regNumber base,
regNumber index,
int scale,
int offs);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if this was formatted this way by jit-format, but I'd prefer to see it declared in a format more similar to those around it.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this was given by jit-format, let me try to unify the format.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This format is given by clang-format and cannot manually change...

* isn't handling self-assignment of struct variables correctly. This issue may not
* surface if struct promotion is ON (which is the case on x86/arm). But still the
* fundamental issue exists that needs to be addressed.
*/

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed?

INDEBUG(gtCostsInitialized =
tree->gtCostsInitialized;) // If the 'tree' costs aren't initialized, we'll hit an assert below.
INDEBUG(gtCostsInitialized = tree->gtCostsInitialized;) // If the 'tree' costs aren't initialized, we'll hit an
// assert below.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, not sure why this changed - but if it's going to change, the comment should be moved above.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on my computer, jit-format + vs code is not stable on a few jit source files, which can output different format with each run, especially for some comments or whitespace.
Will fix, sorry for the mess.

@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic
{
NamedIntrinsic gtHWIntrinsicId;
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably not a big issue for intrinsics, though for methods with heavy intrinsic usage it could be an impact. Did you consider deriving from GenTreeHWIntrinsic to make a specialized node?

addrIndexReg = op3Reg;
indexOp = op3;

// the mask register will be cleaned by gather instructions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will be cleared by ...?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, will fix.

@@ -1177,6 +1177,8 @@ void CodeGen::genHWIntrinsicJumpTableFallback(NamedIntrinsic intrinsi
HWIntrinsicSwitchCaseBody emitSwCase)
{
assert(nonConstImmReg != REG_NA);
// AVX2 Gather intrinsics use managed non-const fallback

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves more explanation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will add more comments.

MakeSrcContained(node, lastOp);
}
}

if (!HWIntrinsicInfo::SupportsContainment(intrinsicId))
{
// Exit early if containment isn't supported

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're no longer really exiting early - was there some reason you moved the immediate containment above this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, IMM intrinsics do not have NoContainment ones. But AVX2 Gather* are IMM intrinsic AND memory-load intrinsic that cannot work with containment (so they have NoContainment flag).

So we need to moving the immediate containment above the early exiting, which marks the IMM argument of Gather before return.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then I think this comment needs to be deleted.

@@ -4112,6 +4112,7 @@ struct GenTreeSIMD : public GenTreeJitIntrinsic
struct GenTreeHWIntrinsic : public GenTreeJitIntrinsic
{
NamedIntrinsic gtHWIntrinsicId;
var_types gtIndexBaseType; // for AVX2 Gather* intrinsics

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also be worth considering whether to define something like GenTreeHWIntrinsicBig, to isolate both the additional fields as well as the extra operands.

@fiigii
Copy link
Author

fiigii commented Aug 13, 2018

@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x64 Checked jitnox86hwintrinsic

@dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x86 Checked jitnox86hwintrinsic

@dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Ubuntu x64 Checked jitnox86hwintrinsic

@fiigii
Copy link
Author

fiigii commented Aug 20, 2018

Addressed feedback. @tannergooding @CarolEidt @eerhardt could you please take a look? Some applications dotnet/machinelearning#691 need the Gather intrinsic.

@tannergooding
Copy link
Member

I will be able to review Wednesday (or possibly late tomorrow), after I get back from vacation.

@fiigii
Copy link
Author

fiigii commented Aug 23, 2018

@tannergooding @CarolEidt ping?

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in reviewing.
I have a number of questions, and suggestions for additional commenting.

@@ -362,7 +362,7 @@ void GenTree::InitNodeSize()
#endif // FEATURE_SIMD

#ifdef FEATURE_HW_INTRINSICS
static_assert_no_msg(sizeof(GenTreeHWIntrinsic) <= TREE_NODE_SZ_SMALL);
static_assert_no_msg(sizeof(GenTreeHWIntrinsic) <= TREE_NODE_SZ_SMALL); // *** large node

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that the 'large node' comment is no longer needed.

@@ -532,7 +592,8 @@ GenTree* Compiler::addRangeCheckIfNeeded(NamedIntrinsic intrinsic, GenTree* last
assert(lastOp != nullptr);
// Full-range imm-intrinsics do not need the range-check
// because the imm-parameter of the intrinsic method is a byte.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should mention the gather case.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

case 8:
return GatherVector128(baseAddress, index, 8);
default:
throw new ArgumentOutOfRangeException();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should add comments to that effect in the function header for this method.

addrIndexReg = op3Reg;
indexOp = op3;

// the mask register will be cleared by gather instructions

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the point of this comment. The maskReg is a tempreg, so I don't see that it matters whether it will be cleared. Perhaps you mean that we need to copy op4 to a temp reg to use for the mask because it will be over-written?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you mean that we need to copy op4 to a temp reg to use for the mask because it will be over-written?

Yes, will update the comment.

if (targetReg != op1Reg)
{
// copy source vector to the target register for masking merge
emit->emitIns_R_R(INS_movaps, attr, targetReg, op1Reg);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if op2Reg == targetReg? I don't think it's been set as delayFree.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Op2Reg is a GPR (base register of the address) and cannot be same as the vector target register.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - thanks for clarifying.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment clarifying this would be nice

}

regNumber lastOpReg = lastOp->gtRegNum;
genConsumeRegs(lastOp);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of having this here, rather than always consuming op3 before the if-stmt above, and then taking care of the other sources in that if stmt?

MakeSrcContained(node, lastOp);
}
}

if (!HWIntrinsicInfo::SupportsContainment(intrinsicId))
{
// Exit early if containment isn't supported

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So then I think this comment needs to be deleted.

case NI_AVX2_GatherMaskVector256:
{
assert(numArgs == 5);
// Any pair of the index, mask, or destination registers should be different

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment mentions index, mask and destination; it needs to map those to the opN names. Also, what about the address? Doesn't that also need to be distinct from the target, as the target will be overwritten?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will update the comment.

Also, what about the address? Doesn't that also need to be distinct from the target, as the target will be overwritten?

Don't you mean the address base register? If yes, that is a GPR and cannot be same as the vector target register.

@fiigii
Copy link
Author

fiigii commented Aug 23, 2018

@CarolEidt Thank you for the review and suggestions, I have addressed all the feedback.

@CarolEidt
Copy link

@tannergooding - did you want to review this before merging?

@fiigii
Copy link
Author

fiigii commented Aug 27, 2018

@tannergooding ping?

case 8:
return GatherVector128(baseAddress, index, 8);
default:
throw new ArgumentOutOfRangeException();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should be throw new ArgumentOutOfRangeException(nameof(scale))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same elsewhere as well)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other intrinsic throw the exception without the argument name. Shall we keep consistent?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we should fix the other intrinsics to be correct instead 😄

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// scale - the scale number of VSIB
// offs - the offset added to the memory address from base
//
void emitter::emitIns_R_AR_R(instruction ins,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we shouldn't be encoding the scale/offs in an Indir node and letting emitHandleMemOp handle it

@CarolEidt ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The scale/offs would be on a GenTreeAddrMode - presumably a GT_LEA. It would make sense for that to be a child of the gather node, which would not only consolidate the scale/offs handling, but also would reduce the operand count of the gather.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CarolEidt Thanks for the comments. Do you suggest to make the change in this PR? Or I can improve it in a new PR?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine (and probably a bit cleaner) to do it as a separate PR.

@@ -17522,6 +17522,11 @@ bool GenTreeHWIntrinsic::OperIsMemoryLoad()
{
// Some AVX instructions here also have MemoryLoad sematics

if (HWIntrinsicInfo::isAVX2GatherIntrinsic(gtHWIntrinsicId))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this inserted here, rather than as part of the rest of the logic flow below?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Gather has 3-op and 5-op overloads, this early check simplified the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check would have remained correct and simple by changing the code as such:

- // Do we have 3 operands?
- if (HWIntrinsicInfo::lookupNumArgs(this) != 3)
+ // Do we have less than 3 operands?
+ if (HWIntrinsicInfo::lookupNumArgs(this) < 3)

This also ensures that we aren't adding a "fast-path" for checks that don't need it

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do.

@@ -479,8 +479,8 @@ struct GenTree
// happening.
void CopyCosts(const GenTree* const tree)
{
INDEBUG(gtCostsInitialized =
tree->gtCostsInitialized;) // If the 'tree' costs aren't initialized, we'll hit an assert below.
// If the 'tree' costs aren't initialized, we'll hit an assert below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are all these random comment re-alignments popping up? They seem unrelated to the PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clang-format is confused by some comments, so I fixed here to avoid more unrelated changes from clang-format.

case NI_AVX2_GatherMaskVector128:
case NI_AVX2_GatherMaskVector256:
{
GenTreeArgList* list = op1->AsArgList();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: It is helpful to break up the argList blocks, for readability: Ex:

GenTreeArgList* argList = op1->AsArgList();

genConsumeRegs(op2);
genConsumeRegs(op3);

if (intrinsicId == NI_AVX2_GatherMaskVector128 || intrinsicId == NI_AVX2_GatherMaskVector256)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just differentiate on the argCount?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought intrinsic id is more readable, as other developers do not need to check API files.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would think a comment or an assert would be sufficient for that

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will do.

if (mustExpand && !HWIntrinsicInfo::HasFullRangeImm(intrinsic) && HWIntrinsicInfo::isImmOp(intrinsic, lastOp))
// AVX2 Gather intrinsics no not need the range-check
// because their imm-parameter have discrete valid values that are handle by managed code
if (mustExpand && !HWIntrinsicInfo::isAVX2GatherIntrinsic(intrinsic) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this should be the last check in the && chain, since the other two are more likely to exit the chain first

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, will fix

@@ -2576,9 +2576,19 @@ void Lowering::ContainCheckHWIntrinsic(GenTreeHWIntrinsic* node)
GenTree* op2 = node->gtGetOp2();
GenTree* op3 = nullptr;

if (HWIntrinsicInfo::lookupCategory(intrinsicId) == HW_Category_IMM)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this moved up?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this PR, IMM intrinsics do not have NoContainment ones. But AVX2 Gather* are IMM intrinsic AND memory-load intrinsic that cannot work with containment (so they have NoContainment flag).

So we need to move the immediate containment above the early exiting, which marks the IMM argument of Gather before return.

Copy link
Member

@tannergooding tannergooding Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still not sure I see the problem.

There are multiple intrinsics which can be sometimes contained, or partially contained (just the immediate node). You don't mark those as NoContainment, but exit below (as a special case) instead.

@@ -0,0 +1,34 @@
<?xml version="1.0" encoding="utf-8"?>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add these as templated tests, since we are trying to reduce the number of manually managed tests that we have for the intrinsics.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and when doing so, please ensure that the new templates follow the same general pattern as previous templates, to ensure that all the special-cases are covered (such as indirect invocation)

Copy link
Author

@fiigii fiigii Aug 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gather intrinsic have very special overloads and semantics that do not work with the current test template framework.

Let me move the test cases to template later in a new PR, some applications are waiting for these intrinsic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, at the very least, you need to ensure all the special cases are still covered in the manual code.

The current templates cover 14 or so scenarios that are meant to test the various code patterns that must be supported and that might influence the JIT (including things like indirect invocation, various patterns that may impact containment, etc).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure I agree with "does not work with the test template framework".

The framework is very generic and just works with a template and named string values. You should be able to create a template that fits your needs (and that follows the existing pattern) and have it just work.

@fiigii
Copy link
Author

fiigii commented Aug 27, 2018

@tannergooding @CarolEidt Thank you for the review. Addressed all the feedback.

Will move the tests to template and investigate the indir solution later.

@tannergooding
Copy link
Member

I would still like to see test coverage for the indirect invocation case, to validate that it is working as intended.

@fiigii fiigii force-pushed the gather branch 2 times, most recently from 09f75c8 to 74236e5 Compare August 27, 2018 22:15
@fiigii
Copy link
Author

fiigii commented Aug 31, 2018

@dotnet-bot test Windows_NT x64 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x64 Checked jitnox86hwintrinsic

@dotnet-bot test Windows_NT x86 Checked jitincompletehwintrinsic
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Windows_NT x86 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Windows_NT x86 Checked jitnox86hwintrinsic

@dotnet-bot test Ubuntu x64 Checked jitincompletehwintrinsic
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnoavx2
@dotnet-bot test Ubuntu x64 Checked jitx86hwintrinsicnosimd
@dotnet-bot test Ubuntu x64 Checked jitnox86hwintrinsic

@fiigii
Copy link
Author

fiigii commented Sep 4, 2018

I would still like to see test coverage for the indirect invocation case,

@tannergooding I have added indirect invocation cases, can we merge this PR?

@tannergooding
Copy link
Member

I'm fine with merging. I would appreciate it if you could log a bug tracking the test code cleanup (porting it to a template, etc).

@CarolEidt, any other feedback on your end?

@fiigii
Copy link
Author

fiigii commented Sep 4, 2018

@tannergooding thanks, will do.

@fiigii
Copy link
Author

fiigii commented Sep 4, 2018

Logged at https://github.com/dotnet/coreclr/issues/19825

@CarolEidt
Copy link

No additional feedback - I'm happy with merging

@tannergooding
Copy link
Member

Will merge after the last test finishes (it had timed out previously, so I restarted it).

@fiigii
Copy link
Author

fiigii commented Sep 4, 2018

@tannergooding @CarolEidt Thank you so much for the review!

@fiigii
Copy link
Author

fiigii commented Sep 5, 2018

@tannergooding could you please merge this PR?

@tannergooding tannergooding merged commit 0f6597f into dotnet:master Sep 5, 2018
@tannergooding
Copy link
Member

Done. Thanks for the work @fiigii

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants