Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Updating hardware intrinsics codegen to be table driven #15642

Closed
wants to merge 3 commits into from
Closed

Updating hardware intrinsics codegen to be table driven #15642

wants to merge 3 commits into from

Conversation

tannergooding
Copy link
Member

This just updates the HardwareIntrinsicInfo struct to also carry the instruction per base type and an ival per instruction.

This makes it really easy to remove the duplicated code for most basic instructions.

It isn't quite a constant-time lookup, due to multiple overloads mapping to the same intrinsic, but it should still be much improved.

@tannergooding tannergooding requested a review from fiigii December 27, 2017 07:21
@tannergooding
Copy link
Member Author

FYI. @fiigii, @mikedn, @CarolEidt. This was discussed on a few of the threads.

This is similar to the format already used by FEATURE_SIMD.

@@ -3024,6 +3024,8 @@ class Compiler
#if FEATURE_HW_INTRINSICS
InstructionSet lookupHWIntrinsicISA(const char* className);
NamedIntrinsic lookupHWIntrinsic(const char* methodName, InstructionSet isa);
instruction insOfHWIntrinsic(NamedIntrinsic intrinsicID, var_types baseType);
ssize_t ivalOfHWIntrinsic(NamedIntrinsic intrinsicID);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these 2 functions should be static. Also, it's not clear why they're members of Compiler. At least insOfHWIntrinsic should be moved to CodeGen probably.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several of these methods could probably be static and could likely be not in the compiler.

I put them here for consistency with the other existing methods that are driven off the HWIntrinsicInfo table I extended.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the rest of the methods are related to intrinsic importation so they naturally belong here (for the lack of a better place). Consistency is usually a good thing, when the existing code isn't a mess. Unfortunately for us, the JIT code is a mess and attempting to be consistent with it is not always a useful goal.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved them to be static functions defined in hwintrinsiccodegenxarch.


genConsumeOperands(node);

switch (intrinsicID)
{
case NI_SSE2_Add:
{
op2Reg = op2->gtRegNum;
assert((baseType == TYP_BYTE) || (baseType == TYP_UBYTE) ||
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check should be a side effect of insOfHWIntrinsic. If the type is not supported (e.g. TYP_REF) insOfHWIntrinsic should assert. If a specific intrinsic does not support a specific type then the corresponding instruction should be INS_invalid and that too should assert.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

break;
}
ins = compiler->insOfHWIntrinsic(intrinsicID, baseType);
op2Reg = op2->gtRegNum;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this op2Reg variable and the corresponding assignment is pointless.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not always - for Sse2.Insert there are 3 arguments Vector128, short/ushort, immediate.
Avx and above use almost always 3 args.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what that has to do with my comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can change it to pass op2Reg directly for the binary instructions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Return Value:
// ival for the intrinsic
//
ssize_t Compiler::ivalOfHWIntrinsic(NamedIntrinsic intrinsicID)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where's this "ival" thing used?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In generating instructions - immediate bytes are a part of the instruction encoding

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that but I don't see it used anywhere in this PR. It's problematic to review code that it isn't used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2, possibly 3, SxS PRs going on that will need to use this method. It exists so that we don't hit merge conflicts.

I can give examples and link to one of the existing PRs if that helps.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the main question is if this belongs in Compiler or in CodeGen. I suspect that it will only be needed in CodeGen so...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Working on an update to move it into CodeGen and make it static now.

// SSE Intrinsics
HARDWARE_INTRINSIC(SSE_IsSupported, "get_IsSupported", SSE)
HARDWARE_INTRINSIC(SSE_Add, "Add", SSE)
// Intrinsic ID Function name ISA Instructions (flt, dbl, i8, i16, i32, i64) ival

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should support 128 bit operands (some insertions and extractions work on them) and make a reserved value/values for future use (binary16 et al IEEE 754:2008)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally there are conversions between base types: explicit and implicit i.e. unpack../pack.. etc.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it could be useful to differentiate between mem/reg/imm operands already here - just a general idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

128-bit operands, if we have a way to emit constants for them (and I'm not sure if we do right now), can be handled based off a single ssize_t variable (all fields are duplicated).

I can clarify, but the instruction encoding is made to be keyed off the baseType, not the targetType.

I don't think differentiating between mem/reg/imm here is needed. The instruction emitted should be the same and it can be handled in codegen.

{
assert(intrinsicID != NI_Illegal);
assert(intrinsicID > NI_HW_INTRINSIC_START && intrinsicID < NI_HW_INTRINSIC_END);
return hwIntrinsicInfoArray[intrinsicID - NI_HW_INTRINSIC_START - 1].ival;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: I don't think we will have any ivals where the hardcoded value is -1, so I could probably assert that we are not returning -1.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that all immediate values are 8 bit so it makes sense to use int -1 as a special value. BTW, it looks like ival type is ssize_t now, this seems unnecessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


// POPCNT Intrinsics
HARDWARE_INTRINSIC(POPCNT_IsSupported, "get_IsSupported", POPCNT)
HARDWARE_INTRINSIC(POPCNT_PopCount, "PopCount", POPCNT)
HARDWARE_INTRINSIC(POPCNT_IsSupported, "get_IsSupported", POPCNT, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, -1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work. I have a incomplete implementation in my local repo that I think it would be better to have a “category” field (e.g., simple intrinsic, helper, memory access, etc.).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the category field useful for? Also, is it in addition to the instruction array?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, simple intrinsics can be easily table driven, and r64 intrinsics should throw exceptions on x86, and helpers should be manually treated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, so it's moving a bit of the switch statement to also be table driven, so instead of case add, case sub, etc, it would be case simple_intrin`?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly.

Copy link
Member Author

@tannergooding tannergooding Dec 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it end up being better to have these categorized as Helper, SIMD_R_R, SIMD_R_R_R, SIMD_R_R_R_I, etc?

There are some that are simple (such as the compare instructions) but that have a different emit path (R_R_R_I instead of R_R_R).

Each emit path will need to have fairly consistent containment analysis done, so it may be a better "categorization"/split for the jump table.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that a category field will be good.

However, after having implemented most of the SSE intrinsics and having started looking at the containment support, I am pretty sure having them categorized by "supported encoding" would be ideal.

So, instructions like "Add, And, AndNot, etc" should be categorized as "R, R/M" instructions. And instructions like movlhps should be categorized as "R, R" instructions.

With this type of categorization, we can still cut out most of the switch table by using switch (category) rather than switch (intrinsicID). However, we can also have a genHWIntrinsic<category> method which centralizes the operand handling for a particular category of instructions.

For example, genHWIntrinsicR_RM(instruction ins, GenTree* op1, GenTree* op2) (used by Add, And, AndNot, etc) will know that op1 cannot be contained, but op2 can optionally be contained and call emitIns_R_R_R or emitIns_R_R_A as appropriate. Likewise, genHWIntrinsic_R_R (used by MoveHighToLow and MoveLowToHigh) will know that neither op can be contained and can skip those checks entirely.

We can also use the category check in lowerxarch to mark nodes as contained or registers as optional.

Thoughts?

@tannergooding
Copy link
Member Author

I've added the SSE intrinsics to the list as part of this PR. It gives a better example of how this is used across a range of instructions.

@tannergooding
Copy link
Member Author

tannergooding commented Dec 30, 2017

I took @fiigii's suggestion to add a category field and played with it a bit.

I have a prototype here: https://github.com/tannergooding/coreclr/commits/hwintrin-category (single commit).

I found that having the category based on the method signature and target codegen works well as it allows you to have consistent code paths and operand handling in order to cut down on duplicate code.

For example: Sse.Add would be InstructionCategory_V_VM_VM since it operates on packed vector arguments and either the first or second operand can be contained (this is due to the need to generate movaps targetReg, op1 followed by addps targetReg, op2 on machines without VEX support).

However, Avx.Add would be InstructionCategory_V_V_VM since the three operand encoding needs to be used and only the second op and be contained.

Sse.AddScalar is again a different category (InstructionCategory_S_SM_SM) as it allows for different codegen and containment handling since it doesn't require the memory operands to be aligned on non-VEX machines.

There might be other opportunities to combine some categories and cut down duplicated code further, but we can come to that as things continue to get implemented.

@tannergooding
Copy link
Member Author

I have a branch (https://github.com/tannergooding/coreclr/tree/hwintrin-instrlookup-sse) which merges this PR with #15538 (the Implement SSE Hardware intrinsics PR).

https://github.com/tannergooding/coreclr/blob/hwintrin-instrlookup-sse/src/jit/hwintrinsiccodegenxarch.cpp#L173 shows just how much duplicated code gets cut out with the table based approach (and it could probably be improved more).

@fiigii
Copy link

fiigii commented Dec 31, 2017

@tannergooding Thanks for the work. Our code is slightly different. I will upload my branch once I get back.

@tannergooding
Copy link
Member Author

Going to close this. @fiigii is handling it in his PR (#15749)

@tannergooding tannergooding deleted the hwintrin-instrlookup branch January 17, 2018 01:53
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants