-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Updating hardware intrinsics codegen to be table driven #15642
Conversation
FYI. @fiigii, @mikedn, @CarolEidt. This was discussed on a few of the threads. This is similar to the format already used by |
src/jit/compiler.h
Outdated
@@ -3024,6 +3024,8 @@ class Compiler | |||
#if FEATURE_HW_INTRINSICS | |||
InstructionSet lookupHWIntrinsicISA(const char* className); | |||
NamedIntrinsic lookupHWIntrinsic(const char* methodName, InstructionSet isa); | |||
instruction insOfHWIntrinsic(NamedIntrinsic intrinsicID, var_types baseType); | |||
ssize_t ivalOfHWIntrinsic(NamedIntrinsic intrinsicID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these 2 functions should be static. Also, it's not clear why they're members of Compiler
. At least insOfHWIntrinsic
should be moved to CodeGen
probably.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several of these methods could probably be static and could likely be not in the compiler.
I put them here for consistency with the other existing methods that are driven off the HWIntrinsicInfo
table I extended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like the rest of the methods are related to intrinsic importation so they naturally belong here (for the lack of a better place). Consistency is usually a good thing, when the existing code isn't a mess. Unfortunately for us, the JIT code is a mess and attempting to be consistent with it is not always a useful goal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved them to be static functions defined in hwintrinsiccodegenxarch
.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
|
||
genConsumeOperands(node); | ||
|
||
switch (intrinsicID) | ||
{ | ||
case NI_SSE2_Add: | ||
{ | ||
op2Reg = op2->gtRegNum; | ||
assert((baseType == TYP_BYTE) || (baseType == TYP_UBYTE) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check should be a side effect of insOfHWIntrinsic
. If the type is not supported (e.g. TYP_REF
) insOfHWIntrinsic
should assert. If a specific intrinsic does not support a specific type then the corresponding instruction should be INS_invalid
and that too should assert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
break; | ||
} | ||
ins = compiler->insOfHWIntrinsic(intrinsicID, baseType); | ||
op2Reg = op2->gtRegNum; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this op2Reg
variable and the corresponding assignment is pointless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not always - for Sse2.Insert there are 3 arguments Vector128, short/ushort, immediate.
Avx and above use almost always 3 args.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand what that has to do with my comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can change it to pass op2Reg
directly for the binary instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/jit/hwintrinsicxarch.cpp
Outdated
// Return Value: | ||
// ival for the intrinsic | ||
// | ||
ssize_t Compiler::ivalOfHWIntrinsic(NamedIntrinsic intrinsicID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where's this "ival" thing used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In generating instructions - immediate bytes are a part of the instruction encoding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that but I don't see it used anywhere in this PR. It's problematic to review code that it isn't used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are 2, possibly 3, SxS PRs going on that will need to use this method. It exists so that we don't hit merge conflicts.
I can give examples and link to one of the existing PRs if that helps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the main question is if this belongs in Compiler or in CodeGen. I suspect that it will only be needed in CodeGen so...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Working on an update to move it into CodeGen and make it static now.
src/jit/hwintrinsiclistxarch.h
Outdated
// SSE Intrinsics | ||
HARDWARE_INTRINSIC(SSE_IsSupported, "get_IsSupported", SSE) | ||
HARDWARE_INTRINSIC(SSE_Add, "Add", SSE) | ||
// Intrinsic ID Function name ISA Instructions (flt, dbl, i8, i16, i32, i64) ival |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO we should support 128 bit operands (some insertions and extractions work on them) and make a reserved value/values for future use (binary16 et al IEEE 754:2008)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally there are conversions between base types: explicit and implicit i.e. unpack../pack.. etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it could be useful to differentiate between mem/reg/imm operands already here - just a general idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
128-bit operands, if we have a way to emit constants for them (and I'm not sure if we do right now), can be handled based off a single ssize_t variable (all fields are duplicated).
I can clarify, but the instruction encoding is made to be keyed off the baseType, not the targetType.
I don't think differentiating between mem/reg/imm here is needed. The instruction emitted should be the same and it can be handled in codegen.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
{ | ||
assert(intrinsicID != NI_Illegal); | ||
assert(intrinsicID > NI_HW_INTRINSIC_START && intrinsicID < NI_HW_INTRINSIC_END); | ||
return hwIntrinsicInfoArray[intrinsicID - NI_HW_INTRINSIC_START - 1].ival; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: I don't think we will have any ival
s where the hardcoded value is -1
, so I could probably assert that we are not returning -1
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that all immediate values are 8 bit so it makes sense to use int
-1 as a special value. BTW, it looks like ival
type is ssize_t
now, this seems unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
src/jit/hwintrinsiclistxarch.h
Outdated
|
||
// POPCNT Intrinsics | ||
HARDWARE_INTRINSIC(POPCNT_IsSupported, "get_IsSupported", POPCNT) | ||
HARDWARE_INTRINSIC(POPCNT_PopCount, "PopCount", POPCNT) | ||
HARDWARE_INTRINSIC(POPCNT_IsSupported, "get_IsSupported", POPCNT, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the work. I have a incomplete implementation in my local repo that I think it would be better to have a “category” field (e.g., simple intrinsic, helper, memory access, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the category field useful for? Also, is it in addition to the instruction array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, simple intrinsics can be easily table driven, and r64 intrinsics should throw exceptions on x86, and helpers should be manually treated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so it's moving a bit of the switch statement to also be table driven, so instead of case add, case sub, etc
, it would be case simple_intrin`?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, exactly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it end up being better to have these categorized as Helper
, SIMD_R_R
, SIMD_R_R_R
, SIMD_R_R_R_I
, etc?
There are some that are simple (such as the compare instructions) but that have a different emit path (R_R_R_I instead of R_R_R).
Each emit path will need to have fairly consistent containment analysis done, so it may be a better "categorization"/split for the jump table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that a category field will be good.
However, after having implemented most of the SSE intrinsics and having started looking at the containment support, I am pretty sure having them categorized by "supported encoding" would be ideal.
So, instructions like "Add, And, AndNot, etc" should be categorized as "R, R/M" instructions. And instructions like movlhps
should be categorized as "R, R" instructions.
With this type of categorization, we can still cut out most of the switch table by using switch (category)
rather than switch (intrinsicID)
. However, we can also have a genHWIntrinsic<category>
method which centralizes the operand handling for a particular category of instructions.
For example, genHWIntrinsicR_RM(instruction ins, GenTree* op1, GenTree* op2)
(used by Add, And, AndNot, etc) will know that op1 cannot be contained, but op2 can optionally be contained and call emitIns_R_R_R
or emitIns_R_R_A
as appropriate. Likewise, genHWIntrinsic_R_R
(used by MoveHighToLow and MoveLowToHigh) will know that neither op can be contained and can skip those checks entirely.
We can also use the category check in lowerxarch to mark nodes as contained or registers as optional.
Thoughts?
I've added the SSE intrinsics to the list as part of this PR. It gives a better example of how this is used across a range of instructions. |
I took @fiigii's suggestion to add a category field and played with it a bit. I have a prototype here: https://github.com/tannergooding/coreclr/commits/hwintrin-category (single commit). I found that having the category based on the method signature and target codegen works well as it allows you to have consistent code paths and operand handling in order to cut down on duplicate code. For example: However,
There might be other opportunities to combine some categories and cut down duplicated code further, but we can come to that as things continue to get implemented. |
I have a branch (https://github.com/tannergooding/coreclr/tree/hwintrin-instrlookup-sse) which merges this PR with #15538 (the https://github.com/tannergooding/coreclr/blob/hwintrin-instrlookup-sse/src/jit/hwintrinsiccodegenxarch.cpp#L173 shows just how much duplicated code gets cut out with the table based approach (and it could probably be improved more). |
@tannergooding Thanks for the work. Our code is slightly different. I will upload my branch once I get back. |
This just updates the HardwareIntrinsicInfo struct to also carry the instruction per base type and an ival per instruction.
This makes it really easy to remove the duplicated code for most basic instructions.
It isn't quite a constant-time lookup, due to multiple overloads mapping to the same intrinsic, but it should still be much improved.