Optimizes vector conversions with AVX512 #87878

Closed
39 commits
7d764be
fixing the JITDbl2Ulng helper function. The new AVX512 instruction vc…
khushal1996 May 9, 2023
f50408b
Making changes to the library test case expected output based on the …
khushal1996 May 10, 2023
f018095
Fixing the JITDbl2Ulng helper function. Also making sure that we are …
khushal1996 May 12, 2023
ffe97cd
reverting jitformat
khushal1996 May 12, 2023
a8ee861
Adding a truncate function to the Dbl2Ulng helper to make sure we avo…
khushal1996 May 15, 2023
bbd8a8b
Adding code to handle vectorized conversion for float/double to/from …
khushal1996 May 16, 2023
a21a077
reverting changes for float to ulong
khushal1996 May 16, 2023
1e3415a
enabling float to ulong conversion
khushal1996 May 16, 2023
c788c67
Making change to set w1 bit for evex
khushal1996 May 17, 2023
fbb2a90
merging with main. Picking up hwintrinsiclistxarh from main
khushal1996 May 18, 2023
9fece01
jit format
khushal1996 May 18, 2023
b40cd8e
Splitting vcvttss2usi to vcvttss2usi32 and vcvttss2usi64. Also adding…
khushal1996 May 18, 2023
710026e
undoing jitformat changes due to merge error
khushal1996 May 18, 2023
75e6acf
removing unused code and correcting throughput and latency informatio…
khushal1996 May 19, 2023
e15be4b
correcting throughput and latency for vcvttss2usi32 and placing it wi…
khushal1996 May 19, 2023
10e2876
formatting
khushal1996 May 19, 2023
9463173
formatting
khushal1996 May 19, 2023
4f7bb67
updating comments
khushal1996 May 22, 2023
a99725c
updating code for github comments. Using compIsaSupportedDebugOnly fo…
khushal1996 May 24, 2023
44390b2
reverting to original checks for ISA supported Debug only because the…
khushal1996 May 24, 2023
2f20ef3
running jitformat
khushal1996 May 24, 2023
b7dff8a
running jitformat
khushal1996 May 25, 2023
9622f78
combine the 2 nodes GT_CAST(GT_CAST(TYP_ULONG, TYP_DOUBLE), TYP_FLOAT…
khushal1996 Jun 17, 2023
d3b542f
merging with main and updating hwintrinsiclistxarch to take into cons…
khushal1996 Jun 18, 2023
8343e18
Changing noway_assert to assert to make sure compOpportunisticallyDep…
khushal1996 Jun 19, 2023
e456763
running jitformat
khushal1996 Jun 19, 2023
0e88650
accelerates ConvertToSingle for uint, ConvertToUInt32 for float, Conv…
khushal1996 Jun 20, 2023
d97a169
Reverting changes for convertToUint32 and also reverting hwintrinlist…
khushal1996 Jun 21, 2023
4e0b663
reverting changes for float/double to uint for scalar values
khushal1996 Jun 21, 2023
cffa6ea
Removing unused code for UINT<->float/double conversions. Cannot supp…
khushal1996 Jun 21, 2023
2697d26
Adding IsBaselineVector512IsaSupportedOpportunistically checks for AV…
khushal1996 Jun 21, 2023
f43f99c
Inserting proper break or return in switch-case for intrinsics
khushal1996 Jun 21, 2023
d51b862
Inserting proper break or return in switch-case for intrinsics
khushal1996 Jun 21, 2023
3f99873
Running jitformat
khushal1996 Jun 21, 2023
12ff62e
moving asserts and taking them out of if checks
khushal1996 Jun 22, 2023
a119b35
jitformat
khushal1996 Jun 22, 2023
47afca2
Changing compOpportunisticallyDependsOn to compIsaSupportedDebugOnly …
khushal1996 Jun 20, 2023
c5c2a44
Making code review changes. Moving around the comOpportunisticallyDep…
khushal1996 Jun 22, 2023
42001ac
removing float to ulong conversion
khushal1996 Jul 6, 2023
22 changes: 18 additions & 4 deletions src/coreclr/jit/codegenxarch.cpp
@@ -7336,7 +7336,19 @@ void CodeGen::genIntToFloatCast(GenTree* treeNode)
// Also we don't expect to see uint32 -> float/double and uint64 -> float conversions
// here since they should have been lowered appropriately.
noway_assert(srcType != TYP_UINT);
-noway_assert((srcType != TYP_ULONG) || (dstType != TYP_FLOAT));
+assert((srcType != TYP_ULONG) || (dstType != TYP_FLOAT) ||
+compiler->compIsaSupportedDebugOnly(InstructionSet_AVX512F));
+
+if ((srcType == TYP_ULONG) && varTypeIsFloating(dstType) &&
+compiler->compOpportunisticallyDependsOn(InstructionSet_AVX512F))
+{
+assert(compiler->compIsaSupportedDebugOnly(InstructionSet_AVX512F));
+genConsumeOperands(treeNode->AsOp());
+instruction ins = ins_FloatConv(dstType, srcType, emitTypeSize(srcType));
+GetEmitter()->emitInsBinary(ins, emitTypeSize(srcType), treeNode, op1);
+genProduceReg(treeNode);
+return;
+}

// To convert int to a float/double, cvtsi2ss/sd SSE2 instruction is used
// which does a partial write to lower 4/8 bytes of xmm register keeping the other
@@ -7449,8 +7461,10 @@ void CodeGen::genFloatToIntCast(GenTree* treeNode)
noway_assert((dstSize == EA_ATTR(genTypeSize(TYP_INT))) || (dstSize == EA_ATTR(genTypeSize(TYP_LONG))));

// We shouldn't be seeing uint64 here as it should have been converted
-// into a helper call by either front-end or lowering phase.
-noway_assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))));
+// into a helper call by either front-end or lowering phase, unless we have AVX512F
+// accelerated conversions.
+assert(!varTypeIsUnsigned(dstType) || (dstSize != EA_ATTR(genTypeSize(TYP_LONG))) ||
+compiler->compIsaSupportedDebugOnly(InstructionSet_AVX512F));

// If the dstType is TYP_UINT, we have 32-bits to encode the
// float number. Any of 33rd or above bits can be the sign bit.
@@ -7463,7 +7477,7 @@ void CodeGen::genFloatToIntCast(GenTree* treeNode)
// Note that we need to specify dstType here so that it will determine
// the size of destination integer register and also the rex.w prefix.
genConsumeOperands(treeNode->AsOp());
-instruction ins = ins_FloatConv(TYP_INT, srcType, emitTypeSize(srcType));
+instruction ins = ins_FloatConv(dstType, srcType, emitTypeSize(srcType));
GetEmitter()->emitInsBinary(ins, emitTypeSize(dstType), treeNode, op1);
genProduceReg(treeNode);
}
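For context on what the AVX512F path above avoids: when only signed conversion instructions are available, 64-bit unsigned conversions are usually assembled from signed ones. The following is a minimal portable sketch of that general technique, not the JIT's actual fallback or helper code. The function names `ulongToDoubleViaSigned` and `doubleToUlongViaSigned` are hypothetical, and the second function assumes its input lies in the range [0, 2^64).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from the JIT sources.

// uint64 -> double using only a signed 64-bit conversion.
double ulongToDoubleViaSigned(std::uint64_t u)
{
    if ((u >> 63) == 0)
    {
        // Top bit clear: the value fits in int64, so convert it directly.
        return static_cast<double>(static_cast<std::int64_t>(u));
    }

    // Top bit set: halve the value, folding the dropped low bit back in as a
    // sticky bit so round-to-nearest still sees it, then double the result.
    std::uint64_t halved = (u >> 1) | (u & 1);
    return static_cast<double>(static_cast<std::int64_t>(halved)) * 2.0;
}

// double -> uint64 (truncating) using only a signed 64-bit conversion.
// Assumption: 0 <= d < 2^64. Other inputs are outside the scope of this sketch.
std::uint64_t doubleToUlongViaSigned(double d)
{
    const double twoPow63 = 9223372036854775808.0;  // 2^63
    const double twoPow64 = 18446744073709551616.0; // 2^64
    assert(d >= 0.0 && d < twoPow64);

    if (d < twoPow63)
    {
        // Fits in int64: truncate directly.
        return static_cast<std::uint64_t>(static_cast<std::int64_t>(d));
    }

    // Shift the value down by 2^63 (exact for doubles in this range),
    // truncate as signed, then restore the top bit.
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(d - twoPow63)) + (1ULL << 63);
}
```

Both functions need a branch plus extra arithmetic, which is the overhead a single unsigned conversion instruction such as `vcvtusi2sd64` or `vcvttsd2usi` removes.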
5 changes: 5 additions & 0 deletions src/coreclr/jit/emit.h
@@ -3891,6 +3891,11 @@ emitAttr emitter::emitGetMemOpSize(instrDesc* id) const
return EA_32BYTE;
}

+case INS_vcvttss2usi64:
+{
+return EA_4BYTE;
+}
+
case INS_movddup:
{
if (defaultSize == 64)
29 changes: 22 additions & 7 deletions src/coreclr/jit/emitxarch.cpp
@@ -1399,7 +1399,6 @@ bool emitter::TakesRexWPrefix(const instrDesc* id) const
case INS_vcvtsd2usi:
case INS_vcvtss2usi:
case INS_vcvttsd2usi:
-case INS_vcvttss2usi:
{
if (attr == EA_8BYTE)
{
@@ -2623,7 +2622,8 @@ bool emitter::emitInsCanOnlyWriteSSE2OrAVXReg(instrDesc* id)
case INS_vcvtsd2usi:
case INS_vcvtss2usi:
case INS_vcvttsd2usi:
-case INS_vcvttss2usi:
+case INS_vcvttss2usi32:
+case INS_vcvttss2usi64:
{
// These SSE instructions write to a general purpose integer register.
return false;
@@ -11435,12 +11435,18 @@ void emitter::emitDispIns(
case INS_vcvtsd2usi:
case INS_vcvtss2usi:
case INS_vcvttsd2usi:
-case INS_vcvttss2usi:
{
printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_16BYTE));
break;
}

+case INS_vcvttss2usi32:
+case INS_vcvttss2usi64:
+{
+printf(" %s, %s", emitRegName(id->idReg1(), attr), emitRegName(id->idReg2(), EA_4BYTE));
+break;
+}
+
#ifdef TARGET_AMD64
case INS_movsxd:
{
@@ -18595,23 +18601,32 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins
case INS_cvtsi2sd64:
case INS_cvtsi2ss64:
case INS_vcvtsd2usi:
-case INS_vcvttsd2usi:
-case INS_vcvtusi2sd32:
-case INS_vcvtusi2sd64:
case INS_vcvtusi2ss32:
case INS_vcvtusi2ss64:
+case INS_vcvttsd2usi:
+case INS_vcvttss2usi32:
result.insThroughput = PERFSCORE_THROUGHPUT_1C;
result.insLatency += PERFSCORE_LATENCY_7C;
break;

+case INS_vcvtusi2sd64:
+case INS_vcvtusi2sd32:
+result.insThroughput = PERFSCORE_THROUGHPUT_1C;
+result.insLatency += PERFSCORE_LATENCY_5C;
+break;
+
case INS_cvttss2si:
case INS_cvtss2si:
case INS_vcvtss2usi:
-case INS_vcvttss2usi:
result.insThroughput = PERFSCORE_THROUGHPUT_1C;
result.insLatency += opSize == EA_8BYTE ? PERFSCORE_LATENCY_8C : PERFSCORE_LATENCY_7C;
break;

+case INS_vcvttss2usi64:
+result.insThroughput = PERFSCORE_THROUGHPUT_1C;
+result.insLatency += PERFSCORE_LATENCY_8C;
+break;
+
case INS_cvtss2sd:
result.insThroughput = PERFSCORE_THROUGHPUT_1C;
result.insLatency += PERFSCORE_LATENCY_5C;
8 changes: 6 additions & 2 deletions src/coreclr/jit/hwintrinsiclistxarch.h
@@ -268,8 +268,12 @@ HARDWARE_INTRINSIC(Vector512, Ceiling,
HARDWARE_INTRINSIC(Vector512, Create, 64, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen)
HARDWARE_INTRINSIC(Vector512, CreateScalar, 64, -1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen)
HARDWARE_INTRINSIC(Vector512, CreateScalarUnsafe, 64, 1, true, {INS_movd, INS_movd, INS_movd, INS_movd, INS_movd, INS_movd, INS_movd, INS_movd, INS_movss, INS_movsd_simd}, HW_Category_SIMDScalar, HW_Flag_SpecialImport|HW_Flag_SpecialCodeGen)
+HARDWARE_INTRINSIC(Vector512, ConvertToDouble, 64, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen|HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(Vector512, ConvertToSingle, 64, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen|HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(Vector512, ConvertToInt32, 64, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen|HW_Flag_BaseTypeFromFirstArg)
+HARDWARE_INTRINSIC(Vector512, ConvertToInt64, 64, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen|HW_Flag_BaseTypeFromFirstArg)
+HARDWARE_INTRINSIC(Vector512, ConvertToUInt32, 64, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen|HW_Flag_BaseTypeFromFirstArg)
+HARDWARE_INTRINSIC(Vector512, ConvertToUInt64, 64, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen|HW_Flag_BaseTypeFromFirstArg)
HARDWARE_INTRINSIC(Vector512, Divide, 64, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_NoCodeGen)
HARDWARE_INTRINSIC(Vector512, Equals, 64, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_BaseTypeFromFirstArg|HW_Flag_NoCodeGen)
HARDWARE_INTRINSIC(Vector512, EqualsAll, 64, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Helper, HW_Flag_SpecialImport|HW_Flag_BaseTypeFromFirstArg|HW_Flag_NoCodeGen)
@@ -845,7 +849,7 @@ HARDWARE_INTRINSIC(AVX512F, CompareNotEqual,
HARDWARE_INTRINSIC(AVX512F, ConvertScalarToVector128Double, 16, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvtusi2sd32, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromSecondArg|HW_Flag_CopyUpperBits)
HARDWARE_INTRINSIC(AVX512F, ConvertScalarToVector128Single, 16, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvtusi2ss32, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromSecondArg|HW_Flag_CopyUpperBits)
HARDWARE_INTRINSIC(AVX512F, ConvertToUInt32, 16, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvtss2usi, INS_vcvtsd2usi}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
-HARDWARE_INTRINSIC(AVX512F, ConvertToUInt32WithTruncation, 16, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvttss2usi, INS_vcvttsd2usi}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
+HARDWARE_INTRINSIC(AVX512F, ConvertToUInt32WithTruncation, 16, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvttss2usi32, INS_vcvttsd2usi}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F, ConvertToVector128Byte, 64, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmovdb, INS_vpmovdb, INS_vpmovqb, INS_vpmovqb, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F, ConvertToVector128ByteWithSaturation, 64, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmovusdb, INS_invalid, INS_vpmovusqb, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F, ConvertToVector128Int16, 64, 1, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vpmovqw, INS_vpmovqw, INS_invalid, INS_invalid}, HW_Category_SimpleSIMD, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
@@ -1002,7 +1006,7 @@ HARDWARE_INTRINSIC(AVX512F_VL, TernaryLogic,
HARDWARE_INTRINSIC(AVX512F_X64, ConvertScalarToVector128Double, 16, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvtusi2sd64, INS_invalid, INS_invalid}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromSecondArg|HW_Flag_CopyUpperBits|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F_X64, ConvertScalarToVector128Single, 16, 2, false, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvtusi2ss64, INS_invalid, INS_invalid}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromSecondArg|HW_Flag_CopyUpperBits|HW_Flag_SpecialCodeGen)
HARDWARE_INTRINSIC(AVX512F_X64, ConvertToUInt64, 16, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvtss2usi, INS_vcvtsd2usi}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
-HARDWARE_INTRINSIC(AVX512F_X64, ConvertToUInt64WithTruncation, 16, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvttss2usi, INS_vcvttsd2usi}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)
+HARDWARE_INTRINSIC(AVX512F_X64, ConvertToUInt64WithTruncation, 16, 1, true, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_vcvttss2usi64, INS_vcvttsd2usi}, HW_Category_SIMDScalar, HW_Flag_BaseTypeFromFirstArg|HW_Flag_SpecialCodeGen)

// ***************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
// ISA Function name SIMD size NumArg EncodesExtraTypeArg Instructions Category Flags
21 changes: 20 additions & 1 deletion src/coreclr/jit/hwintrinsicxarch.cpp
@@ -1364,12 +1364,31 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,

case NI_Vector128_ConvertToDouble:
case NI_Vector256_ConvertToDouble:
+case NI_Vector512_ConvertToDouble:
+{
+assert(sig->numArgs == 1);
+assert(varTypeIsLong(simdBaseType));
+if (IsBaselineVector512IsaSupportedOpportunistically())
+{
+intrinsic = (simdSize == 16) ? NI_AVX512DQ_VL_ConvertToVector128Double
+: (simdSize == 32) ? NI_AVX512DQ_VL_ConvertToVector256Double
+: NI_AVX512DQ_ConvertToVector512Double;
+
+op1 = impSIMDPopStack();
+retNode = gtNewSimdHWIntrinsicNode(retType, op1, intrinsic, simdBaseJitType, simdSize);
+}
+break;
+}
+
case NI_Vector128_ConvertToInt64:
case NI_Vector256_ConvertToInt64:
+case NI_Vector512_ConvertToInt64:
case NI_Vector128_ConvertToUInt32:
case NI_Vector256_ConvertToUInt32:
+case NI_Vector512_ConvertToUInt32:
case NI_Vector128_ConvertToUInt64:
case NI_Vector256_ConvertToUInt64:
+case NI_Vector512_ConvertToUInt64:
{
assert(sig->numArgs == 1);
// TODO-XARCH-CQ: These intrinsics should be accelerated
@@ -1431,7 +1450,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
}
else
{
-// TODO-XARCH-CQ: These intrinsics should be accelerated
+// TODO-XARCH-CQ: These intrinsics should be accelerated.
assert(simdBaseType == TYP_UINT);
}
break;
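As a reading aid for the `ConvertToDouble` case above, here is a scalar reference model of what a lane-wise 64-bit-integer-to-double conversion computes. It is a sketch under assumptions rather than the JIT's implementation: `convertLanesToDouble` and `laneCount` are hypothetical names, and the model only mirrors the import-time assert that the base type is a 64-bit integer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: number of 64-bit lanes in a vector of the given byte size.
// The sizes 16, 32 and 64 correspond to the simdSize values tested in the hunk above.
std::size_t laneCount(std::size_t simdSizeInBytes)
{
    assert(simdSizeInBytes == 16 || simdSizeInBytes == 32 || simdSizeInBytes == 64);
    return simdSizeInBytes / sizeof(std::int64_t);
}

// Scalar reference model: convert every 64-bit integer lane to double.
// TInt may be std::int64_t or std::uint64_t, matching a "long or ulong" base type.
template <typename TInt, std::size_t N>
std::array<double, N> convertLanesToDouble(const std::array<TInt, N>& lanes)
{
    static_assert(sizeof(TInt) == 8, "this model only covers 64-bit integer lanes");

    std::array<double, N> result{};
    for (std::size_t i = 0; i < N; ++i)
    {
        // Each lane is converted independently of the others.
        result[i] = static_cast<double>(lanes[i]);
    }
    return result;
}
```

With a 64-byte vector, `laneCount(64)` yields 8 lanes, so a single vector instruction would stand in for eight iterations of the loop above.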