Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Enable SIMD for RyuJIT/x86 #8644

Merged
merged 1 commit into from
Feb 6, 2017

Conversation

BruceForstall
Copy link

@BruceForstall BruceForstall commented Dec 15, 2016

This change implements support for Vector<long>, handling
SIMDIntrinsicInit, which takes a LONG, and decomposition of
SIMDIntrinsicGetItem, which produces a LONG.

It also enables SIMD, including AVX, by default for RyuJIT/x86.

@BruceForstall
Copy link
Author

@sivarv @CarolEidt @pgavlin PTAL
cc @dotnet/jit-contrib

src/jit/lsra.cpp Outdated
@@ -8616,7 +8616,15 @@ void LinearScan::resolveRegisters()
printf("-----------------------\n");

printf("Resolution Candidates: ");
dumpConvertedVarSet(compiler, resolutionCandidateVars);
if (compiler->lvaCount == 0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still needed after #8840?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume not. I'll have to pull and rebase.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be needed.

@BruceForstall BruceForstall force-pushed the SimdLongDecomp branch 2 times, most recently from c15918e to 6216fc4 Compare January 18, 2017 21:42
@BruceForstall BruceForstall force-pushed the SimdLongDecomp branch 2 times, most recently from e9a7e57 to 817ec7f Compare February 1, 2017 00:15
@BruceForstall
Copy link
Author

This is ready to be reviewed again. I've run 44 stress flavors of the SIMD tests locally and those are clean.

@sivarv @CarolEidt PTAL
cc @dotnet/jit-contrib

else if ((op1->OperGet() == GT_OBJ) && (op1->gtOp.gtOp1->OperGet() == GT_ADDR) &&
op1->gtOp.gtOp1->gtOp.gtOp1->OperIsLocal())
{
setLclRelatedToSIMDIntrinsic(op1->gtOp.gtOp1->gtOp.gtOp1);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this logic is repeated 3 times. How about abstracting it in a routine SetLclRelatedToSMDIntrinsic()?

The caller would have to call

SetLclRelatedToSMDIntrinsic(op1);
if (op2 != nullptr)
{
SetLclRelatedToSMDIntrinsic(op2);
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -16885,21 +16884,28 @@ GenTreeSIMD* Compiler::gtNewSIMDNode(
GenTreeSIMD* Compiler::gtNewSIMDNode(
var_types type, GenTreePtr op1, GenTreePtr op2, SIMDIntrinsicID simdIntrinsicID, var_types baseType, unsigned size)
{
// TODO-CQ: An operand may be a GT_OBJ(GT_ADDR(GT_LCL_VAR))), in which case it should be
// marked lvUsedInSIMDIntrinsic.
assert(op1 != nullptr);
if (op1->OperIsLocal())
{
Copy link
Member

@sivarv sivarv Feb 1, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 16871 we are checking whether op1->OperGet() == GT_LCL_VAR whereas here using OperIsLocal(). I think on line 16871 too we should use op1->OperIsLocal()?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't really want to touch that if there was a reason, but with the refactoring you suggested above, I did it.

lvaTable[shadowVar].lvRegStruct = varDsc->lvRegStruct;
lvaTable[shadowVar].lvExactSize = varDsc->lvExactSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lvaTable[shadowVar].lvExactSize = varDsc->lvExactSize; [](start = 7, length = 55)

This is already under above under #ifdef FEATURE_SIMD.

In fact initializing lvExactSize for shadow vars other than those known to be SIMD has caused a gc-hole.

Why do we need this change, btw?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that Pat added the line you refer to long after I added mine (when I started working on this), so it's probably not needed now.

info->internalFloatCount = 1;
info->setInternalCandidates(lsra, lsra->allSIMDRegs());
info->isInternalRegDelayFree = true;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

else { info->srcCount = 0 } ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

else if (tree->OperGet() == GT_SIMD)
{
GenTreeSIMD* simdNode = tree->AsSIMD();
var_types baseType = simdNode->gtSIMDBaseType;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

var_types baseType = simdNode->gtSIMDBaseType; [](start = 8, length = 49)

Minor: this variable definition could be moved closer to the point of its use in the block starting at line 4518

{
assert(baseSize == 2);
ZeroOrSignExtnReqd = (baseType == TYP_SHORT);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be a good idea to add comments to explain the rationale behind this logic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

case SIMDIntrinsicInit:
case SIMDIntrinsicInitN:
case SIMDIntrinsicInitArray:
NYI("SIMDIntrinsicInit*");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NYI("SIMDIntrinsicInit*"); [](start = 12, length = 26)

You might want to file a tracking git issue to address this NYI later

// return: GT_LONG(lowResult, highResult)
//
// This isn't optimal codegen, since SIMDIntrinsicGetItem sometimes requires
// temps that could be shared, for example.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is incomplete?

LIR::Use op2(Range(), &simdTree->gtOp.gtOp2, simdTree);
unsigned indexTmpVarNum = op2.ReplaceWithLclVar(m_compiler, m_blockWeight);
JITDUMP("[DecomposeSimdGetItem]: Saving op2 tree to a temp var:\n");
DISPTREERANGE(Range(), op2.Def());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the index is known to be a constant, we can avoid introducing temps.

If we intend to address it in a separate PR, please file a tracking issue and a TODO comment here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@sivarv
Copy link
Member

sivarv commented Feb 1, 2017

:shipit:

Copy link

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with some (minor, optional) suggestions.

@@ -562,13 +562,13 @@ CONFIG_DWORD_INFO_EX(INTERNAL_JitLoopHoistStats, W("JitLoopHoistStats"), 0, "Dis
CONFIG_DWORD_INFO_EX(INTERNAL_JitDebugLogLoopCloning, W("JitDebugLogLoopCloning"), 0, "In debug builds log places where loop cloning optimizations are performed on the fast path.", CLRConfig::REGUTIL_default);
CONFIG_DWORD_INFO_EX(INTERNAL_JitVNMapSelLimit, W("JitVNMapSelLimit"), 0, "If non-zero, assert if # of VNF_MapSelect applications considered reaches this", CLRConfig::REGUTIL_default)
RETAIL_CONFIG_DWORD_INFO(INTERNAL_JitVNMapSelBudget, W("JitVNMapSelBudget"), 100, "Max # of MapSelect's considered for a particular top-level invocation.")
#if defined(_TARGET_AMD64_)
#if defined(_TARGET_AMD64_) || defined(_TARGET_X86_)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why wouldn't we use TARGET_XARCH here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JIT defines _TARGET_XARCH_ but the VM does not.

case SIMDIntrinsicInit:
case SIMDIntrinsicInitN:
case SIMDIntrinsicInitArray:
NYI("SIMDIntrinsicInit*");

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this intended to be a temporary NYI? Perhaps you could open an issue for it (if you haven't already).

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it appears that Init is handled without decomposition. Shouldn't this just default to the noway_assert?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the NYI. At first I thought these would need decomp, but they don't return LONG values, so don't need it.

//
// This isn't optimal codegen, since SIMDIntrinsicGetItem sometimes requires
// temps that could be shared, for example.
//

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another possible issue to open?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, I could open a general, "evaluate x86 simd CQ" item, and/or one specific to long get[i].


GenTree* simdTmpVar1 = m_compiler->gtNewLclLNode(simdTmpVarNum, simdTree->gtOp.gtOp1->gtType);
GenTree* indexTmpVar1 = m_compiler->gtNewLclLNode(indexTmpVarNum, TYP_INT);
GenTree* two1 = m_compiler->gtNewIconNode(2, TYP_INT);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor suggestion: A comment here would be nice. It took me a moment to recall why we would need a constant 2.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -26,7 +26,7 @@ public static bool CheckValue<T>(T value, T expectedValue)
}
if (returnVal == false)
{
Console.WriteLine("CheckValue failed for " + expectedValue + " of type " + typeof(T).ToString());
Console.WriteLine("CheckValue failed for type " + typeof(T).ToString() + ". Expected: {0} (0x{0:X}), Got: {1} (0x{1:X})", expectedValue, value);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice. thanks.

This change implements support for Vector<long>, handling
SIMDIntrinsicInit, which takes a LONG, and decomposition of
SIMDIntrinsicGetItem, which produces a LONG.

It also enables SIMD, including AVX, by default for RyuJIT/x86.
@BruceForstall
Copy link
Author

@dotnet-bot test Windows_NT x86 Checked jitstress1
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs0x10
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs0x80
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs1
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs2
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs3
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs4
@dotnet-bot test Windows_NT x86 Checked jitstress2_jitstressregs8
@dotnet-bot test Windows_NT x86 Checked jitstress2
@dotnet-bot test Windows_NT x86 Checked jitstressregs0x10
@dotnet-bot test Windows_NT x86 Checked jitstressregs0x80
@dotnet-bot test Windows_NT x86 Checked jitstressregs1
@dotnet-bot test Windows_NT x86 Checked jitstressregs2
@dotnet-bot test Windows_NT x86 Checked jitstressregs3
@dotnet-bot test Windows_NT x86 Checked jitstressregs4
@dotnet-bot test Windows_NT x86 Checked jitstressregs8
@dotnet-bot test Windows_NT x86 Checked minopts
@dotnet-bot test Windows_NT x86 Checked zapdisable

@BruceForstall BruceForstall merged commit d05009a into dotnet:master Feb 6, 2017
@BruceForstall BruceForstall deleted the SimdLongDecomp branch February 6, 2017 19:26
@karelz karelz modified the milestone: 2.0.0 Aug 28, 2017
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants