-
Notifications
You must be signed in to change notification settings - Fork 5k
JIT: Accelerate long -> floating casts on x86 #113930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
src/tests/JIT/Regression/JitBlue/Runtime_106338/Runtime_106338.cs
Outdated
Show resolved
Hide resolved
This is ready for review. To summarize, it simply replaces helper calls with AVX-512 conversion instructions, as follows:
Latest diffs show no regressions other than those expected from inlining. |
@@ -137,7 +137,12 @@ GenTree* DecomposeLongs::DecomposeNode(GenTree* tree) | |||
} | |||
} | |||
|
|||
#if defined(FEATURE_HW_INTRINSICS) && defined(TARGET_X86) | |||
if (!tree->TypeIs(TYP_LONG) && | |||
!(tree->OperIs(GT_CAST) && varTypeIsLong(tree->AsCast()->CastOp()) && varTypeIsFloating(tree))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: negated conditions like this can be hard to read. A small comment covering that we want to handle nodes that produce long
or GT_CAST float->long
would be beneficial IMO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. I actually plan on extending this to handle casts in the opposite direction as well, and that will make this check even more hairy. I'll do something to simplify it then.
if (!tree->TypeIs(TYP_LONG)) | ||
#endif // FEATURE_HW_INTRINSICS && TARGET_X86 | ||
{ | ||
return tree->gtNext; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that from this point onwards it can now also be GT_CAST float
rather than only some NODE long
seems like a tricky thing that might trip people up in the future.
if (m_compiler->compOpportunisticallyDependsOn(InstructionSet_AVX512DQ_VL)) | ||
{ | ||
intrinsicId = (dstType == TYP_FLOAT) ? NI_AVX512DQ_VL_ConvertToVector128Single | ||
: NI_AVX512DQ_VL_ConvertToVector128Double; | ||
} | ||
else | ||
{ | ||
assert(m_compiler->compIsaSupportedDebugOnly(InstructionSet_AVX10v1)); | ||
intrinsicId = | ||
(dstType == TYP_FLOAT) ? NI_AVX10v1_ConvertToVector128Single : NI_AVX10v1_ConvertToVector128Double; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These checks feel like they should be inverted since AVX10v1
is newer, so we should opportunistically check for AVX10v1
and assume otherwise AVX512DQ.VL
is supported. This is particularly relevant since the spec for AVX10 is changing to require V512 support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, with the change to require V512, I think we don't even need the AVX10v1 path anymore, technically. As AVX512DQ.VL will always be available if AVX10v1 is available
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC. @anthonycanino
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is copying the pattern we already use in a lot of other places, but yeah, we can simplify all of them now, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes LGTM. Just a couple minor bits of feedback
CC. @dotnet/jit-contrib, @EgorBo for secondary review
Ping @dotnet/jit-contrib, @EgorBo for secondary review |
/azp run runtime-coreclr outerloop, Fuzzlyn |
Azure Pipelines successfully started running 2 pipeline(s). |
This adds support for using EVEX SIMD conversion instructions to handle long/ulong to float/double casts on x86 rather than going through helper calls. With this, all integral -> floating casts are accelerated on x86 with AVX-512.
Typical diff:
Full diffs