Skip to content

JIT: Accelerate long -> floating casts on x86 #113930

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

saucecontrol
Copy link
Member

@saucecontrol saucecontrol commented Mar 26, 2025

This adds support for using EVEX SIMD conversion instructions to handle long/ulong to float/double casts on x86 rather than going through helper calls. With this, all integral -> floating casts are accelerated on x86 with AVX-512.

Typical diff:

 G_M57389_IG01:        ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, nogc <-- Prolog IG
-       sub      esp, 8
-       vzeroupper 
-						;; size=6 bbWeight=1 PerfScore 1.25
+						;; size=0 bbWeight=1 PerfScore 0.00
 G_M57389_IG02:        ; bbWeight=1, gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref
-       push     dword ptr [esp+0x10+0x04]
-       ; npt arg push 0
-       push     dword ptr [esp+0x0C+0x08]
-       ; npt arg push 1
-       call     CORINFO_HELP_LNG2DBL
-       ; gcr arg pop 2
-       fstp     qword ptr [esp]
-       vmovsd   xmm0, qword ptr [esp]
-       vcvtsd2ss xmm0, xmm0, xmm0
+       vmovq    xmm0, qword ptr [esp+0x04]
+       vcvtqq2ps xmm0, xmm0
        vmovd    eax, xmm0
-						;; size=29 bbWeight=1 PerfScore 12.50
+						;; size=16 bbWeight=1 PerfScore 9.00
 G_M57389_IG03:        ; bbWeight=1, epilog, nogc, extend
-       add      esp, 8
        ret      8
-						;; size=6 bbWeight=1 PerfScore 2.25
+						;; size=3 bbWeight=1 PerfScore 2.00

Full diffs

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Mar 26, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Mar 26, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@saucecontrol saucecontrol changed the title JIT: Accelerated long -> floating casts on x86 JIT: Accelerate long -> floating casts on x86 Mar 26, 2025
@saucecontrol
Copy link
Member Author

saucecontrol commented Mar 28, 2025

This is ready for review. To summarize, it simply replaces helper calls with AVX-512 conversion instructions, as follows:

  • long->double: vcvtqq2pd
  • long->float: vcvtqq2ps
  • ulong->double: vcvtuqq2pd
  • ulong->float: vcvtuqq2pd+vcvtsd2ss (double conversion preserves existing behavior)

Latest diffs show no regressions other than those expected from inlining.

@@ -137,7 +137,12 @@ GenTree* DecomposeLongs::DecomposeNode(GenTree* tree)
}
}

#if defined(FEATURE_HW_INTRINSICS) && defined(TARGET_X86)
if (!tree->TypeIs(TYP_LONG) &&
!(tree->OperIs(GT_CAST) && varTypeIsLong(tree->AsCast()->CastOp()) && varTypeIsFloating(tree)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: negated conditions like this can be hard to read. A small comment covering that we want to handle nodes that produce long or GT_CAST float->long would be beneficial IMO.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. I actually plan on extending this to handle casts in the opposite direction as well, and that will make this check even more hairy. I'll do something to simplify it then.

if (!tree->TypeIs(TYP_LONG))
#endif // FEATURE_HW_INTRINSICS && TARGET_X86
{
return tree->gtNext;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fact that from this point onwards it can now also be GT_CAST float rather than only some NODE long seems like a tricky thing that might trip people up in the future.

Comment on lines +600 to +610
if (m_compiler->compOpportunisticallyDependsOn(InstructionSet_AVX512DQ_VL))
{
intrinsicId = (dstType == TYP_FLOAT) ? NI_AVX512DQ_VL_ConvertToVector128Single
: NI_AVX512DQ_VL_ConvertToVector128Double;
}
else
{
assert(m_compiler->compIsaSupportedDebugOnly(InstructionSet_AVX10v1));
intrinsicId =
(dstType == TYP_FLOAT) ? NI_AVX10v1_ConvertToVector128Single : NI_AVX10v1_ConvertToVector128Double;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These checks feel like they should be inverted since AVX10v1 is newer, so we should opportunistically check for AVX10v1 and assume otherwise AVX512DQ.VL is supported. This is particularly relevant since the spec for AVX10 is changing to require V512 support.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, with the change to require V512, I think we don't even need the AVX10v1 path anymore, technically. As AVX512DQ.VL will always be available if AVX10v1 is available

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is copying the pattern we already use in a lot of other places, but yeah, we can simplify all of them now, I think.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM. Just a couple minor bits of feedback

CC. @dotnet/jit-contrib, @EgorBo for secondary review

@tannergooding
Copy link
Member

Ping @dotnet/jit-contrib, @EgorBo for secondary review

@BruceForstall
Copy link
Member

/azp run runtime-coreclr outerloop, Fuzzlyn

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants