@saucecontrol saucecontrol commented Jul 12, 2025

This changes the lowering of floating->integral casts to always replace the cast node with HWIntrinsics rather than doing fixups ahead of the cast and leaving the node in place as a self-cast or letting it be handled in codegen. Since the self-cast was not always eliminated in codegen, this results in some size and throughput improvements.

Because the cast is always replaced now, genFloatToIntCast is no longer necessary on xarch.

This is best viewed with whitespace ignored. Most of the changes are simply an extra level of indentation for the pre-AVX10.2 code in LowerCast.

Diffs

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 12, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 12, 2025
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@saucecontrol saucecontrol marked this pull request as ready for review July 13, 2025 00:47
@saucecontrol
Member Author

cc @tannergooding

This is more peeled from #116805, with feedback addressed

Member

@tannergooding tannergooding left a comment

CC. @dotnet/jit-contrib, @EgorBo for secondary sign-off

This has a couple correctness fixes in addition to the general codegen improvements, so if we're not comfortable taking the whole thing for .NET 10, we likely still need to peel off the fixes that were called out

@EgorBo EgorBo self-requested a review September 15, 2025 22:16
Member

@tannergooding tannergooding left a comment

CC. @dotnet/jit-contrib for secondary review

@JulieLeeMSFT
Member

@jakobbotsch, PTAL.

Copilot AI review requested due to automatic review settings November 17, 2025 21:20
Copilot finished reviewing on behalf of saucecontrol November 17, 2025 21:23
Contributor

Copilot AI left a comment

Pull Request Overview

This PR refactors the JIT's handling of floating-point to integral casts on xarch by moving all conversion logic to the lowering phase. Previously, some casts were handled in codegen via genFloatToIntCast, but now all floating->integral casts are replaced with HWIntrinsic nodes during lowering. This enables better optimization and eliminates the need for genFloatToIntCast on xarch.

Key changes:

  • Floating->integral casts are now always replaced with HWIntrinsic nodes in LowerCast
  • New AVX10v2 saturating conversion intrinsics are introduced for direct hardware support
  • Fallback paths for pre-AVX10v2 hardware use complex SIMD manipulation for saturation semantics
  • The genFloatToIntCast function is removed from xarch codegen
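As a rough scalar model of what "saturation semantics" means for these casts, the sketch below clamps out-of-range inputs to the integer limits. The helper names are hypothetical and are not JIT functions, and mapping NaN to zero is an assumption about the intended semantics rather than something stated in this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical reference helper (not part of the JIT): scalar model of a
// saturating double -> int32 conversion.
int32_t SaturatingToInt32(double value)
{
    if (std::isnan(value))
    {
        return 0; // assumption: NaN converts to zero
    }
    if (value <= static_cast<double>(std::numeric_limits<int32_t>::min()))
    {
        return std::numeric_limits<int32_t>::min(); // saturate low
    }
    if (value >= static_cast<double>(std::numeric_limits<int32_t>::max()))
    {
        return std::numeric_limits<int32_t>::max(); // saturate high
    }
    return static_cast<int32_t>(value); // in range: truncates toward zero
}

// Small usage example for the helper above.
void SaturatingToInt32Examples()
{
    assert(SaturatingToInt32(3e9) == std::numeric_limits<int32_t>::max());
    assert(SaturatingToInt32(-3e9) == std::numeric_limits<int32_t>::min());
    assert(SaturatingToInt32(NAN) == 0);
    assert(SaturatingToInt32(-1.9) == -1);
}
```

A plain C++ `static_cast` on an out-of-range value is undefined behavior, which is why the range checks come before the cast.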

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/coreclr/jit/lowerxarch.cpp | Implements complete lowering of floating->integral casts with AVX10v2, AVX512, and fallback paths; replaces cast node with HWIntrinsics |
| src/coreclr/jit/lower.cpp | Returns early from LowerNode after cast lowering since the node is now removed |
| src/coreclr/jit/instr.cpp | Removes floating->integral conversion instruction selection since these are now lowered to HWIntrinsics |
| src/coreclr/jit/hwintrinsiclistxarch.h | Adds new AVX10v2 scalar saturating conversion intrinsics and fixes naming consistency |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Adds codegen support for new AVX10v2 saturating conversion intrinsics |
| src/coreclr/jit/hwintrinsic.cpp | Enables AVX10v2_X64 ISA range for 64-bit intrinsics |
| src/coreclr/jit/gentree.cpp | Fixes intrinsic naming from "Truncation" to "Truncated" for consistency |
| src/coreclr/jit/codegenxarch.cpp | Removes obsolete genFloatToIntCast function that is no longer needed |
| src/coreclr/jit/codegenlinear.cpp | Adds xarch guard to make floating->integral casts unreachable on xarch |
| src/coreclr/jit/CMakeLists.txt | Enables SIMD features for i386 Unix builds to support the new lowering approach |

GenTree* resultClone = comp->gtClone(result);
castRange.InsertAtEnd(resultClone);

// If the conversion of the fixed-up value overflowed, the result wil be
Copilot AI Nov 17, 2025

Typo in comment: "wil be" should be "will be".

Suggested change
- // If the conversion of the fixed-up value overflowed, the result wil be
+ // If the conversion of the fixed-up value overflowed, the result will be

Member Author

I'll fix this in a follow up unless I need to churn CI on this PR for another reason.

DISPTREERANGE(BlockRange(), tree);
LABELEDDISPTREERANGE("LowerCast after", BlockRange(), castResult);

LIR::Use castUse;
Copilot AI Nov 17, 2025

The cast node is being removed from the block range and replaced with castResult, but there's no check to ensure castResult is not null before attempting the replacement. If any of the code paths in the if/else blocks above fail to set castResult, this will result in undefined behavior. Consider adding an assertion assert(castResult != nullptr) before line 1236 to catch potential logic errors.

Suggested change
- LIR::Use castUse;
+ LIR::Use castUse;
+ assert(castResult != nullptr);

maxIntegralValue = comp->gtNewLconNode(static_cast<int64_t>(UINT64_MAX));
if (srcType == TYP_FLOAT)
{
maxFloatSimdVal->f32[0] = 18446744073709551616.0f;
Member

Nit: A comment explicitly stating what these constants are supposed to be (the rounded value of UINT64_MAX? UINT64_MAX + 1?) would be good, so that anyone looking at this knows what to do when changing it and can recognize any errors.

Member Author

Yeah, it's just 2^64 in float. It actually has two uses:

  1. A float input greater than or equal to that value must be saturated to ulong.MaxValue.
  2. Subtracting that value from a positive input less than ulong.MaxValue wraps it around to a negative value that will convert to signed long with the same bit representation as the correct unsigned result.

There shouldn't be any reason to change any of these constants, but I can add a comment with the intended value if that would help. The actual meaning is explained near the uses.
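The two uses of the 2^64 constant can be illustrated with a scalar sketch. This is not the code from LowerCast, which builds HWIntrinsic nodes. The helper name is hypothetical, and mapping NaN and negative inputs to zero is an assumption about the saturating behavior.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only: float -> uint64 conversion
// built on top of a signed float -> int64 conversion.
uint64_t FloatToUInt64ViaSigned(float value)
{
    const float twoPow64 = 18446744073709551616.0f; // 2^64, exact in float
    const float twoPow63 = 9223372036854775808.0f;  // 2^63, exact in float

    if (!(value > 0.0f))
    {
        return 0; // assumption: NaN, zero, and negative inputs give 0
    }
    if (value >= twoPow64)
    {
        return UINT64_MAX; // use 1: saturate to ulong.MaxValue
    }
    if (value < twoPow63)
    {
        // Fits in a signed int64, so the signed conversion is already right.
        return static_cast<uint64_t>(static_cast<int64_t>(value));
    }

    // Use 2: for inputs in [2^63, 2^64), subtracting 2^64 is exact and gives
    // a value in [-2^63, 0). Its int64 conversion has the same bits as the
    // desired unsigned result.
    const int64_t wrapped = static_cast<int64_t>(value - twoPow64);
    assert(wrapped < 0);
    return static_cast<uint64_t>(wrapped);
}

// Small usage example for the helper above.
void FloatToUInt64Examples()
{
    assert(FloatToUInt64ViaSigned(18446744073709551616.0f) == UINT64_MAX);
    assert(FloatToUInt64ViaSigned(9223372036854775808.0f) == 0x8000000000000000ull);
    assert(FloatToUInt64ViaSigned(13835058055282163712.0f) == 13835058055282163712ull);
    assert(FloatToUInt64ViaSigned(42.9f) == 42);
}
```

The subtraction loses no precision because every float in [2^63, 2^64) is a multiple of 2^40, and so is the difference.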

Member

I was just a little surprised that this one for example was 18...616. Isn't UINT64_MAX 18...615? It doesn't need to change if it works though.

Member

You can't represent any odd integer above 2^53 as a double; it rounds to the nearest even value.

You can only represent multiples of 4 above 2^54, multiples of 8 above 2^55, and so on.

So ulong.MaxValue can't be represented either; it rounds up to 2^64.
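The rounding behavior described here can be checked with a short sketch. It assumes IEEE 754 binary64 `double` and the default round-to-nearest mode. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when v survives a round trip through double unchanged.
bool RoundTripsThroughDouble(uint64_t v)
{
    const double twoPow64 = 18446744073709551616.0;
    const double d = static_cast<double>(v);
    if (d >= twoPow64)
    {
        return false; // rounded up to 2^64, which no longer fits in uint64
    }
    return static_cast<uint64_t>(d) == v;
}

// Small usage example for the helper above.
void RepresentabilityExamples()
{
    const uint64_t twoPow53 = uint64_t{1} << 53;
    assert(RoundTripsThroughDouble(twoPow53));      // 2^53 is exact
    assert(!RoundTripsThroughDouble(twoPow53 + 1)); // odd value above 2^53
    assert(RoundTripsThroughDouble(twoPow53 + 2));  // even value is exact
    assert(!RoundTripsThroughDouble(UINT64_MAX));   // rounds up to 2^64
    assert(static_cast<double>(UINT64_MAX) == 18446744073709551616.0);
}
```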

Member Author

Yep, although the floating point value needs to be the actual power of 2 even when rounding isn't required, as in double->uint32.

e.g. uint32 value 2,500,000,000 has the same binary representation (0x9502F900) as signed int32 -1,794,967,296, which is precisely 2,500,000,000 - 2^32.
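The arithmetic in this example can be verified directly. The sketch below is only a worked check of the numbers quoted above; the helper name is hypothetical, and it assumes the input already lies in [2^31, 2^32).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: double -> uint32 for inputs in [2^31, 2^32), done by
// subtracting 2^32 and converting through signed int32.
uint32_t DoubleToUInt32ViaSigned(double value)
{
    const double twoPow31 = 2147483648.0;
    const double twoPow32 = 4294967296.0;
    assert(value >= twoPow31 && value < twoPow32); // precondition of this sketch

    const int32_t wrapped = static_cast<int32_t>(value - twoPow32);
    return static_cast<uint32_t>(wrapped); // same bits, reinterpreted as unsigned
}

// Small usage example matching the numbers in the comment above.
void DoubleToUInt32Example()
{
    assert(static_cast<int32_t>(2500000000.0 - 4294967296.0) == -1794967296);
    assert(DoubleToUInt32ViaSigned(2500000000.0) == 2500000000u);
    assert(DoubleToUInt32ViaSigned(2500000000.0) == 0x9502F900u);
}
```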
