JIT: Move remaining xarch floating->integral cast implementation to lowering #117571
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
This is more peeled from #116805, with feedback addressed
tannergooding
CC. @dotnet/jit-contrib, @EgorBo for secondary sign-off
This has a couple correctness fixes in addition to the general codegen improvements, so if we're not comfortable taking the whole thing for .NET 10, we likely still need to peel off the fixes that were called out
tannergooding
CC. @dotnet/jit-contrib for secondary review
@jakobbotsch, PTAL.
Pull Request Overview
This PR refactors the JIT's handling of floating-point to integral casts on xarch by moving all conversion logic to the lowering phase. Previously, some casts were handled in codegen via genFloatToIntCast, but now all floating->integral casts are replaced with HWIntrinsic nodes during lowering. This enables better optimization and eliminates the need for genFloatToIntCast on xarch.
Key changes:
- Floating->integral casts are now always replaced with HWIntrinsic nodes in LowerCast
- New AVX10v2 saturating conversion intrinsics are introduced for direct hardware support
- Fallback paths for pre-AVX10v2 hardware use complex SIMD manipulation for saturation semantics
- The genFloatToIntCast function is removed from xarch codegen
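For readers unfamiliar with what "saturation semantics" means for these casts, here is a minimal scalar sketch. It is not the JIT's SIMD fallback; `SaturatingDoubleToInt32` is a hypothetical name, and the rules it models (NaN converts to 0, out-of-range inputs clamp to the nearest representable integer) are stated as an assumption about the intended behavior rather than taken from this PR's code.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scalar reference for a saturating double -> int32 cast.
// Assumed semantics: NaN -> 0, out-of-range values clamp to INT32_MIN/INT32_MAX,
// in-range values truncate toward zero.
int32_t SaturatingDoubleToInt32(double value)
{
    if (std::isnan(value))
    {
        return 0;
    }
    if (value >= 2147483648.0) // 2^31 and above (including +infinity) clamps to the maximum
    {
        return INT32_MAX;
    }
    if (value <= -2147483648.0) // -2^31 and below (including -infinity) clamps to the minimum
    {
        return INT32_MIN;
    }
    return static_cast<int32_t>(value); // in range, so the cast is well defined
}
```

Hardware with a native saturating conversion can do this in one instruction; the pre-AVX10v2 paths have to build the same result from comparisons and blends.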
Reviewed Changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/coreclr/jit/lowerxarch.cpp | Implements complete lowering of floating->integral casts with AVX10v2, AVX512, and fallback paths; replaces cast node with HWIntrinsics |
| src/coreclr/jit/lower.cpp | Returns early from LowerNode after cast lowering since the node is now removed |
| src/coreclr/jit/instr.cpp | Removes floating->integral conversion instruction selection since these are now lowered to HWIntrinsics |
| src/coreclr/jit/hwintrinsiclistxarch.h | Adds new AVX10v2 scalar saturating conversion intrinsics and fixes naming consistency |
| src/coreclr/jit/hwintrinsiccodegenxarch.cpp | Adds codegen support for new AVX10v2 saturating conversion intrinsics |
| src/coreclr/jit/hwintrinsic.cpp | Enables AVX10v2_X64 ISA range for 64-bit intrinsics |
| src/coreclr/jit/gentree.cpp | Fixes intrinsic naming from "Truncation" to "Truncated" for consistency |
| src/coreclr/jit/codegenxarch.cpp | Removes obsolete genFloatToIntCast function that is no longer needed |
| src/coreclr/jit/codegenlinear.cpp | Adds xarch guard to make floating->integral casts unreachable on xarch |
| src/coreclr/jit/CMakeLists.txt | Enables SIMD features for i386 Unix builds to support the new lowering approach |
GenTree* resultClone = comp->gtClone(result);
castRange.InsertAtEnd(resultClone);

// If the conversion of the fixed-up value overflowed, the result wil be
Copilot
AI
Nov 17, 2025
Typo in comment: "wil be" should be "will be".
| // If the conversion of the fixed-up value overflowed, the result wil be | |
| // If the conversion of the fixed-up value overflowed, the result will be |
I'll fix this in a follow up unless I need to churn CI on this PR for another reason.
DISPTREERANGE(BlockRange(), tree);
LABELEDDISPTREERANGE("LowerCast after", BlockRange(), castResult);

LIR::Use castUse;
Copilot
AI
Nov 17, 2025
The cast node is being removed from the block range and replaced with castResult, but there's no check to ensure castResult is not null before attempting the replacement. If any of the code paths in the if/else blocks above fail to set castResult, this will result in undefined behavior. Consider adding an assertion assert(castResult != nullptr) before line 1236 to catch potential logic errors.
| LIR::Use castUse; | |
| LIR::Use castUse; | |
| assert(castResult != nullptr); |
maxIntegralValue = comp->gtNewLconNode(static_cast<int64_t>(UINT64_MAX));
if (srcType == TYP_FLOAT)
{
    maxFloatSimdVal->f32[0] = 18446744073709551616.0f;
Nit: A comment explicitly stating what these constants are supposed to be (rounded value of UINT64_MAX? UINT64_MAX + 1?) would be good, so anyone looking at this knows what to do when changing it and can recognize any errors.
Yeah, it's just 2^64 in float. It actually has two uses:
- A float input greater than or equal to that value must be saturated to ulong.MaxValue.
- Subtracting that value from a positive input less than ulong.MaxValue wraps it around to a negative value that will convert to signed long with the same bit representation as the correct unsigned result.
There shouldn't be any reason to change any of these constants, but I can add a comment with the intended value if that would help. The actual meaning is explained near the uses.
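The two uses of the 2^64 constant can be sketched in scalar form. This is an illustration only, not the lowering's SIMD sequence: `FloatToUInt64Saturating` is a hypothetical helper, and the explicit branch for inputs below 2^63 is an assumption about how the small-value case is kept separate from the wrap-around case.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical scalar model of a saturating float -> uint64 conversion built
// only from a signed float -> int64 conversion.
uint64_t FloatToUInt64Saturating(float value)
{
    const float twoPow63 = 9223372036854775808.0f;  // 2^63
    const float twoPow64 = 18446744073709551616.0f; // 2^64

    if (std::isnan(value) || value <= 0.0f)
    {
        return 0; // assumed saturation rule: NaN and negatives become 0
    }
    if (value >= twoPow64)
    {
        return UINT64_MAX; // use 1: anything at or above 2^64 saturates
    }
    if (value < twoPow63)
    {
        // Fits in a signed long directly.
        return static_cast<uint64_t>(static_cast<int64_t>(value));
    }

    // Use 2: value is in [2^63, 2^64). Every float in that range is an integer,
    // so value - 2^64 is exact and lies in [-2^63, 0).
    int64_t wrapped = static_cast<int64_t>(value - twoPow64);
    assert(wrapped < 0);

    // The signed result has the same bit pattern as the unsigned answer.
    return static_cast<uint64_t>(wrapped);
}
```

For example, an input of exactly 2^63 becomes -2^63 after the subtraction, which converts to `INT64_MIN`, whose bits are `0x8000000000000000`, the correct unsigned result.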
I was just a little surprised that this one for example was 18...616. Isn't UINT64_MAX 18...615? It doesn't need to change if it works though.
You can’t represent any odd integer over 2^53 as a double; it rounds to the nearest even value.
You can only represent multiples of 4 after 2^54, multiples of 8 after 2^55, and so on.
So ulong.MaxValue can’t be represented either; it rounds up to 2^64.
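The rounding described here is easy to check directly. The sketch below uses hypothetical helper names and assumes the usual round-to-nearest-even conversion (the C++ standard leaves the choice between the two neighboring values implementation-defined, but mainstream xarch toolchains round to nearest).

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helpers: convert an unsigned 64-bit integer to the nearest
// floating-point value of each width.
float NearestFloat(uint64_t n)
{
    return static_cast<float>(n);
}

double NearestDouble(uint64_t n)
{
    return static_cast<double>(n);
}

// True when n is not representable and rounds up to exactly 2^64 in both
// float and double. std::ldexp(1.0, 64) builds 2^64 without a long literal.
bool RoundsUpToTwoPow64(uint64_t n)
{
    return NearestDouble(n) == std::ldexp(1.0, 64) && NearestFloat(n) == std::ldexp(1.0f, 64);
}
```

Under that assumption, `RoundsUpToTwoPow64(UINT64_MAX)` holds, and `NearestDouble((1ull << 53) + 1)` gives 2^53 because the odd value sits halfway between two representable neighbors and the tie goes to the even significand.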
Yep, although the floating point value needs to be the actual power of 2 even when rounding isn't required, as in double->uint32.
e.g. uint32 value 2,500,000,000 has the same binary representation (0x9502F900) as signed int32 -1,794,967,296, which is precisely 2,500,000,000 - 2^32.
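The double->uint32 case can be checked with a small sketch. The helper names are hypothetical, and the explicit `std::trunc` before the subtraction is an assumption added here so that fractional inputs still truncate toward zero after the value goes negative; it is not a claim about how the lowering orders its operations.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Reinterpret the bits of an unsigned 32-bit value as a signed one.
int32_t ReinterpretAsInt32(uint32_t bits)
{
    int32_t result;
    std::memcpy(&result, &bits, sizeof(result));
    return result;
}

// Hypothetical scalar model: convert a double in [2^31, 2^32) to uint32 using
// only a signed conversion, by subtracting exactly 2^32 first.
uint32_t DoubleToUInt32ViaSigned(double value)
{
    assert(value >= 2147483648.0 && value < 4294967296.0);

    // Drop the fraction first, then shift into [-2^31, -1]. Both steps are exact.
    double shifted = std::trunc(value) - 4294967296.0;
    int32_t asSigned = static_cast<int32_t>(shifted);

    // Same bits, read back as unsigned.
    uint32_t result;
    std::memcpy(&result, &asSigned, sizeof(result));
    return result;
}
```

With these, `ReinterpretAsInt32(2500000000u)` is -1,794,967,296 and `DoubleToUInt32ViaSigned(2500000000.0)` returns 2,500,000,000, matching the numbers in the comment above.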
This changes the lowering of floating->integral casts to always replace the cast node with HWIntrinsics rather than doing fixups ahead of the cast and leaving the node in place as a self-cast or letting it be handled in codegen. Since the self-cast was not always eliminated in codegen, this results in some size and throughput improvements.
Because the cast is always replaced now, genFloatToIntCast is no longer necessary on xarch.
This is best viewed with whitespace ignored. Most of the changes are simply an extra level of indentation for the pre-AVX10.2 code in LowerCast.
Diffs