Skip to content

Conversation

@Daniel-Svensson
Copy link
Contributor

This #115966 but without the mulx support.
See the old PR for benchmarks results and generated code

The reason for opening a separate PR is

  • Rerun all tests without the mulx code to increase confident that it works on non-AVX2 hardware
  • Make it easy for the team pick only this specific part of the PR (in case it makes review, benchmarks follow up or similar easier)

Feel free to close it if you prefer to work with the original PR

Copilot AI review requested due to automatic review settings July 3, 2025 09:36
@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 3, 2025
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 3, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR backports the Math.BigMul hardware intrinsic support on x64 without the mulx instruction extension. It adds new BigMul overloads in the X86Base APIs, hooks them into Math.BigMul, and extends the JIT to lower, schedule, and codegen these multi-register intrinsics.

  • Introduces BigMul methods for 32-, 64-, and pointer-size integers in X86Base and their platform-not-supported stubs.
  • Updates Math.BigMul to prefer the new X86Base.X64.BigMul path on non-MONO x64, falling back as before.
  • Enhances JIT (linear scan, lowering, import, list, codegen, tree layout) to recognize and generate BigMul intrinsics.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file
File Description
src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.cs Added BigMul intrinsics for various operand widths
src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.PlatformNotSupported.cs Added BigMul stubs throwing on unsupported platforms
src/System.Private.CoreLib/src/System/Math.cs Routed Math.BigMul to use the new intrinsics on x64
src/coreclr/jit/lsraxarch.cpp Updated register allocator for BigMul multi-reg defs
src/coreclr/jit/lowerxarch.cpp Enabled containment checks for BigMul
src/coreclr/jit/hwintrinsicxarch.cpp Imported BigMul as a multi-register HW intrinsic
src/coreclr/jit/hwintrinsiclistxarch.h Listed BigMul in the x86 and x64 HW intrinsic tables
src/coreclr/jit/hwintrinsiccodegenxarch.cpp Emitting MUL/IMUL sequence for BigMul
src/coreclr/jit/hwintrinsic.h Updated multi-reg return count for BigMul
src/coreclr/jit/gentree.cpp Defined struct layout for BigMul return
Comments suppressed due to low confidence (3)

src/libraries/System.Private.CoreLib/src/System/Math.cs:205

  • Consider adding unit tests that validate the new Math.BigMul(ulong, ulong, out ulong) path on x64 and ensure correct behavior when X86Base.X64.IsSupported is true/false.
#if !MONO // X64.BigMul is not yet implemented in MONO

@JulieLeeMSFT
Copy link
Member

@EgorBo, PTAL.


if (rmOp->isUsedFromReg() && rmOp->GetRegNum() == REG_EAX)
{
std::swap(rmOp, regOp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please explain this swap to me?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case op2 (rmop) is already present in RAX we use op2 as implicit operand.
Otherwise we would overwrite that value.

I should probably add a comment similar to the one I added for codeGenMulHi

// If op2 is already present in RAX use that as implicit operand
if (rmOp->isUsedFromReg() && (rmOp->GetRegNum() == REG_RAX))
{
std::swap(regOp, rmOp);

Do you want me to add the comment to this PR or #115966 ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with either option, I think the PR is not too big to split it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the PR is not too big to split it?

I've updated both PRs.
The 2 reasons to open this as a separate PR was to

  1. Show that it works for the non-AVX2 path and passes all tests
  2. For the mulx path I am a bit unsure about the best way to handle the allocation of the RDX register the MULX code produced worse code than MUL for at least one method (you can expand comment to se asm code) . Either approach is better than what's in main but I wanted to avoid adding a "mulx" optimization that gives worse code on average

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this PR is the same as #115966 but without the BMI2 path? Does it have any potential performance impact on BMI2 CPUs or diffs? could you please resolve the conflicts so we can run the diffs (or close one of the PRs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this PR is the same as #115966 but without the BMI2 path?

Yes that is correct

Does it have any potential performance impact on BMI2 CPUs or diffs?

It might be expand "generated code" for edx ret comment for example diff and some thoughts

@JulieLeeMSFT
Copy link
Member

Rerunning failed test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants