Improve Math.BigMul performance on x64 #117261

Daniel-Svensson · 2025-07-03T09:36:39Z

This #115966 but without the mulx support.
See the old PR for benchmarks results and generated code

The reason for opening a separate PR is

Rerun all tests without the mulx code to increase confident that it works on non-AVX2 hardware
Make it easy for the team pick only this specific part of the PR (in case it makes review, benchmarks follow up or similar easier)

Feel free to close it if you prefer to work with the original PR

Copilot

Pull Request Overview

This PR backports the Math.BigMul hardware intrinsic support on x64 without the mulx instruction extension. It adds new BigMul overloads in the X86Base APIs, hooks them into Math.BigMul, and extends the JIT to lower, schedule, and codegen these multi-register intrinsics.

Introduces BigMul methods for 32-, 64-, and pointer-size integers in X86Base and their platform-not-supported stubs.
Updates Math.BigMul to prefer the new X86Base.X64.BigMul path on non-MONO x64, falling back as before.
Enhances JIT (linear scan, lowering, import, list, codegen, tree layout) to recognize and generate BigMul intrinsics.

Reviewed Changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated no comments.

Show a summary per file

File	Description
src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.cs	Added `BigMul` intrinsics for various operand widths
src/System.Private.CoreLib/src/System/Runtime/Intrinsics/X86/X86Base.PlatformNotSupported.cs	Added `BigMul` stubs throwing on unsupported platforms
src/System.Private.CoreLib/src/System/Math.cs	Routed `Math.BigMul` to use the new intrinsics on x64
src/coreclr/jit/lsraxarch.cpp	Updated register allocator for `BigMul` multi-reg defs
src/coreclr/jit/lowerxarch.cpp	Enabled containment checks for `BigMul`
src/coreclr/jit/hwintrinsicxarch.cpp	Imported `BigMul` as a multi-register HW intrinsic
src/coreclr/jit/hwintrinsiclistxarch.h	Listed `BigMul` in the x86 and x64 HW intrinsic tables
src/coreclr/jit/hwintrinsiccodegenxarch.cpp	Emitting `MUL/IMUL` sequence for `BigMul`
src/coreclr/jit/hwintrinsic.h	Updated multi-reg return count for `BigMul`
src/coreclr/jit/gentree.cpp	Defined struct layout for `BigMul` return

Comments suppressed due to low confidence (3)

src/libraries/System.Private.CoreLib/src/System/Math.cs:205

Consider adding unit tests that validate the new Math.BigMul(ulong, ulong, out ulong) path on x64 and ensure correct behavior when X86Base.X64.IsSupported is true/false.

#if !MONO // X64.BigMul is not yet implemented in MONO

JulieLeeMSFT · 2025-07-28T13:48:33Z

@EgorBo, PTAL.

EgorBo · 2025-07-31T14:26:07Z

src/coreclr/jit/hwintrinsiccodegenxarch.cpp

+
+            if (rmOp->isUsedFromReg() && rmOp->GetRegNum() == REG_EAX)
+            {
+                std::swap(rmOp, regOp);


Could you please explain this swap to me?

In case op2 (rmop) is already present in RAX we use op2 as implicit operand.
Otherwise we would overwrite that value.

I should probably add a comment similar to the one I added for codeGenMulHi

runtime/src/coreclr/jit/codegenxarch.cpp

Lines 868 to 871 in 51a4123

// If op2 is already present in RAX use that as implicit operand

if (rmOp->isUsedFromReg() && (rmOp->GetRegNum() == REG_RAX))

{

std::swap(regOp, rmOp);

Do you want me to add the comment to this PR or #115966 ?

I'm fine with either option, I think the PR is not too big to split it?

I think the PR is not too big to split it?

I've updated both PRs.
The 2 reasons to open this as a separate PR was to

Show that it works for the non-AVX2 path and passes all tests

For the mulx path I am a bit unsure about the best way to handle the allocation of the RDX register the MULX code produced worse code than MUL for at least one method (you can expand comment to se asm code) . Either approach is better than what's in main but I wanted to avoid adding a "mulx" optimization that gives worse code on average

So this PR is the same as #115966 but without the BMI2 path? Does it have any potential performance impact on BMI2 CPUs or diffs? could you please resolve the conflicts so we can run the diffs (or close one of the PRs)

So this PR is the same as #115966 but without the BMI2 path?

Yes that is correct

Does it have any potential performance impact on BMI2 CPUs or diffs?

It might be expand "generated code" for edx ret comment for example diff and some thoughts

JulieLeeMSFT · 2025-11-03T17:18:06Z

Rerunning failed test.

Add X86Base.BigMul and call it from Math.BigMul

d781b3a

Copilot AI review requested due to automatic review settings July 3, 2025 09:36

github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 3, 2025

dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 3, 2025

Copilot AI reviewed Jul 3, 2025

View reviewed changes

This was referenced Jul 3, 2025

browser-wasm Windows build error #116746

Closed

wasm build failure in CI #117017

Open

Daniel-Svensson mentioned this pull request Jul 3, 2025

Improve Math.BigMul on x64 by adding new internal Multiply hardware intrinsic to X86Base #115966

Open

JulieLeeMSFT requested a review from EgorBo July 28, 2025 13:48

EgorBo reviewed Jul 31, 2025

View reviewed changes

Daniel-Svensson added 2 commits August 4, 2025 20:34

merge upstream/main

90d2934

add comment regarding swap

e110d32

build-analysis bot mentioned this pull request Aug 4, 2025

System.Data.OleDb.Tests timeout in net48 x86 Release leg #87783

Closed

Merge branch 'main' into x86_bigmul

69494cb

build-analysis bot mentioned this pull request Oct 11, 2025

Assert Interop.Sys.LSeek(this, 0, Interop.Sys.SeekWhence.SEEK_CUR) >= 0 in System.Net.XXX tests #120577

Closed

JulieLeeMSFT added this to the 11.0.0 milestone Oct 13, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve Math.BigMul performance on x64 #117261

Improve Math.BigMul performance on x64 #117261

Uh oh!

Daniel-Svensson commented Jul 3, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

JulieLeeMSFT commented Jul 28, 2025

Uh oh!

EgorBo Jul 31, 2025

Uh oh!

Daniel-Svensson Jul 31, 2025

Uh oh!

EgorBo Jul 31, 2025

Uh oh!

Daniel-Svensson Aug 4, 2025

Uh oh!

EgorBo Sep 15, 2025

Uh oh!

Daniel-Svensson Oct 11, 2025

Uh oh!

JulieLeeMSFT commented Nov 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	// If op2 is already present in RAX use that as implicit operand
	if (rmOp->isUsedFromReg() && (rmOp->GetRegNum() == REG_RAX))
	{
	std::swap(regOp, rmOp);

Improve Math.BigMul performance on x64 #117261

Are you sure you want to change the base?

Improve Math.BigMul performance on x64 #117261

Uh oh!

Conversation

Daniel-Svensson commented Jul 3, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

JulieLeeMSFT commented Jul 28, 2025

Uh oh!

EgorBo Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

Daniel-Svensson Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

EgorBo Jul 31, 2025

Choose a reason for hiding this comment

Uh oh!

Daniel-Svensson Aug 4, 2025

Choose a reason for hiding this comment

Uh oh!

EgorBo Sep 15, 2025

Choose a reason for hiding this comment

Uh oh!

Daniel-Svensson Oct 11, 2025

Choose a reason for hiding this comment

Uh oh!

JulieLeeMSFT commented Nov 3, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants