Skip to content

Conversation

@lilinus
Copy link
Contributor

@lilinus lilinus commented Apr 11, 2024

For Math.BigMul(ulong a, ulong b, out ulong low): now we can leverage intrinsics
For Math.UInt128 BigMul(UInt128 left, UInt128 right, out UInt128 lower): Don't do 128x128-bit multiplication when we can use 64x64-bit.

Note: maybe there will be performance regressions due to #75594, but in that case it would be better to remove affected intrinsics-usage in the BigMul methods until the underlying issue is resolved.

@ghost ghost added the area-System.Numerics label Apr 11, 2024
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Apr 11, 2024
@jkotas
Copy link
Member

jkotas commented Apr 19, 2024

I assume that this change is expected to be a performance improvement. Is that correct? Could you please share some numbers for how this improves performance of public APIs?

@lilinus
Copy link
Contributor Author

lilinus commented Apr 24, 2024

Sorry for late response, finally got benchmarking working.

Benchmark is for checked multiplication of UInt128 and UInt128 (which uses UInt128.BigMul).

Top one is main branch, not a big improvement it seems...

BenchmarkDotNet v0.13.13-nightly.20240311.145, Windows 10 (10.0.19044.3086/21H2/November2021Update)
AMD Ryzen 9 4900HS with Radeon Graphics, 1 CPU, 16 logical and 8 physical cores
.NET SDK 9.0.100-preview.3.24204.13
  [Host]     : .NET 9.0.0 (9.0.24.17209), X64 RyuJIT AVX2
  Job-FPSCJH : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2
  Job-PMEPEV : .NET 9.0.0 (42.42.42.42424), X64 RyuJIT AVX2

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250ms  MaxIterationCount=20
MinIterationCount=15  WarmupCount=1

| Method           | Job        | Toolchain                                                                                              | val1                 | val2                 | Mean     | Error     | StdDev    | Median   | Min      | Max      | Ratio | Allocated | Alloc Ratio |
|----------------- |----------- |------------------------------------------------------------------------------------------------------- |--------------------- |--------------------- |---------:|----------:|----------:|---------:|---------:|---------:|------:|----------:|------------:|
| Int128MulChecked | Job-FPSCJH | \dnrr\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe | 18446744073709551615 | 18446744073709551615 | 7.679 ns | 0.0089 ns | 0.0070 ns | 7.682 ns | 7.661 ns | 7.686 ns |  1.00 |         - |          NA |
| Int128MulChecked | Job-PMEPEV | \dnrt\artifacts\bin\testhost\net9.0-windows-Release-x64\shared\Microsoft.NETCore.App\9.0.0\CoreRun.exe | 18446744073709551615 | 18446744073709551615 | 7.615 ns | 0.0129 ns | 0.0115 ns | 7.611 ns | 7.598 ns | 7.640 ns |  0.99 |         - |          NA |

@tannergooding
Copy link
Member

Top one is main branch, not a big improvement it seems...

It's to be expected overall. We generally have good handling of promotion for types that only contain two primitive fields and we have code paths that check if upper of both inputs are 0 so we can specialize the codegen down to BigMul already

So this is really just reducing the work the JIT has to do and in turn does a little bit of cleanup around the inlining boundaries where forward sub isn't robust enough today.

It's essentially a simplification of the code path and one that won't change, so it's something I'm happy to take

@tannergooding tannergooding merged commit f453865 into dotnet:main Jun 13, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-System.Numerics community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants