Changes some of the CPU Math implementation from our current version to use the new TensorPrimitives package. #6875

Merged: 27 commits merged into dotnet:main from tensor-math2 on Nov 15, 2023

Conversation

@michaelgsharp (Member) commented Nov 2, 2023

This changes some of the CPU Math implementation from our current version to use the new TensorPrimitives package.
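At a high level, the hand-written AVX/SSE kernels are being swapped for the equivalent TensorPrimitives calls from System.Numerics.Tensors. A minimal sketch of the idea, using simplified signatures rather than the real CpuMathUtils entry points:

```csharp
using System;
using System.Numerics.Tensors; // TensorPrimitives, from the System.Numerics.Tensors package

internal static class CpuMathSketch
{
    // dst[i] += scalar (previously a hand-vectorized AVX/SSE loop)
    public static void AddScalarU(float scalar, Span<float> dst) =>
        TensorPrimitives.Add(dst, scalar, dst);

    // dst[i] *= scale
    public static void Scale(float scale, Span<float> dst) =>
        TensorPrimitives.Multiply(dst, scale, dst);

    // Sum of all elements
    public static float Sum(ReadOnlySpan<float> src) =>
        TensorPrimitives.Sum(src);
}
```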

Currently we are pointing to the rc2 version, but the following benchmarks have been done with a local copy of the GA version.

This also changes CPUMath to target .NET 8 instead of .NET 6. Do we want that for this release, or should I change it back to .NET 6? @ericstj @jeffhandley

The following is a summary of the methods in CPUMath, the old vs. new benchmarks, and whether each was updated to use the new TensorPrimitives package. @tannergooding @stephentoub @jeffhandley @ericstj @luisquintanilla This is where we need to discuss: is any performance hit worth taking, or should anything that is slower stay on the existing code?

.NET 8

| Method | arrayLength | Mean - Original | Mean - New | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 512 | 25.30 ns | 20.32 ns | 25% | |
| Scale | 512 | 19.91 ns | 19.29 ns | 3% | |
| ScaleSrcU | 512 | 27.58 ns | 20.74 ns | 33% | |
| ScaleAddU | 512 | 28.46 ns | 29.05 ns | | Method unchanged; composite function, so slower with the new code (see the sketch after this table) |
| AddScaleU | 512 | 29.74 ns | 28.59 ns | 4% | |
| AddScaleSU | 512 | 345.92 ns | 327.68 ns | 6% | Method unchanged; TensorPrimitives doesn't have sparse operations. Can be simulated, but that is slower. |
| AddScaleCopyU | 512 | 34.01 ns | 27.03 ns | 26% | |
| AddU | 512 | 29.80 ns | 26.71 ns | 12% | |
| AddSU | 512 | 325.32 ns | 349.46 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. Can be simulated, but that is slower. |
| MulElementWiseU | 512 | 33.92 ns | 27.29 ns | 24% | |
| Sum | 512 | 36.57 ns | 34.34 ns | 6% | |
| SumSqU | 512 | 37.50 ns | 39.34 ns | -5% | |
| SumSqDiffU | 512 | 41.23 ns | 43.38 ns | | Method unchanged; composite function, so slower with the new code |
| SumAbsU | 512 | 43.74 ns | 39.27 ns | 11% | |
| SumAbsDiffU | 512 | 47.23 ns | 37.48 ns | 26% | |
| MaxAbsU | 512 | 42.30 ns | 43.26 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release |
| MaxAbsDiffU | 512 | 46.94 ns | 47.73 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release. Also a composite function. |
| DotU | 512 | 50.34 ns | 43.20 ns | 17% | |
| DotSU | 512 | 212.19 ns | 213.18 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. Can be simulated, but that is slower. |
| Dist2 | 512 | 55.48 ns | 47.43 ns | 17% | |
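For context on the "composite function" rows above: a kernel like ScaleAddU fuses two operations into a single pass over the data, and TensorPrimitives has no single call covering that combination, so a drop-in replacement needs two vectorized passes. A rough sketch, assuming ScaleAddU computes dst[i] = a * (dst[i] + b):

```csharp
// Sketch only; assumes ScaleAddU means dst[i] = a * (dst[i] + b).
// The old kernel did this in one pass; with TensorPrimitives it takes two.
public static void ScaleAddU(float a, float b, Span<float> dst)
{
    TensorPrimitives.Add(dst, b, dst);      // pass 1: dst[i] += b
    TensorPrimitives.Multiply(dst, a, dst); // pass 2: dst[i] *= a
}
```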

.NET Framework

| Method | arrayLength | Mean - Original | Mean - New | % Faster | Comments |
|---|---|---|---|---|---|
| AddScalarU | 256 | 48.48 ns | 29.88 ns | 62% | |
| Scale | 256 | 43.45 ns | 28.55 ns | 52% | |
| ScaleSrcU | 256 | 49.87 ns | 38.13 ns | 31% | |
| ScaleAddU | 256 | 47.87 ns | 45.76 ns | | Method unchanged; composite function, so slower with the new code |
| AddScaleU | 256 | 52.63 ns | 62.58 ns | -16% | Slightly slower in the new code. Do we want to keep it? |
| AddScaleSU | 256 | 151.00 ns | 152.77 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. Can be simulated, but that is slower. |
| AddScaleCopyU | 256 | 48.35 ns | 63.94 ns | -24% | Slightly slower in the new code. Do we want to keep it? |
| AddU | 256 | 49.68 ns | 59.32 ns | -16% | Slightly slower in the new code. Do we want to keep it? |
| AddSU | 256 | 150.34 ns | 153.89 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. Can be simulated, but that is slower (see the sketch after this table). |
| MulElementWiseU | 256 | 48.26 ns | 69.89 ns | -31% | |
| Sum | 256 | 68.05 ns | 59.74 ns | 14% | |
| SumSqU | 256 | 68.21 ns | 62.08 ns | 10% | |
| SumSqDiffU | 256 | 57.52 ns | 57.64 ns | | Method unchanged; composite function, so slower with the new code |
| SumAbsU | 256 | 72.88 ns | 65.01 ns | 12% | |
| SumAbsDiffU | 256 | 59.51 ns | 68.23 ns | -13% | Slightly slower in the new code. Do we want to keep it? |
| MaxAbsU | 256 | 72.26 ns | 71.48 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release |
| MaxAbsDiffU | 256 | 59.30 ns | 58.87 ns | | Method unchanged; MaxMagnitude is slow in the GA release and has been fixed for the next release. Also a composite function. |
| DotU | 256 | 58.93 ns | 68.42 ns | -14% | Slightly slower in the new code. Do we want to keep it? |
| DotSU | 256 | 109.76 ns | 113.78 ns | | Method unchanged; TensorPrimitives doesn't have sparse operations. Can be simulated, but that is slower. |
| Dist2 | 256 | 59.49 ns | 86.97 ns | -32% | Slightly slower in the new code. Do we want to keep it? |
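Similarly, for the "sparse" rows (AddSU, AddScaleSU, DotSU): TensorPrimitives only operates on dense spans, so simulating an indexed update means a gather pass and a scatter pass around the dense call, which ends up slower than the existing code. A rough sketch, assuming AddSU means dst[idx[i]] += src[i] with src and idx of equal length:

```csharp
// Sketch only; assumes AddSU means dst[idx[i]] += src[i].
public static void AddSU(ReadOnlySpan<float> src, ReadOnlySpan<int> idx, Span<float> dst)
{
    float[] gathered = new float[src.Length];
    for (int i = 0; i < src.Length; i++)           // gather the sparse slots
        gathered[i] = dst[idx[i]];

    TensorPrimitives.Add(gathered, src, gathered); // dense vectorized add

    for (int i = 0; i < src.Length; i++)           // scatter the results back
        dst[idx[i]] = gathered[i];
}
```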

I think that even if we don't keep the TensorPrimitives code in the cases where it's slower, at least for .NET Framework we should add a check: if the native code isn't available to run these accelerated, we should fall back to the TensorPrimitives approach. That would have to be added in, though.

All of this was done with AVX (256-bit vectors).
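A rough sketch of what that fallback could look like; the NativeCpuMathAvailable flag here is hypothetical, standing in for however CpuMathUtils decides between the native and managed paths today:

```csharp
using System;
using System.Numerics.Tensors;

internal static class CpuMathFallbackSketch
{
    // Hypothetical flag: did the native SIMD library load successfully?
    private static readonly bool NativeCpuMathAvailable = false;

    // dst[i] += src[i] for the first 'count' elements
    public static void Add(ReadOnlySpan<float> src, Span<float> dst, int count)
    {
        if (NativeCpuMathAvailable)
        {
            // The existing accelerated native path would be called here (omitted).
            return;
        }

        // Fall back to TensorPrimitives instead of the old scalar managed loop.
        TensorPrimitives.Add(dst.Slice(0, count), src.Slice(0, count), dst.Slice(0, count));
    }
}
```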

codecov bot commented Nov 14, 2023

Codecov Report

Merging #6875 (54e876a) into main (796cb35) will decrease coverage by 0.60%.
Report is 1 commit behind head on main.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #6875      +/-   ##
==========================================
- Coverage   69.40%   68.80%   -0.60%     
==========================================
  Files        1238     1240       +2     
  Lines      249462   249392      -70     
  Branches    25522    25493      -29     
==========================================
- Hits       173139   171599    -1540     
- Misses      69578    71196    +1618     
+ Partials     6745     6597     -148     
| Flag | Coverage Δ |
|---|---|
| Debug | 68.80% <100.00%> (-0.60%) ⬇️ |
| production | 63.26% <100.00%> (-0.67%) ⬇️ |
| test | 88.49% <ø> (-0.41%) ⬇️ |

Flags with carried forward coverage won't be shown.

| Files | Coverage Δ |
|---|---|
| src/Microsoft.ML.CpuMath/AvxIntrinsics.cs | 58.18% <ø> (-38.51%) ⬇️ |
| src/Microsoft.ML.CpuMath/CpuMathUtils.cs | 100.00% <100.00%> (ø) |
| ...rc/Microsoft.ML.CpuMath/CpuMathUtils.netcoreapp.cs | 97.80% <100.00%> (-0.84%) ⬇️ |
| src/Microsoft.ML.CpuMath/SseIntrinsics.cs | 54.80% <ø> (-41.55%) ⬇️ |

... and 48 files with indirect coverage changes

@michaelgsharp (Member, Author) commented:

/azp run

Azure Pipelines successfully started running 2 pipeline(s).

@michaelgsharp merged commit d2cf997 into dotnet:main on Nov 15, 2023
23 checks passed
@michaelgsharp deleted the tensor-math2 branch on November 15, 2023 at 05:46
@github-actions bot locked and limited the conversation to collaborators on Dec 15, 2023