Add strength reduction benchmarks #4317

Merged on Jul 18, 2024 (4 commits)

@jakobbotsch
Member

jakobbotsch commented Jul 17, 2024

This adds strength reduction benchmarks for arrays of a few different element sizes, motivated by the differences in codegen. Different element sizes lead to different instruction sequences for addressing each element. On x64, the element access currently compiles to the following (element size in bytes: instructions):

2: load
3: lea + load
4: load
8: load
12: lea + load
16: shl + load
29: imul + load
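
One plausible reading of the list above, sketched in C rather than taken from the JIT: x64 addressing modes can scale an index only by 1, 2, 4, or 8, so those element sizes need just the load. Sizes that are 3, 5, or 9 times such a scale can use a single `lea` (computing index*3, index*5, or index*9) before the load, other powers of two need a shift, and everything else needs a multiply. The names `is_hw_scale` and `index_sequence` and the classification rule itself are assumptions for illustration only; the JIT's real selection logic may differ.

```c
#include <stddef.h>

/* Hypothetical helper: 1 if s is a scale an x64 addressing mode can encode. */
static int is_hw_scale(unsigned s)
{
    return s == 1 || s == 2 || s == 4 || s == 8;
}

/* Hypothetical model (not the JIT's actual logic) of the instructions needed
 * to read element `index` of an array whose elements are `size` bytes wide.
 * `size` is assumed to be greater than zero. */
static const char *index_sequence(unsigned size)
{
    static const unsigned lea_multipliers[] = { 3, 5, 9 };

    /* [base + index*size] fits directly in the load's addressing mode. */
    if (is_hw_scale(size))
        return "load";

    /* size = m * s: lea computes index*m, then the load scales by s. */
    for (size_t k = 0; k < sizeof lea_multipliers / sizeof lea_multipliers[0]; k++) {
        unsigned m = lea_multipliers[k];
        if (size % m == 0 && is_hw_scale(size / m))
            return "lea + load";
    }

    /* Remaining powers of two: shift the index, then load. */
    if ((size & (size - 1)) == 0)
        return "shl + load";

    /* No cheap decomposition: multiply the index by the size. */
    return "imul + load";
}
```

Under this model, `index_sequence(12)` returns "lea + load" (12 = 3 * 4) and `index_sequence(29)` returns "imul + load", which matches the seven entries in the list.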

Each size has three benchmark variants: an array version, a span version, and a manually written, fully strength-reduced version. The JIT is expected to be able to transform the array version into the strength-reduced version soon. The span version will also be transformed, but not quite all the way (the strength reduction will not be able to fold in the base byref of the span).
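
To illustrate what "strength reduced" means for these loops, here is a minimal C sketch (the benchmarks themselves are C#, and this is not their code). The struct `S12` and both function names are hypothetical stand-ins; the span variant has no direct C equivalent and is not modeled. An optimizing C compiler will often perform this transformation on its own, so the two functions only show the shape of the loop before and after.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 12-byte element, standing in for a 12-byte benchmark struct. */
typedef struct {
    uint32_t a;
    uint32_t b;
    uint32_t c;
} S12;

/* Indexed form: every access conceptually computes
 * base + i * sizeof(S12), i.e. a multiply (or lea) per iteration. */
static uint32_t sum_indexed(const S12 *items, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += items[i].a;
    return sum;
}

/* Strength-reduced form: the per-iteration multiply is replaced by
 * adding sizeof(S12) to a running pointer, so each access is a plain load. */
static uint32_t sum_strength_reduced(const S12 *items, size_t count)
{
    uint32_t sum = 0;
    const S12 *end = items + count;
    for (const S12 *p = items; p != end; p++)
        sum += p->a;
    return sum;
}
```

Both functions return the same sum for the same input; only the address computation inside the loop differs.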

There is one current annoyance to work around in the JIT: we do not align the strength-reduced versions of the loops because they end up being "too small", meaning that they already fit within a single cache line. However, it turns out alignment is still beneficial in these cases, and this skews the results compared to the non-strength-reduced versions. I have opened dotnet/runtime#104665 about this. To work around the problem in these benchmarks I have added a superfluous bitwise OR operation to the body of every loop.

On my Intel CPU the current results are:

| Method | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Code Size | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SumS12Array | 4.813 us | 0.2538 us | 0.2923 us | 4.722 us | 4.445 us | 5.394 us | 1.00 | 0.08 | 73 B | - | NA |
| SumS12Span | 4.530 us | 0.1372 us | 0.1580 us | 4.467 us | 4.334 us | 4.844 us | 0.94 | 0.06 | 126 B | - | NA |
| SumS12ArrayStrengthReduced | 3.712 us | 0.0918 us | 0.1058 us | 3.653 us | 3.608 us | 3.913 us | 0.77 | 0.05 | 65 B | - | NA |
| SumS16Array | 4.557 us | 0.0569 us | 0.0532 us | 4.544 us | 4.482 us | 4.677 us | 1.00 | 0.02 | 73 B | - | NA |
| SumS16Span | 4.529 us | 0.0277 us | 0.0259 us | 4.532 us | 4.479 us | 4.572 us | 0.99 | 0.01 | 126 B | - | NA |
| SumS16ArrayStrengthReduced | 3.836 us | 0.0349 us | 0.0326 us | 3.828 us | 3.796 us | 3.920 us | 0.84 | 0.01 | 65 B | - | NA |
| SumS29Array | 5.260 us | 0.0623 us | 0.0583 us | 5.243 us | 5.181 us | 5.378 us | 1.00 | 0.02 | 84 B | - | NA |
| SumS29Span | 5.265 us | 0.0474 us | 0.0444 us | 5.258 us | 5.200 us | 5.349 us | 1.00 | 0.01 | 132 B | - | NA |
| SumS29ArrayStrengthReduced | 4.340 us | 0.0349 us | 0.0326 us | 4.339 us | 4.282 us | 4.384 us | 0.83 | 0.01 | 76 B | - | NA |
| SumS3Array | 4.324 us | 0.0318 us | 0.0298 us | 4.321 us | 4.287 us | 4.392 us | 1.00 | 0.01 | 74 B | - | NA |
| SumS3Span | 4.352 us | 0.0243 us | 0.0228 us | 4.360 us | 4.301 us | 4.382 us | 1.01 | 0.01 | 127 B | - | NA |
| SumS3ArrayStrengthReduced | 3.289 us | 0.0195 us | 0.0183 us | 3.287 us | 3.266 us | 3.339 us | 0.76 | 0.01 | 66 B | - | NA |
| SumS8Array | 3.794 us | 0.1145 us | 0.1319 us | 3.744 us | 3.691 us | 4.177 us | 1.00 | 0.05 | 69 B | - | NA |
| SumS8Span | 3.743 us | 0.0213 us | 0.0199 us | 3.738 us | 3.720 us | 3.805 us | 0.99 | 0.03 | 122 B | - | NA |
| SumS8ArrayStrengthReduced | 3.435 us | 0.0647 us | 0.0719 us | 3.425 us | 3.346 us | 3.713 us | 0.91 | 0.03 | 65 B | - | NA |
| SumIntsArray | 3.719 us | 0.1488 us | 0.1713 us | 3.631 us | 3.568 us | 4.032 us | 1.00 | 0.06 | 70 B | - | NA |
| SumIntsSpan | 3.621 us | 0.0188 us | 0.0176 us | 3.621 us | 3.589 us | 3.646 us | 0.98 | 0.04 | 121 B | - | NA |
| SumIntsArrayStrengthReduced | 3.447 us | 0.2373 us | 0.2733 us | 3.338 us | 3.286 us | 4.516 us | 0.93 | 0.08 | 65 B | - | NA |
| SumLongsArray | 3.785 us | 0.1116 us | 0.1285 us | 3.741 us | 3.669 us | 4.180 us | 1.00 | 0.05 | 69 B | - | NA |
| SumLongsSpan | 3.792 us | 0.1443 us | 0.1662 us | 3.695 us | 3.617 us | 4.123 us | 1.00 | 0.05 | 123 B | - | NA |
| SumLongsArrayStrengthReduced | 3.454 us | 0.0634 us | 0.0593 us | 3.443 us | 3.402 us | 3.656 us | 0.91 | 0.03 | 65 B | - | NA |
| SumShortsArray | 3.626 us | 0.0810 us | 0.0933 us | 3.619 us | 3.507 us | 3.869 us | 1.00 | 0.04 | 70 B | - | NA |
| SumShortsSpan | 3.586 us | 0.0416 us | 0.0389 us | 3.581 us | 3.526 us | 3.693 us | 0.99 | 0.03 | 122 B | - | NA |
| SumShortsArrayStrengthReduced | 3.317 us | 0.0541 us | 0.0506 us | 3.302 us | 3.263 us | 3.442 us | 0.92 | 0.03 | 66 B | - | NA |

@jakobbotsch
Member Author

cc @LoopedBard3 @DrewScoggins

@jakobbotsch
Member Author

jakobbotsch commented Jul 18, 2024

I don't think the CI failures are related.

@LoopedBard3
Member

Yes, the current CI failures do not seem to be related. Is this ready for review and merge?

@jakobbotsch
Member Author

> Yes, the current CI failures do not seem to be related. Is this ready for review and merge?

Yeah, this is ready.

@LoopedBard3
Member

LoopedBard3 left a comment

Looks good to me, thank you for the benchmarks!

@LoopedBard3 LoopedBard3 merged commit 49f03f7 into dotnet:main Jul 18, 2024
45 of 63 checks passed
@jakobbotsch jakobbotsch deleted the strength-reduction-benchmarks branch July 18, 2024 17:26