Compiled code benchmarks: easy benchmarking of preview against current #16942


Merged: 7 commits into dotnet:main on Mar 26, 2024

Conversation

@brianrourkeboll (Contributor) commented Mar 24, 2024

Description

  • Add the compiled code benchmark projects to VisualFSharp.sln.
  • Make it easy to bench langversion preview against current.
    • The compiled code benchmarks now default to using the selected benchmarks built with the current langversion as the baseline; the same benchmarks built with langversion preview are then automatically benchmarked against those for easy comparison.

      Ideally, it should be possible to run benchmark comparisons for compiled code built using any combination of:

      • The compiler shipped in the most recent stable SDK.
      • A compiler shipped in a specific previous SDK release.
      • A specific language version, including preview.
      • The locally-built compiler.

      Of those, the combinations most likely to be useful are probably:

      • The local compiler with the current langversion as baseline against the local compiler with langversion preview. This is what this PR adds.
      • The most recent shipped compiler as baseline against the local compiler, with or without langversion preview. This is definitely achievable with BenchmarkDotNet, but I tried, and it won't work without updates to the current cascade of build configurations from all the Directory.Build.props, etc., at various levels, which I didn't want to undertake.
  • Add (one set of) compiled code benchmarks for the integral range optimizations added in Better integral range lowering: start..finish, start..step..finish #16650 and Optimize simple range mappings: [for n in start..finish -> f n], &c. #16832.
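The baseline arrangement described above can be sketched with BenchmarkDotNet roughly like this (an illustrative sketch only: in the actual setup the same benchmark source is compiled twice, once per langversion, rather than duplicated by hand in one class, and the type and member names here are hypothetical):

```fsharp
open BenchmarkDotNet.Attributes

[<MemoryDiagnoser>]
type IntegralRangeBenchmarks () =
    // Stands in for the build compiled with the current langversion;
    // Baseline = true makes BenchmarkDotNet report every other job's
    // Mean and Allocated as a ratio against this one.
    [<Benchmark(Baseline = true); BenchmarkCategory("IntegralRanges", "Arrays")>]
    member _.Current () = [| 1u .. 32767u |]

    // Stands in for the same source compiled with --langversion:preview,
    // where the lowered integral-range codegen applies.
    [<Benchmark; BenchmarkCategory("IntegralRanges", "Arrays")>]
    member _.Preview () = [| 1u .. 32767u |]
```

The Ratio and Alloc Ratio columns in the example output are exactly this kind of comparison: each Preview row is reported relative to its Current baseline.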

Benchmark switcher

[screenshot: benchmark switcher in the console]
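The switcher is presumably driven by BenchmarkDotNet's BenchmarkSwitcher (a minimal sketch; the repo's actual entry point may differ):

```fsharp
open System.Reflection
open BenchmarkDotNet.Running

[<EntryPoint>]
let main argv =
    // Presents an interactive menu of the benchmark classes found in
    // this assembly, or runs the ones selected via the command line.
    BenchmarkSwitcher
        .FromAssembly(Assembly.GetExecutingAssembly())
        .Run(argv)
    |> ignore
    0
```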

Example output

[screenshot: example benchmark output]

| Job     | Categories                                                                             | start | finish | step | Mean           | Error         | StdDev        | Median         | Ratio | RatioSD | Gen0   | Gen1   | Gen2   | Allocated | Alloc Ratio |
|-------- |--------------------------------------------------------------------------------------- |------ |------- |----- |---------------:|--------------:|--------------:|---------------:|------:|--------:|-------:|-------:|-------:|----------:|------------:|
| Current | UInt32,[|127u..1u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |      24.046 ns |     0.4269 ns |     0.3993 ns |      23.985 ns |  1.00 |    0.00 | 0.0004 |      - |      - |      96 B |        1.00 |
| Preview | UInt32,[|127u..1u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |       1.729 ns |     0.0804 ns |     0.0752 ns |       1.725 ns |  0.07 |    0.00 |      - |      - |      - |         - |        0.00 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|127u..2u..1u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |      22.817 ns |     0.2053 ns |     0.1920 ns |      22.760 ns |  1.00 |    0.00 | 0.0004 |      - |      - |      96 B |        1.00 |
| Preview | UInt32,[|127u..2u..1u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |       3.161 ns |     0.1053 ns |     0.0985 ns |       3.172 ns |  0.14 |    0.00 |      - |      - |      - |         - |        0.00 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..127u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |     361.493 ns |     4.3161 ns |     3.8261 ns |     361.798 ns |  1.00 |    0.00 | 0.0072 |      - |      - |    1768 B |        1.00 |
| Preview | UInt32,[|1u..127u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |      96.560 ns |     1.9609 ns |     3.6347 ns |      94.721 ns |  0.27 |    0.01 | 0.0021 |      - |      - |     536 B |        0.30 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..2u..127u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |     263.240 ns |     3.4600 ns |     2.8893 ns |     264.086 ns |  1.00 |    0.00 | 0.0029 |      - |      - |     712 B |        1.00 |
| Preview | UInt32,[|1u..2u..127u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |      58.053 ns |     1.1757 ns |     1.6481 ns |      57.840 ns |  0.22 |    0.01 | 0.0011 |      - |      - |     280 B |        0.39 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..2u..32767u|],ComputedCollections,Arrays,IntegralRanges                    | ?     | ?      | ?    |  40,529.790 ns |   272.6267 ns |   241.6764 ns |  40,486.288 ns |  1.00 |    0.00 | 0.4883 |      - |      - |  131464 B |        1.00 |
| Preview | UInt32,[|1u..2u..32767u|],ComputedCollections,Arrays,IntegralRanges                    | ?     | ?      | ?    |   7,787.907 ns |   152.9334 ns |   176.1183 ns |   7,737.320 ns |  0.19 |    0.00 | 0.2747 |      - |      - |   65560 B |        0.50 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..32767u|],ComputedCollections,Arrays,IntegralRanges                        | ?     | ?      | ?    | 256,084.235 ns | 5,074.6636 ns | 6,598.4961 ns | 257,729.980 ns |  1.00 |    0.00 | 8.3008 | 8.3008 | 8.3008 |  393680 B |        1.00 |
| Preview | UInt32,[|1u..32767u|],ComputedCollections,Arrays,IntegralRanges                        | ?     | ?      | ?    |  77,660.979 ns | 1,541.8822 ns | 4,399.0768 ns |  77,866.278 ns |  0.31 |    0.02 | 2.8076 | 2.8076 | 2.8076 |  131088 B |        0.33 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|for n in start..finish -> n|],ComputedCollections,Arrays,IntegralRanges       | 0     | 32767  | ?    | 281,373.636 ns | 5,097.5675 ns | 4,518.8608 ns | 282,881.763 ns |  1.00 |    0.00 | 8.7891 | 8.7891 | 8.7891 |  393741 B |        1.00 |
| Preview | UInt32,[|for n in start..finish -> n|],ComputedCollections,Arrays,IntegralRanges       | 0     | 32767  | ?    |  77,629.964 ns | 1,545.8980 ns | 4,509.4572 ns |  77,968.518 ns |  0.29 |    0.02 | 3.0518 | 3.0518 | 3.0518 |  131090 B |        0.33 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|for n in start..step..finish -> n|],ComputedCollections,Arrays,IntegralRanges | 0     | 32767  | 2    |  69,948.064 ns | 1,078.6284 ns | 1,154.1203 ns |  69,834.222 ns |  1.00 |    0.00 | 0.7324 |      - |      - |  197056 B |        1.00 |
| Preview | UInt32,[|for n in start..step..finish -> n|],ComputedCollections,Arrays,IntegralRanges | 0     | 32767  | 2    |   7,700.286 ns |   115.4058 ns |   107.9507 ns |   7,679.921 ns |  0.11 |    0.00 | 0.2747 |      - |      - |   65560 B |        0.33 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|start..finish|],ComputedCollections,Arrays,IntegralRanges                     | 0     | 32767  | ?    | 148,726.931 ns | 2,956.8132 ns | 4,603.4019 ns | 148,672.632 ns |  1.00 |    0.00 | 4.8828 | 4.3945 | 4.3945 |  262584 B |        1.00 |
| Preview | UInt32,[|start..finish|],ComputedCollections,Arrays,IntegralRanges                     | 0     | 32767  | ?    |  77,915.564 ns | 1,554.2518 ns | 3,476.3069 ns |  77,861.060 ns |  0.52 |    0.03 | 4.0283 | 4.0283 | 4.0283 |  131095 B |        0.50 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|start..step..finish|],ComputedCollections,Arrays,IntegralRanges               | 0     | 32767  | 2    |  38,456.304 ns |   682.2118 ns |   638.1413 ns |  38,380.719 ns |  1.00 |    0.00 | 0.4883 |      - |      - |  131464 B |        1.00 |
| Preview | UInt32,[|start..step..finish|],ComputedCollections,Arrays,IntegralRanges               | 0     | 32767  | 2    |   7,791.339 ns |    93.7728 ns |    87.7152 ns |   7,789.114 ns |  0.20 |    0.00 | 0.2747 |      - |      - |   65560 B |        0.50 |

Checklist

  • Release notes entry updated: this shouldn't need release notes.


✅ No release notes required

@brianrourkeboll brianrourkeboll marked this pull request as ready for review March 24, 2024 20:01
@brianrourkeboll brianrourkeboll requested a review from a team as a code owner March 24, 2024 20:01
@T-Gro (Member) left a comment

The infrastructure to compare the new and old compiler is the right thing to have here; it really makes the benchmarks folder useful by showing how to benchmark compiler perf improvements.

I do not think we shall include all of the individual comparison instances in the repo forever - they are a useful measurement for your feature PR to demonstrate it brings improvements. But for keeping them in the repo, I think it would be better to just keep a few instances demonstrating the setup.

It should also be made clear that the purpose of the folder is to demonstrate the different setup mechanisms, like local vs. shipped compiler, and the same for FSharp.Core - not to persist every benchmark executed.

@psfinaki (Member) left a comment

Great stuff, Brian. Looking forward to trying this in action - I will make a lot of use of it :)

As for the new benchmarks - I am a bit on the fence here. T-Gro's arguments make sense to me, in particular I think there are 2 reasons to have some specific benchmarks here:

  1. It is a good demonstration of a benchmarking technique not yet used here
  2. We plan to use them in the near future, as in, we continue working on some feature perf

I have recently started including F# micro perf to the dotnet/performance repo. We'll see how that plays out - in theory, this is the right place for keeping track of such things, and this is what C# does also. But it will take us some time to tune things out there.

Maybe you can keep one of the new benchmarks, or all of them if you insist and nobody minds. Either way, it would be awesome if you added a README here, which could be a concatenation of the PR description and these comments; you can also take inspiration from other READMEs in the benchmarks folder.

Thank you!

@brianrourkeboll (Contributor, Author) commented

Hmm.

(Forgive me if I am gravely misunderstanding something; I don't know enough about the intent or ownership of dotnet/performance to know what role, if any, it should play here.)

Out of curiosity, I browsed through some Roslyn PRs that introduced codegen changes to see if they had any useful practices to borrow... But they mostly seemed to just post benchmark source and results in comments, and the benchmarks themselves seemed to be independent benchmarks of what the new codegen was ostensibly doing. At a glance, I didn't see any benchmarks that actually used an older compiler's output as a baseline and benchmarked the new output against that.

@psfinaki

I have recently started including F# micro perf to the dotnet/performance repo. We'll see how that plays out - in theory, this is the right place for keeping track of such things, and this is what C# does also.

Yeah, although, on the surface, the dotnet/performance repo currently seems geared much more towards testing (1) the performance of the runtime itself and (2) the performance of specific first-party libraries than, say, Roslyn codegen.

There are mentions of benching two different runtime versions against each other, which could perhaps be adapted to work with different FCS (or language, etc.) versions.

If it makes sense to start using dotnet/performance as a way to test FCS codegen changes (and perhaps library changes for FSharp.Core, similar to how many System.* and Microsoft.* libraries are tested there), then maybe we can just come up with and document two new workflows there, each for both codegen and library (FSharp.Core) changes:

  • One for long-lived benchmarks deemed worth merging into dotnet/performance, e.g., as a way to detect regressions.
  • Another for disposable benchmarks run during local development that are not checked in.

Or we could keep them separate — use dotnet/performance for long-lived benchmarks, and have some kind of documentation and maybe template(s) here, in this repo, for writing and running short-lived, disposable benchmarks during local development.

@T-Gro

I do not think we shall include all of the individual comparison instances in the repo forever - they are a useful measurement for your feature PR to demonstrate it brings improvements.

On the one hand, I can understand wanting to avoid accumulating a mess of outdated benchmarks in source: either they will never be run again, in which case they are (worse than) useless, or else they must be run at some frequency, which has development time, deployment time, and infrastructure costs.

On the other hand, it seems to me that if a performance improvement is worth making and benchmarking in the first place, it might be worth keeping the benchmarks around to detect unintended regressions, at least for sufficiently significant or fundamental things. That is again only useful if the benchmarks are actually run at some interval after changes are made, but maybe that's what dotnet/performance is for?

@psfinaki (Member) commented

@brianrourkeboll, so first of all - no grave misunderstanding here, we are trying to understand our options ourselves :)

So this

If it makes sense to start using dotnet/performance as a way to test FCS codegen changes (and perhaps library changes for FSharp.Core, similar to how many System.* and Microsoft.* libraries are tested there), then maybe we can just come up with and document two new workflows there, each for both codegen and library (FSharp.Core) changes:

  • One for long-lived benchmarks deemed worth merging into dotnet/performance, e.g., as a way to detect regressions.
  • Another for disposable benchmarks run during local development that are not checked in.

Or we could keep them separate — use dotnet/performance for long-lived benchmarks, and have some kind of documentation and maybe template(s) here, in this repo, for writing and running short-lived, disposable benchmarks during local development.

... is roughly my current vision (and maybe the team's vision but not sure :D). Because this

On the other hand, it seems to me that if a performance improvement is worth making and benchmarking in the first place, it might be worth keeping the benchmarks around to detect unintended regressions, at least for sufficiently significant or fundamental things. That is again only useful if the benchmarks are actually run at some interval after changes are made, but maybe that's what dotnet/performance is for?

is supposed to be the case. dotnet/performance is indeed for tracking regressions. They run all the benchmarks all the time, automatically create baselines, and automatically create issues.

However, this is just a vision, I've just started adding some F# code there and currently watching the situation. Hence I don't mind having more benchmarks in our repo until we are convinced about dotnet/performance.

@brianrourkeboll (Contributor, Author) commented Mar 25, 2024

All right. I went ahead and added a simple readme and left only one set of new benchmarks. Let me know if you'd rather I pared that down even further.

@psfinaki (Member) left a comment

Awesome. LGTM. Thanks!

@T-Gro (Member) left a comment

I still do not like the amount of code being checked in, especially considering it is mostly duplicated.

But I will not block it - let's check it in.
I think a benchmark could be made generic (for the various integral types here), further reducing it. We will just need to figure out how to turn the generic argument into a category-like column.

@T-Gro T-Gro merged commit 58b4d08 into dotnet:main Mar 26, 2024
@brianrourkeboll (Contributor, Author) commented Mar 26, 2024

@T-Gro The duplication is necessary because, if the range type for [|start..finish|], etc., is not known at compile time, then the integral range optimizations will not be applied (since the compiler does not know whether start and finish will be integral types at all), and the generated code will use the IEnumerator<_>-based RangeStepGeneric.

I know because I tried to use a single generic benchmark class and use [<GenericTypeArguments(…)>], and, while it "works" (compiles and runs), it doesn't actually benchmark what it looks like it's benchmarking :). I suspect that BenchmarkDotNet is just generating subclasses that instantiate the type argument to the specified type, and then invoke the members implemented in the base class — which here will again be based on RangeStepGeneric, since the base class itself is generic and does not know whether the type will be an integral type.

[screenshot]

The benchmarks for testing the compile-time computation of constant counts also must be inlined and thus duplicated per type, i.e., at least the generic ``127..-1..-128``<_>, etc., must be invoked once per type, as ``127..-1..-128``<int> (), ``127..-1..-128``<uint> (), etc.
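The monomorphic-vs-generic distinction above can be sketched as follows (illustrative function names, not the PR's actual benchmarks):

```fsharp
// Monomorphic: the element type is statically known to be uint32, so
// (with langversion preview) the compiler can emit the lowered loop.
let monomorphicArray (start : uint) (finish : uint) = [| start .. finish |]

// Inline generic: the range's static member constraints are resolved at
// each call site, so the optimized code is generated once per concrete
// instantiation - e.g. genericArray 1 127 and genericArray 1u 127u must
// each appear in source, mirroring the per-type invocations above.
let inline genericArray start finish = [| start .. finish |]
```

A non-inline generic member, by contrast, never learns the element type at compile time, which is why it falls back to the IEnumerator<_>-based RangeStepGeneric path regardless of language version.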

If you don't think it's useful to have these benchmarks checked in here at all, I'm fine with removing them altogether. Maybe they would be a better fit for dotnet/performance at some point?

@brianrourkeboll brianrourkeboll deleted the compiled-code-benchmarks branch March 26, 2024 20:46
psfinaki added a commit that referenced this pull request Mar 28, 2024
* PrettyNaming: make DoesIdentifierNeedBackticks public (#16613)

* Update dependencies from https://github.com/dotnet/arcade build (#16944)

Microsoft.DotNet.Arcade.Sdk
 From Version 8.0.0-beta.24170.6 -> To Version 8.0.0-beta.24172.5

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Unskip a few tests (#16946)

* Unskip a few tests

* another one

* Fix [<tailcall>] false positive with yield! (#16933)

* add test case showing a false positive with yield!

* add missing case in IsAppInLambdaBody that can happen with yield!

* add release notes entry

* update PR number

* only bind to what we are interested in

* add test expecting a warning for yield! in a list comprehension

* add test case using yield! in a custom CE that overflows the stack

* Add regression test for nameof type with generic parameters (#16827)

Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>

* Minor README update about proto (#16945)

* Minor README update about proto

It's Bootstrap actually.

* Update DEVGUIDE.md

Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>

* Update DEVGUIDE.md

---------

Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>

* Compiled code benchmarks: easy benchmarking of `preview` against current (#16942)

* Make it easy to bench preview against current

* Revert EnableDefaultNoneItems false (#16953)

* Exclude compiler service benchmark from VMR build when not building tests (#16955)

* Exclude compiler service benchmark from VMR build when not building tests

* remove Z

---------

Co-authored-by: Petr <psfinaki@users.noreply.github.com>

* Update dependencies from https://github.com/dotnet/arcade build 20240326.8 (#16957)

Microsoft.DotNet.Arcade.Sdk
 From Version 8.0.0-beta.24172.5 -> To Version 8.0.0-beta.24176.8

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

---------

Co-authored-by: Eugene Auduchinok <eugene.auduchinok@jetbrains.com>
Co-authored-by: dotnet-maestro[bot] <42748379+dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Petr <psfinaki@users.noreply.github.com>
Co-authored-by: dawe <dawedawe@posteo.de>
Co-authored-by: Florian Verdonck <florian.verdonck@outlook.com>
Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>
Co-authored-by: Brian Rourke Boll <brianrourkeboll@users.noreply.github.com>
Co-authored-by: Kevin Ransom (msft) <codecutter@hotmail.com>
Co-authored-by: Matt Mitchell <mmitche@microsoft.com>