Compiled code benchmarks: easy benchmarking of preview against current #16942


Merged: 7 commits into dotnet:main on Mar 26, 2024

Conversation

@brianrourkeboll (Contributor) commented Mar 24, 2024

Description

  • Add the compiled code benchmark projects to VisualFSharp.sln.
  • Make it easy to bench langversion preview against current.
    • The compiled code benchmarks now default to using the selected benchmarks built with the current langversion as the baseline; the same benchmarks built with langversion preview are then automatically benchmarked against those for easy comparison.

      Ideally, it should be possible to run benchmark comparisons for compiled code built using any combination of:

      • The compiler shipped in the most recent stable SDK.
      • A compiler shipped in a specific previous SDK release.
      • A specific language version, including preview.
      • The locally-built compiler.

      Of those, the combinations most likely to be useful are probably:

      • The local compiler with the current langversion as baseline against the local compiler with langversion preview. This is what this PR adds.
      • The most recent shipped compiler as baseline against the local compiler, with or without langversion preview. This is definitely achievable with BenchmarkDotNet, but I tried, and it won't work without updates to the current cascade of build configurations from all the Directory.Build.props, etc., at various levels, which I didn't want to undertake.
  • Add (one set of) compiled code benchmarks for the integral range optimizations added in Better integral range lowering: start..finish, start..step..finish #16650 and Optimize simple range mappings: [for n in start..finish -> f n], &c. #16832.
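The baseline arrangement described above can be sketched with BenchmarkDotNet roughly like this (an illustrative sketch only: in the actual setup the same benchmark source is compiled twice, once per langversion, rather than duplicated by hand in one class, and the type and member names here are hypothetical):

```fsharp
open BenchmarkDotNet.Attributes

[<MemoryDiagnoser>]
type IntegralRangeBenchmarks () =
    // Stands in for the build compiled with the current langversion;
    // Baseline = true makes BenchmarkDotNet report every other job's
    // Mean and Allocated as a ratio against this one.
    [<Benchmark(Baseline = true); BenchmarkCategory("IntegralRanges", "Arrays")>]
    member _.Current () = [| 1u .. 32767u |]

    // Stands in for the same source compiled with --langversion:preview,
    // where the lowered integral-range codegen applies.
    [<Benchmark; BenchmarkCategory("IntegralRanges", "Arrays")>]
    member _.Preview () = [| 1u .. 32767u |]
```

The Ratio and Alloc Ratio columns in the example output are exactly this kind of comparison: each Preview row is reported relative to its Current baseline.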

Benchmark switcher

[screenshot: benchmark switcher in the console]
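The switcher is presumably driven by BenchmarkDotNet's BenchmarkSwitcher (a minimal sketch; the repo's actual entry point may differ):

```fsharp
open System.Reflection
open BenchmarkDotNet.Running

[<EntryPoint>]
let main argv =
    // Presents an interactive menu of the benchmark classes found in
    // this assembly, or runs the ones selected via the command line.
    BenchmarkSwitcher
        .FromAssembly(Assembly.GetExecutingAssembly())
        .Run(argv)
    |> ignore
    0
```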

Example output

[screenshot: example benchmark output]

| Job     | Categories                                                                             | start | finish | step | Mean           | Error         | StdDev        | Median         | Ratio | RatioSD | Gen0   | Gen1   | Gen2   | Allocated | Alloc Ratio |
|-------- |--------------------------------------------------------------------------------------- |------ |------- |----- |---------------:|--------------:|--------------:|---------------:|------:|--------:|-------:|-------:|-------:|----------:|------------:|
| Current | UInt32,[|127u..1u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |      24.046 ns |     0.4269 ns |     0.3993 ns |      23.985 ns |  1.00 |    0.00 | 0.0004 |      - |      - |      96 B |        1.00 |
| Preview | UInt32,[|127u..1u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |       1.729 ns |     0.0804 ns |     0.0752 ns |       1.725 ns |  0.07 |    0.00 |      - |      - |      - |         - |        0.00 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|127u..2u..1u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |      22.817 ns |     0.2053 ns |     0.1920 ns |      22.760 ns |  1.00 |    0.00 | 0.0004 |      - |      - |      96 B |        1.00 |
| Preview | UInt32,[|127u..2u..1u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |       3.161 ns |     0.1053 ns |     0.0985 ns |       3.172 ns |  0.14 |    0.00 |      - |      - |      - |         - |        0.00 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..127u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |     361.493 ns |     4.3161 ns |     3.8261 ns |     361.798 ns |  1.00 |    0.00 | 0.0072 |      - |      - |    1768 B |        1.00 |
| Preview | UInt32,[|1u..127u|],ComputedCollections,Arrays,IntegralRanges                          | ?     | ?      | ?    |      96.560 ns |     1.9609 ns |     3.6347 ns |      94.721 ns |  0.27 |    0.01 | 0.0021 |      - |      - |     536 B |        0.30 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..2u..127u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |     263.240 ns |     3.4600 ns |     2.8893 ns |     264.086 ns |  1.00 |    0.00 | 0.0029 |      - |      - |     712 B |        1.00 |
| Preview | UInt32,[|1u..2u..127u|],ComputedCollections,Arrays,IntegralRanges                      | ?     | ?      | ?    |      58.053 ns |     1.1757 ns |     1.6481 ns |      57.840 ns |  0.22 |    0.01 | 0.0011 |      - |      - |     280 B |        0.39 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..2u..32767u|],ComputedCollections,Arrays,IntegralRanges                    | ?     | ?      | ?    |  40,529.790 ns |   272.6267 ns |   241.6764 ns |  40,486.288 ns |  1.00 |    0.00 | 0.4883 |      - |      - |  131464 B |        1.00 |
| Preview | UInt32,[|1u..2u..32767u|],ComputedCollections,Arrays,IntegralRanges                    | ?     | ?      | ?    |   7,787.907 ns |   152.9334 ns |   176.1183 ns |   7,737.320 ns |  0.19 |    0.00 | 0.2747 |      - |      - |   65560 B |        0.50 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|1u..32767u|],ComputedCollections,Arrays,IntegralRanges                        | ?     | ?      | ?    | 256,084.235 ns | 5,074.6636 ns | 6,598.4961 ns | 257,729.980 ns |  1.00 |    0.00 | 8.3008 | 8.3008 | 8.3008 |  393680 B |        1.00 |
| Preview | UInt32,[|1u..32767u|],ComputedCollections,Arrays,IntegralRanges                        | ?     | ?      | ?    |  77,660.979 ns | 1,541.8822 ns | 4,399.0768 ns |  77,866.278 ns |  0.31 |    0.02 | 2.8076 | 2.8076 | 2.8076 |  131088 B |        0.33 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|for n in start..finish -> n|],ComputedCollections,Arrays,IntegralRanges       | 0     | 32767  | ?    | 281,373.636 ns | 5,097.5675 ns | 4,518.8608 ns | 282,881.763 ns |  1.00 |    0.00 | 8.7891 | 8.7891 | 8.7891 |  393741 B |        1.00 |
| Preview | UInt32,[|for n in start..finish -> n|],ComputedCollections,Arrays,IntegralRanges       | 0     | 32767  | ?    |  77,629.964 ns | 1,545.8980 ns | 4,509.4572 ns |  77,968.518 ns |  0.29 |    0.02 | 3.0518 | 3.0518 | 3.0518 |  131090 B |        0.33 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|for n in start..step..finish -> n|],ComputedCollections,Arrays,IntegralRanges | 0     | 32767  | 2    |  69,948.064 ns | 1,078.6284 ns | 1,154.1203 ns |  69,834.222 ns |  1.00 |    0.00 | 0.7324 |      - |      - |  197056 B |        1.00 |
| Preview | UInt32,[|for n in start..step..finish -> n|],ComputedCollections,Arrays,IntegralRanges | 0     | 32767  | 2    |   7,700.286 ns |   115.4058 ns |   107.9507 ns |   7,679.921 ns |  0.11 |    0.00 | 0.2747 |      - |      - |   65560 B |        0.33 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|start..finish|],ComputedCollections,Arrays,IntegralRanges                     | 0     | 32767  | ?    | 148,726.931 ns | 2,956.8132 ns | 4,603.4019 ns | 148,672.632 ns |  1.00 |    0.00 | 4.8828 | 4.3945 | 4.3945 |  262584 B |        1.00 |
| Preview | UInt32,[|start..finish|],ComputedCollections,Arrays,IntegralRanges                     | 0     | 32767  | ?    |  77,915.564 ns | 1,554.2518 ns | 3,476.3069 ns |  77,861.060 ns |  0.52 |    0.03 | 4.0283 | 4.0283 | 4.0283 |  131095 B |        0.50 |
|         |                                                                                        |       |        |      |                |               |               |                |       |         |        |        |        |           |             |
| Current | UInt32,[|start..step..finish|],ComputedCollections,Arrays,IntegralRanges               | 0     | 32767  | 2    |  38,456.304 ns |   682.2118 ns |   638.1413 ns |  38,380.719 ns |  1.00 |    0.00 | 0.4883 |      - |      - |  131464 B |        1.00 |
| Preview | UInt32,[|start..step..finish|],ComputedCollections,Arrays,IntegralRanges               | 0     | 32767  | 2    |   7,791.339 ns |    93.7728 ns |    87.7152 ns |   7,789.114 ns |  0.20 |    0.00 | 0.2747 |      - |      - |   65560 B |        0.50 |

Checklist

  • Release notes entry updated: this shouldn't need release notes.


✅ No release notes required

@brianrourkeboll brianrourkeboll marked this pull request as ready for review March 24, 2024 20:01
@brianrourkeboll brianrourkeboll requested a review from a team as a code owner March 24, 2024 20:01
@T-Gro (Member) left a comment

The infrastructure to compare the new and old compiler is the right thing to have here; it really makes the benchmarks folder useful by showing how to benchmark compiler perf improvements.

I do not think we shall include all of the individual comparison instances in the repo forever - they are a useful measurement for your feature PR to demonstrate it brings improvements. But for keeping them in the repo, I think it would be better to just keep a few instances demonstrating the setup.

It should also be made clear that the purpose of the folder is to demonstrate the different setup mechanisms, like local vs. shipped compiler, and the same for FSharp.Core - not to persist every benchmark executed.

@psfinaki (Member) left a comment

Great stuff, Brian. Looking forward to trying this in action - I will make a lot of use of it :)

As for the new benchmarks - I am a bit on the fence here. T-Gro's arguments make sense to me, in particular I think there are 2 reasons to have some specific benchmarks here:

  1. It is a good demonstration of a benchmarking technique not yet used here
  2. We plan to use them in the near future, as in, we continue working on some feature perf

I have recently started including F# micro perf to the dotnet/performance repo. We'll see how that plays out - in theory, this is the right place for keeping track of such things, and this is what C# does also. But it will take us some time to tune things out there.

Maybe you can keep one of the new benchmarks, or all of them if you insist and nobody minds. Either way, it would be awesome if you added a README here, which could be a concatenation of the PR description and these comments; you can also take inspiration from other READMEs in the benchmarks folder.

Thank you!

@brianrourkeboll (Contributor, Author) commented

Hmm.

(Forgive me if I am gravely misunderstanding something; I don't know enough about the intent or ownership of dotnet/performance to know what role, if any, it should play here.)

Out of curiosity, I browsed through some Roslyn PRs that introduced codegen changes to see if they had any useful practices to borrow... But they mostly seemed to just post benchmark source and results in comments, and the benchmarks themselves seemed to be independent benchmarks of what the new codegen was ostensibly doing. At a glance, I didn't see any benchmarks that actually used an older compiler's output as a baseline and benchmarked the new output against that.

@psfinaki

I have recently started including F# micro perf to the dotnet/performance repo. We'll see how that plays out - in theory, this is the right place for keeping track of such things, and this is what C# does also.

Yeah, although, on the surface, the dotnet/performance repo currently seems geared much more towards testing (1) the performance of the runtime itself and (2) the performance of specific first-party libraries than, say, Roslyn codegen.

There are mentions of benching two different runtime versions against each other, which could perhaps be adapted to work with different FCS (or language, etc.) versions.

If it makes sense to start using dotnet/performance as a way to test FCS codegen changes (and perhaps library changes for FSharp.Core, similar to how many System.* and Microsoft.* libraries are tested there), then maybe we can just come up with and document two new workflows there, each for both codegen and library (FSharp.Core) changes:

  • One for long-lived benchmarks deemed worth merging into dotnet/performance, e.g., as a way to detect regressions.
  • Another for disposable benchmarks run during local development that are not checked in.

Or we could keep them separate — use dotnet/performance for long-lived benchmarks, and have some kind of documentation and maybe template(s) here, in this repo, for writing and running short-lived, disposable benchmarks during local development.

@T-Gro

I do not think we shall include all of the individual comparison instances in the repo forever - they are a useful measurement for your feature PR to demonstrate it brings improvements.

On the one hand, I can understand wanting to avoid accumulating a mess of outdated benchmarks in source: either they will never be run again, in which case they are (worse than) useless, or else they must be run at some frequency, which has development time, deployment time, and infrastructure costs.

On the other hand, it seems to me that if a performance improvement is worth making and benchmarking in the first place, it might be worth keeping the benchmarks around to detect unintended regressions, at least for sufficiently significant or fundamental things. That is again only useful if the benchmarks are actually run at some interval after changes are made, but maybe that's what dotnet/performance is for?

@psfinaki (Member) commented

@brianrourkeboll, so first of all - no grave misunderstanding here, we are trying to understand our options ourselves :)

So this

If it makes sense to start using dotnet/performance as a way to test FCS codegen changes (and perhaps library changes for FSharp.Core, similar to how many System.* and Microsoft.* libraries are tested there), then maybe we can just come up with and document two new workflows there, each for both codegen and library (FSharp.Core) changes:

  • One for long-lived benchmarks deemed worth merging into dotnet/performance, e.g., as a way to detect regressions.
  • Another for disposable benchmarks run during local development that are not checked in.

Or we could keep them separate — use dotnet/performance for long-lived benchmarks, and have some kind of documentation and maybe template(s) here, in this repo, for writing and running short-lived, disposable benchmarks during local development.

... is roughly my current vision (and maybe the team's vision but not sure :D). Because this

On the other hand, it seems to me that if a performance improvement is worth making and benchmarking in the first place, it might be worth keeping the benchmarks around to detect unintended regressions, at least for sufficiently significant or fundamental things. That is again only useful if the benchmarks are actually run at some interval after changes are made, but maybe that's what dotnet/performance is for?

is supposed to be the case. dotnet/performance is indeed for tracking regressions. They run all the benchmarks all the time, automatically create baselines, and automatically create issues.

However, this is just a vision, I've just started adding some F# code there and currently watching the situation. Hence I don't mind having more benchmarks in our repo until we are convinced about dotnet/performance.

@brianrourkeboll (Contributor, Author) commented Mar 25, 2024

All right. I went ahead and added a simple readme and left only one set of new benchmarks. Let me know if you'd rather I pared that down even further.

@psfinaki (Member) left a comment

Awesome. LGTM. Thanks!

@T-Gro (Member) left a comment

I still do not like the amount of code being checked in, especially considering it is mostly duplicated.

But I will not block it - let's check it in.
I think a benchmark could be made generic (for the various integral types here), further reducing it. We will just need to figure out how to turn the generic argument into a category-like column.

@T-Gro T-Gro merged commit 58b4d08 into dotnet:main Mar 26, 2024
@brianrourkeboll (Contributor, Author) commented Mar 26, 2024

@T-Gro The duplication is necessary because, if the range type for [|start..finish|], etc., is not known at compile time, then the integral range optimizations will not be applied (since the compiler does not know whether start and finish will be integral types at all), and the generated code will use the IEnumerator<_>-based RangeStepGeneric.

I know because I tried to use a single generic benchmark class and use [<GenericTypeArguments(…)>], and, while it "works" (compiles and runs), it doesn't actually benchmark what it looks like it's benchmarking :). I suspect that BenchmarkDotNet is just generating subclasses that instantiate the type argument to the specified type, and then invoke the members implemented in the base class — which here will again be based on RangeStepGeneric, since the base class itself is generic and does not know whether the type will be an integral type.

[screenshot]

The benchmarks for testing the compile-time computation of constant counts also must be inlined and thus duplicated per type, i.e., at least the generic ``127..-1..-128``<_>, etc., must be invoked once per type, as ``127..-1..-128``<int> (), ``127..-1..-128``<uint> (), etc.
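The monomorphic-vs-generic distinction above can be sketched as follows (illustrative function names, not the PR's actual benchmarks):

```fsharp
// Monomorphic: the element type is statically known to be uint32, so
// (with langversion preview) the compiler can emit the lowered loop.
let monomorphicArray (start : uint) (finish : uint) = [| start .. finish |]

// Inline generic: the range's static member constraints are resolved at
// each call site, so the optimized code is generated once per concrete
// instantiation - e.g. genericArray 1 127 and genericArray 1u 127u must
// each appear in source, mirroring the per-type invocations above.
let inline genericArray start finish = [| start .. finish |]
```

A non-inline generic member, by contrast, never learns the element type at compile time, which is why it falls back to the IEnumerator<_>-based RangeStepGeneric path regardless of language version.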

If you don't think it's useful to have these benchmarks checked in here at all, I'm fine with removing them altogether. Maybe they would be a better fit for dotnet/performance at some point?

@brianrourkeboll brianrourkeboll deleted the compiled-code-benchmarks branch March 26, 2024 20:46
psfinaki added a commit that referenced this pull request Mar 28, 2024
* PrettyNaming: make DoesIdentifierNeedBackticks public (#16613)

* Update dependencies from https://github.com/dotnet/arcade build (#16944)

Microsoft.DotNet.Arcade.Sdk
 From Version 8.0.0-beta.24170.6 -> To Version 8.0.0-beta.24172.5

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

* Unskip a few tests (#16946)

* Unskip a few tests

* another one

* Fix [<tailcall>] false positive with yield! (#16933)

* add test case showing a false positive with yield!

* add missing case in IsAppInLambdaBody that can happen with yield!

* add release notes entry

* update PR number

* only bind to what we are interested in

* add test expecting a warning for yield! in a list comprehension

* add test case using yield! in a custom CE that overflows the stack

* Add regression test for nameof type with generic parameters (#16827)

Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>

* Minor README update about proto (#16945)

* Minor README update about proto

It's Bootstrap actually.

* Update DEVGUIDE.md

Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>

* Update DEVGUIDE.md

---------

Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>

* Compiled code benchmarks: easy benchmarking of `preview` against current (#16942)

* Make it easy to bench preview against current

* Revert EnableDefaultNoneItems false (#16953)

* Exclude compiler service benchmark from VMR build when not building tests (#16955)

* Exclude compiler service benchmark from VMR build when not building tests

* remove Z

---------

Co-authored-by: Petr <psfinaki@users.noreply.github.com>

* Update dependencies from https://github.com/dotnet/arcade build 20240326.8 (#16957)

Microsoft.DotNet.Arcade.Sdk
 From Version 8.0.0-beta.24172.5 -> To Version 8.0.0-beta.24176.8

Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>

---------

Co-authored-by: Eugene Auduchinok <eugene.auduchinok@jetbrains.com>
Co-authored-by: dotnet-maestro[bot] <42748379+dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
Co-authored-by: Petr <psfinaki@users.noreply.github.com>
Co-authored-by: dawe <dawedawe@posteo.de>
Co-authored-by: Florian Verdonck <florian.verdonck@outlook.com>
Co-authored-by: Vlad Zarytovskii <vzaritovsky@hotmail.com>
Co-authored-by: Brian Rourke Boll <brianrourkeboll@users.noreply.github.com>
Co-authored-by: Kevin Ransom (msft) <codecutter@hotmail.com>
Co-authored-by: Matt Mitchell <mmitche@microsoft.com>