JIT: initial support for reinforcement learning of CSE heuristic #96880

Merged
merged 4 commits into from
Jan 29, 2024

@AndyAyersMS AndyAyersMS commented Jan 12, 2024

Adds special CSE heuristic modes to the JIT to support learning a good CSE
heuristic via Policy Gradient, a form of reinforcement learning. The learning
must be orchestrated by an external process, but the JIT does all of the
actual gradient computations.

The orchestration program will be added to jitutils. The overall process
also relies on SPMI and the goal is to minimize perf score.

Introduce two new CSE heuristic policies:

  • Replay: simply perform indicated sequence of CSEs
  • RL: used for the Policy Gradient, with 3 modes:
    • Stochastic: based on current parameters but allows random variation
    • Greedy: based on current parameters, deterministic
    • Update: compute updated parameters per Policy Gradient

Also rework the Random policy to be a bit more random: it now alters
both the CSEs performed and the order in which they are performed.
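
As a rough illustration of what "alters both the CSEs performed and the order" could mean, here is a minimal sketch. It is not the JIT's implementation: `randomCseSequence`, the use of `std::mt19937`, and the "shuffle, then keep a random-length prefix" scheme are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical sketch: choose a random subset of candidate indices, in random order.
// Candidates are identified by index 0..numCandidates-1 (an assumption for this example).
std::vector<int> randomCseSequence(int numCandidates, unsigned seed)
{
    assert(numCandidates >= 0);

    std::mt19937 rng(seed);

    // Start with every candidate, then randomize the order.
    std::vector<int> order(numCandidates);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), rng);

    // Randomize which CSEs are performed by keeping a random-length prefix.
    std::uniform_int_distribution<int> howMany(0, numCandidates);
    order.resize(howMany(rng));

    return order;
}
```

Seeding the generator explicitly keeps a given run reproducible, which matters when an external process needs to replay or compare sequences.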

Add the ability to have jit config options that specify sequences of ints
or doubles.
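
To make the idea of sequence-valued options concrete, the sketch below parses a delimited string into a vector. The comma separator and the helper names `parseDoubleSequence` and `parseIntSequence` are assumptions for illustration; this thread does not show the syntax the JIT actually accepts.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: parse a comma-separated list such as "0.5,-1.25,3".
// The separator is an assumption, not the JIT's documented format.
std::vector<double> parseDoubleSequence(const std::string& text)
{
    std::vector<double> values;
    const char* cursor = text.c_str();
    while (*cursor != '\0')
    {
        char* end = nullptr;
        const double value = std::strtod(cursor, &end);
        if (end == cursor)
        {
            break; // no number at this position: stop rather than guess
        }
        assert(end > cursor); // strtod consumed at least one character
        values.push_back(value);
        cursor = end;
        if (*cursor == ',')
        {
            cursor++; // step over the separator
        }
    }
    return values;
}

// Same idea for integer sequences, using base-10 strtol.
std::vector<int> parseIntSequence(const std::string& text)
{
    std::vector<int> values;
    const char* cursor = text.c_str();
    while (*cursor != '\0')
    {
        char* end = nullptr;
        const long value = std::strtol(cursor, &end, 10);
        if (end == cursor)
        {
            break;
        }
        values.push_back(static_cast<int>(value));
        cursor = end;
        if (*cursor == ',')
        {
            cursor++;
        }
    }
    return values;
}
```

A parameter vector for the RL policy or a CSE sequence for the Replay policy could then be passed as a single string value.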

Add the ability to just dump metric info for a jitted method, and add
more details (perhaps considerably more) for CSEs. This is all still
simple text format.

Also factor out a common check for "non-viable" candidates -- these are
CSE candidates that won't actually be CSEs. This leads to some minor
diffs as the check is now slightly different for CSEs with zero uses
and/or zero weighted uses.
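
A factored-out check of this kind might look roughly like the following. This is only a sketch: the struct, its field names, and the exact conditions are hypothetical, chosen to mirror the "zero uses and/or zero weighted uses" wording above rather than the JIT's real data structures.

```cpp
#include <cassert>

// Hypothetical per-candidate summary; the field names are illustrative only.
struct CseCandidateInfo
{
    unsigned useCount;  // number of uses of the candidate expression
    double   useWeight; // profile-weighted use count
};

// Returns true when, under this sketch's rules, the candidate cannot
// become a real CSE: nothing uses it, or its uses carry no weight.
bool isNonViable(const CseCandidateInfo& info)
{
    assert(info.useWeight >= 0.0); // weights are assumed non-negative

    if (info.useCount == 0)
    {
        return true;
    }
    if (info.useWeight == 0.0)
    {
        return true;
    }
    return false;
}
```

Keeping the test in one place means every heuristic policy filters candidates the same way, which is also why a small change to the condition can show up as minor diffs.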

Contributes to #92915.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 12, 2024
@ghost ghost assigned AndyAyersMS Jan 12, 2024
@ghost

ghost commented Jan 12, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Initial support for a reinforcement-learning based CSE heuristic.

Author: AndyAyersMS
Assignees: AndyAyersMS
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Member

how feasible it is to have a general infrastructure that is driven by features/parameters, so any optimization can plug into it? I want to do something similar for LSRA.

@AndyAyersMS AndyAyersMS changed the title from "Cse metrics" to "JIT: initial support for reinforcement learning of CSE heuristic" Jan 22, 2024
@AndyAyersMS AndyAyersMS marked this pull request as ready for review January 22, 2024 19:46
@AndyAyersMS
Member Author

@dotnet/jit-contrib FYI

Not sure who wants to review this one. Any volunteers?

@AndyAyersMS
Member Author

> how feasible it is to have a general infrastructure that is driven by features/parameters, so any optimization can plug into it? I want to do something similar for LSRA.

Somewhat? The basic structure is common to lots of problems; the tricky bit is figuring out the right state/action model, and then either handling this across a jit/host/orchestrator boundary or externalizing all the info from the jit so it can be processed entirely by outside code.

Let me describe briefly how this all works and maybe we can brainstorm about how to leverage it for your case.

The "RL" mode for CSEs has 3 behaviors:

  • Evaluation/Exploration -- an external agent supplies a random seed, step size, and parameters (a vector of numbers) to the jit via config and drives the jit on a method. The jit uses a stochastic soft-max policy to produce a particular sequence of CSEs and, at the end of jitting, writes metrics including perf score plus a description of what CSEs were done.
  • Update -- an external agent supplies parameters and per-step rewards (estimated changes in perf score for each sub-sequence of CSEs). The jit uses the Policy Gradient algorithm to compute updates to the parameters, and writes those out via metrics.
  • Greedy -- an external agent supplies parameters and the jit runs a greedy policy (always choose the best option) using those.
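
To illustrate the difference between the stochastic and greedy behaviors, here is a minimal sketch of a soft-max policy over candidate actions. It assumes each action's preference is a dot product of the parameter vector with a per-action feature vector; that linear form, and all function names, are assumptions for the example rather than a description of the JIT's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Preference for one action: parameters dotted with that action's features
// (the linear form is an assumption of this sketch).
double preference(const std::vector<double>& params, const std::vector<double>& features)
{
    assert(params.size() == features.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < params.size(); i++)
    {
        sum += params[i] * features[i];
    }
    return sum;
}

// Soft-max: convert preferences into probabilities that sum to 1.
// Subtracting the maximum first avoids overflow in exp().
std::vector<double> softmax(const std::vector<double>& prefs)
{
    assert(!prefs.empty());
    const double maxPref = *std::max_element(prefs.begin(), prefs.end());
    std::vector<double> probs(prefs.size());
    double total = 0.0;
    for (std::size_t i = 0; i < prefs.size(); i++)
    {
        probs[i] = std::exp(prefs[i] - maxPref);
        total += probs[i];
    }
    for (double& p : probs)
    {
        p /= total;
    }
    return probs;
}

// Stochastic behavior: sample an action index according to the soft-max probabilities.
int chooseStochastic(const std::vector<double>& prefs, std::mt19937& rng)
{
    const std::vector<double> probs = softmax(prefs);
    std::discrete_distribution<int> pick(probs.begin(), probs.end());
    return pick(rng);
}

// Greedy behavior: always take the highest-preference action.
int chooseGreedy(const std::vector<double>& prefs)
{
    assert(!prefs.empty());
    return static_cast<int>(std::max_element(prefs.begin(), prefs.end()) - prefs.begin());
}
```

With the same parameters, the greedy choice is deterministic, while the stochastic choice depends on the supplied random seed and so explores alternatives that currently look worse.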

The orchestration process repeatedly cycles through evaluation/exploration + update steps. This process should converge to a set of parameters that (via greedy policy) should obtain the optimal perf score for that method (or scores for sets of methods).
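
For readers unfamiliar with Policy Gradient, the update step can be sketched as below for a linear soft-max policy: each parameter moves by step size times reward times the gradient of the log-probability of the chosen action, which for this policy is the chosen action's features minus the probability-weighted average of all actions' features. The linear form and the names `softmaxProbabilities` and `policyGradientStep` are assumptions of the sketch, not the JIT's actual code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Probability of each action under a linear soft-max policy (assumed form).
Vec softmaxProbabilities(const Vec& params, const std::vector<Vec>& actionFeatures)
{
    assert(!actionFeatures.empty());
    Vec probs;
    for (const Vec& x : actionFeatures)
    {
        assert(x.size() == params.size());
        double h = 0.0;
        for (std::size_t i = 0; i < params.size(); i++)
        {
            h += params[i] * x[i];
        }
        probs.push_back(h);
    }
    const double maxPref = *std::max_element(probs.begin(), probs.end());
    double total = 0.0;
    for (double& p : probs)
    {
        p = std::exp(p - maxPref);
        total += p;
    }
    for (double& p : probs)
    {
        p /= total;
    }
    return probs;
}

// One update: params += stepSize * reward * grad(log pi(chosen)).
// A positive reward makes the chosen action more likely next time.
void policyGradientStep(Vec& params, const std::vector<Vec>& actionFeatures, int chosen,
                        double reward, double stepSize)
{
    assert(chosen >= 0 && chosen < static_cast<int>(actionFeatures.size()));

    // Probabilities are computed once, from the parameters before the update.
    const Vec probs = softmaxProbabilities(params, actionFeatures);
    for (std::size_t i = 0; i < params.size(); i++)
    {
        double expected = 0.0; // probability-weighted average of feature i
        for (std::size_t b = 0; b < actionFeatures.size(); b++)
        {
            expected += probs[b] * actionFeatures[b][i];
        }
        params[i] += stepSize * reward * (actionFeatures[chosen][i] - expected);
    }
}
```

Because the goal here is to minimize perf score, a reward in this sketch would be positive when a step lowers the estimated perf score.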

In the background the orchestrator also computes "V" and "Q" estimates using the data from each run; this is used to compute increasingly accurate per-step rewards.
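
One simple way to picture how per-step rewards can sharpen over time is a running average of the final perf score seen from each state, where a state is the prefix of CSEs done so far. The sketch below is only an illustration of that idea under those assumptions; the class `ValueEstimates` is hypothetical and is not taken from the jitutils orchestrator.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical running-average estimate of the final perf score reachable
// from each state, keyed by the prefix of the CSE sequence.
class ValueEstimates
{
public:
    // Record one run: every prefix of the sequence observed this final perf score.
    void recordRun(const std::vector<int>& cseSequence, double perfScore)
    {
        std::string key;
        update(key, perfScore); // the empty prefix is the initial state
        for (int cse : cseSequence)
        {
            key += std::to_string(cse) + ",";
            update(key, perfScore);
        }
    }

    // Per-step reward: estimated drop in perf score caused by each step.
    // Lower perf score is better, so a drop is a positive reward.
    std::vector<double> stepRewards(const std::vector<int>& cseSequence) const
    {
        std::vector<double> rewards;
        std::string key;
        double before = estimate(key);
        for (int cse : cseSequence)
        {
            key += std::to_string(cse) + ",";
            const double after = estimate(key);
            rewards.push_back(before - after);
            before = after;
        }
        return rewards;
    }

private:
    struct Average
    {
        double sum   = 0.0;
        int    count = 0;
    };

    std::map<std::string, Average> m_table;

    void update(const std::string& key, double score)
    {
        Average& avg = m_table[key];
        avg.sum += score;
        avg.count++;
    }

    double estimate(const std::string& key) const
    {
        auto it = m_table.find(key);
        assert(it != m_table.end()); // only query states that were recorded
        return it->second.sum / it->second.count;
    }
};
```

As more runs are recorded, each average draws on more data, so the rewards derived from differences between adjacent states become less noisy.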

@kunalspathak
Member

Diff results for #96880

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,501,661 contexts (1,003,806 MinOpts, 1,497,855 FullOpts).

MISSED contexts: base: 3,546 (0.14%), diff: 3,556 (0.14%)

Overall (+1,884 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 80,912,504 -64
coreclr_tests.run.linux.arm64.checked.mch 509,947,608 +260
libraries.pmi.linux.arm64.checked.mch 75,989,468 +168
libraries_tests.run.linux.arm64.Release.mch 381,322,764 +1,596
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 162,528,980 -76
FullOpts (+1,884 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 55,976,044 -64
coreclr_tests.run.linux.arm64.checked.mch 160,722,552 +260
libraries.pmi.linux.arm64.checked.mch 75,869,484 +168
libraries_tests.run.linux.arm64.Release.mch 166,025,624 +1,596
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 149,047,768 -76

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,595,039 contexts (1,052,329 MinOpts, 1,542,710 FullOpts).

MISSED contexts: 3,596 (0.14%)

Overall (-236 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 66,799,627 +62
coreclr_tests.run.linux.x64.checked.mch 458,880,954 +193
libraries.pmi.linux.x64.checked.mch 59,972,991 +90
libraries_tests.run.linux.x64.Release.mch 329,977,293 -591
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 130,000,373 +10
FullOpts (-236 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 46,969,870 +62
coreclr_tests.run.linux.x64.checked.mch 132,322,819 +193
libraries.pmi.linux.x64.checked.mch 59,860,121 +90
libraries_tests.run.linux.x64.Release.mch 145,587,772 -591
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 119,341,902 +10

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,263,032 contexts (930,876 MinOpts, 1,332,156 FullOpts).

MISSED contexts: base: 2,925 (0.13%), diff: 2,933 (0.13%)

Overall (+1,512 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 34,569,912 -208
coreclr_tests.run.osx.arm64.checked.mch 485,471,332 +228
libraries.pmi.osx.arm64.checked.mch 79,954,016 +112
libraries_tests.run.osx.arm64.Release.mch 312,684,004 +1,452
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 160,787,844 -72
FullOpts (+1,512 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 18,096,632 -208
coreclr_tests.run.osx.arm64.checked.mch 153,164,876 +228
libraries.pmi.osx.arm64.checked.mch 79,832,888 +112
libraries_tests.run.osx.arm64.Release.mch 108,743,500 +1,452
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 147,650,316 -72

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,318,296 contexts (931,543 MinOpts, 1,386,753 FullOpts).

MISSED contexts: base: 2,587 (0.11%), diff: 2,598 (0.11%)

Overall (-952 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 47,215,296 +104
coreclr_tests.run.windows.arm64.checked.mch 495,322,244 -296
libraries.pmi.windows.arm64.checked.mch 79,562,252 +104
libraries_tests.run.windows.arm64.Release.mch 309,737,828 -728
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 169,000,724 -136
FullOpts (-952 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 30,964,912 +104
coreclr_tests.run.windows.arm64.checked.mch 156,230,716 -296
libraries.pmi.windows.arm64.checked.mch 79,442,268 +104
libraries_tests.run.windows.arm64.Release.mch 108,156,324 -728
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 155,863,260 -136

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,492,949 contexts (983,689 MinOpts, 1,509,260 FullOpts).

MISSED contexts: base: 3,859 (0.15%), diff: 3,862 (0.15%)

Overall (-2,082 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 41,760,738 -1,310
benchmarks.run_pgo.windows.x64.checked.mch 34,741,684 +112
benchmarks.run_tiered.windows.x64.checked.mch 12,662,284 -10
coreclr_tests.run.windows.x64.checked.mch 392,866,349 +302
libraries.pmi.windows.x64.checked.mch 61,196,926 +84
libraries_tests.run.windows.x64.Release.mch 279,034,736 -1,249
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 133,435,993 -6
realworld.run.windows.x64.checked.mch 14,170,685 -5
FullOpts (-2,082 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 27,102,013 -1,310
benchmarks.run_pgo.windows.x64.checked.mch 20,506,707 +112
benchmarks.run_tiered.windows.x64.checked.mch 3,477,018 -10
coreclr_tests.run.windows.x64.checked.mch 119,323,357 +302
libraries.pmi.windows.x64.checked.mch 61,083,407 +84
libraries_tests.run.windows.x64.Release.mch 100,666,420 -1,249
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 123,012,623 -6
realworld.run.windows.x64.checked.mch 13,780,980 -5

Details here


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,238,212 contexts (827,812 MinOpts, 1,410,400 FullOpts).

MISSED contexts: base: 74,052 (3.20%), diff: 74,066 (3.20%)

Overall (-2,444 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 60,232,784 +188
coreclr_tests.run.linux.arm.checked.mch 321,775,434 +454
libraries.pmi.linux.arm.checked.mch 49,549,380 +86
libraries_tests.run.linux.arm.Release.mch 241,718,592 -3,166
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,040,870 -6
FullOpts (-2,444 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 49,435,182 +188
coreclr_tests.run.linux.arm.checked.mch 109,045,300 +454
libraries.pmi.linux.arm.checked.mch 49,442,876 +86
libraries_tests.run.linux.arm.Release.mch 119,715,648 -3,166
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 82,957,050 -6

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,299,277 contexts (841,817 MinOpts, 1,457,460 FullOpts).

MISSED contexts: base: 2,090 (0.09%), diff: 2,093 (0.09%)

Overall (-81 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 43,746,356 -45
coreclr_tests.run.windows.x86.checked.mch 308,815,477 +159
libraries_tests.run.windows.x86.Release.mch 186,076,300 -175
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 102,171,676 -20
FullOpts (-81 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 37,116,866 -45
coreclr_tests.run.windows.x86.checked.mch 107,143,708 +159
libraries_tests.run.windows.x86.Release.mch 87,744,793 -175
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 93,501,884 -20

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

FullOpts (-0.01% to +0.00%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch -0.01%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.01%
FullOpts (-0.01% to +0.00%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.01%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

FullOpts (-0.01% to -0.00%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch -0.01%

Details here


@AndyAyersMS
Member Author

@EgorBo can you take a look?

@EgorBo
Member

EgorBo commented Jan 23, 2024

> @EgorBo can you take a look?

Sure, need to rewatch your internal talk that I missed first 🙂

@ryujit-bot

Diff results for #96880

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,501,147 contexts (1,003,806 MinOpts, 1,497,341 FullOpts).

MISSED contexts: base: 4,060 (0.16%), diff: 4,070 (0.16%)

Overall (+3,884 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 81,131,652 -88
coreclr_tests.run.linux.arm64.checked.mch 509,821,816 +240
libraries.pmi.linux.arm64.checked.mch 76,017,172 +168
libraries_tests.run.linux.arm64.Release.mch 381,444,832 +3,672
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 162,653,504 -108
FullOpts (+3,884 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm64.checked.mch 56,195,192 -88
coreclr_tests.run.linux.arm64.checked.mch 160,596,760 +240
libraries.pmi.linux.arm64.checked.mch 75,897,188 +168
libraries_tests.run.linux.arm64.Release.mch 166,147,692 +3,672
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 149,172,292 -108

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,595,007 contexts (1,052,329 MinOpts, 1,542,678 FullOpts).

MISSED contexts: 3,628 (0.14%)

Overall (-365 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 68,635,056 +62
coreclr_tests.run.linux.x64.checked.mch 459,551,078 +224
libraries.pmi.linux.x64.checked.mch 60,144,132 +90
libraries_tests.run.linux.x64.Release.mch 333,558,929 -751
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 130,468,363 +10
FullOpts (-365 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.x64.checked.mch 48,805,299 +62
coreclr_tests.run.linux.x64.checked.mch 132,992,943 +224
libraries.pmi.linux.x64.checked.mch 60,031,262 +90
libraries_tests.run.linux.x64.Release.mch 149,169,408 -751
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 119,809,892 +10

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,262,701 contexts (930,876 MinOpts, 1,331,825 FullOpts).

MISSED contexts: base: 3,256 (0.14%), diff: 3,264 (0.14%)

Overall (+2,152 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 34,667,904 -216
coreclr_tests.run.osx.arm64.checked.mch 485,378,220 +224
libraries.pmi.osx.arm64.checked.mch 79,949,748 +112
libraries_tests.run.osx.arm64.Release.mch 312,903,680 +2,104
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 160,908,056 -72
FullOpts (+2,152 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.osx.arm64.checked.mch 18,194,624 -216
coreclr_tests.run.osx.arm64.checked.mch 153,071,764 +224
libraries.pmi.osx.arm64.checked.mch 79,828,620 +112
libraries_tests.run.osx.arm64.Release.mch 108,963,176 +2,104
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 147,770,528 -72

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,318,196 contexts (931,543 MinOpts, 1,386,653 FullOpts).

MISSED contexts: base: 2,687 (0.12%), diff: 2,698 (0.12%)

Overall (-216 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 47,390,952 +76
coreclr_tests.run.windows.arm64.checked.mch 495,369,076 -312
libraries.pmi.windows.arm64.checked.mch 79,588,924 +104
libraries_tests.run.windows.arm64.Release.mch 310,509,936 +52
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 169,130,064 -136
FullOpts (-216 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.arm64.checked.mch 31,140,568 +76
coreclr_tests.run.windows.arm64.checked.mch 156,277,548 -312
libraries.pmi.windows.arm64.checked.mch 79,468,940 +104
libraries_tests.run.windows.arm64.Release.mch 108,928,432 +52
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 155,992,600 -136

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,492,909 contexts (983,689 MinOpts, 1,509,220 FullOpts).

MISSED contexts: base: 3,899 (0.16%), diff: 3,902 (0.16%)

Overall (-2,169 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 42,176,983 -1,181
benchmarks.run_pgo.windows.x64.checked.mch 35,391,293 +101
benchmarks.run_tiered.windows.x64.checked.mch 12,661,498 -10
coreclr_tests.run.windows.x64.checked.mch 393,404,923 +118
libraries.pmi.windows.x64.checked.mch 61,389,190 +84
libraries_tests.run.windows.x64.Release.mch 281,642,309 -1,270
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 133,913,806 -6
realworld.run.windows.x64.checked.mch 14,170,687 -5
FullOpts (-2,169 bytes)
Collection Base size (bytes) Diff size (bytes)
aspnet.run.windows.x64.checked.mch 27,518,258 -1,181
benchmarks.run_pgo.windows.x64.checked.mch 21,156,316 +101
benchmarks.run_tiered.windows.x64.checked.mch 3,476,232 -10
coreclr_tests.run.windows.x64.checked.mch 119,861,931 +118
libraries.pmi.windows.x64.checked.mch 61,275,671 +84
libraries_tests.run.windows.x64.Release.mch 103,273,993 -1,270
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 123,490,436 -6
realworld.run.windows.x64.checked.mch 13,780,982 -5

Details here


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,237,676 contexts (827,812 MinOpts, 1,409,864 FullOpts).

MISSED contexts: base: 74,588 (3.23%), diff: 74,602 (3.23%)

Overall (-2,188 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 61,255,640 +192
coreclr_tests.run.linux.arm.checked.mch 321,788,912 +404
libraries.pmi.linux.arm.checked.mch 49,610,860 +86
libraries_tests.run.linux.arm.Release.mch 242,758,250 -2,864
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,199,432 -6
FullOpts (-2,188 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.linux.arm.checked.mch 50,458,038 +192
coreclr_tests.run.linux.arm.checked.mch 109,058,778 +404
libraries.pmi.linux.arm.checked.mch 49,504,356 +86
libraries_tests.run.linux.arm.Release.mch 120,755,306 -2,864
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 83,115,612 -6

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,296,274 contexts (841,817 MinOpts, 1,454,457 FullOpts).

MISSED contexts: base: 5,093 (0.22%), diff: 5,096 (0.22%)

Overall (-79 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 45,222,512 -61
coreclr_tests.run.windows.x86.checked.mch 309,180,492 +163
libraries_tests.run.windows.x86.Release.mch 185,842,234 -161
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 102,197,516 -20
FullOpts (-79 bytes)
Collection Base size (bytes) Diff size (bytes)
benchmarks.run_pgo.windows.x86.checked.mch 38,593,022 -61
coreclr_tests.run.windows.x86.checked.mch 107,508,723 +163
libraries_tests.run.windows.x86.Release.mch 87,510,727 -161
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 93,527,724 -20

Details here


Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

FullOpts (-0.01% to +0.00%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch -0.01%

Throughput diffs for windows/x64 ran on windows/x64

Overall (-0.01% to +0.00%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.01%
FullOpts (-0.01% to +0.00%)
Collection PDIFF
aspnet.run.windows.x64.checked.mch -0.01%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

FullOpts (-0.01% to -0.00%)
Collection PDIFF
libraries_tests.run.linux.arm64.Release.mch -0.01%

Details here


AndyAyersMS added a commit to AndyAyersMS/jitutils that referenced this pull request Jan 25, 2024
Add a tool that can use ML techniques to explore the JIT's CSE heuristic.
Some parts of this are very specific to CSEs, others are general and could
be repurposed for use with other heuristics.

This is still work in progress.

Depends on jit changes in dotnet/runtime#96880
@EgorBo EgorBo self-requested a review January 25, 2024 21:28
AndyAyersMS added a commit to dotnet/jitutils that referenced this pull request Jan 27, 2024
Add a tool that can use ML techniques to explore the JIT's CSE
heuristic. Some parts of this are very specific to CSEs, others are
general and could be repurposed for use with other heuristics.

This is still work in progress.

Depends on jit changes in dotnet/runtime#96880
@AndyAyersMS
Member Author

@EgorBo ping

printf("\n");
}

printf("Total bytes of code %d, prolog size %d, PerfScore %.2f, instruction count %d, allocated bytes for "

@EgorBo EgorBo Jan 29, 2024

nit: looks like "Total bytes of code" is no longer prefixed with ; (comments in asm)

Member Author

Will fix in a subsequent change.

Member Author

Added that back in #97677

// 10. cse costEx is <= MIN_CSE_COST (0/1)
// 11. cse is a constant and live across call (0/1)
// 12. cse is a constant and min cost (0/1)
// 13. cse is a constant and NOT min cost (0/1)

Member

Just wondering - are you going to take platform's features into account such as number of callee-saved regs (for GPR and floats)?

Member Author

Yes, we will need to add something like this -- right now the mechanisms to decide not to do a CSE are too weak.

I have follow-on changes that add some, but I'm not happy with them yet.

@EgorBo EgorBo left a comment

LGTM, looking forward to seeing the actual changes! Sorry for the delayed review

@AndyAyersMS AndyAyersMS merged commit 8a0b3f3 into dotnet:main Jan 29, 2024
129 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Feb 29, 2024