JIT: temporarily enable RLCSEGreedy to see how it fares in CI #98776

Closed

AndyAyersMS
Member

I'm going to use this PR as a place to capture observations on the evolved heuristic.

I have already done some analysis locally and will add those notes here. One of the key challenges will be figuring out how to fix the various problems without losing the benefits.

@AndyAyersMS
Member Author

Ah, there's a release mode issue to sort out... one second.

@ryujit-bot

Diff results for #98776

Throughput diffs

Throughput diffs for linux/arm64 ran on linux/x64

FullOpts (+0.01%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch +0.01%
benchmarks.run_tiered.linux.arm64.checked.mch +0.01%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.00% to +0.01%)
Collection PDIFF
libraries.crossgen2.linux.x64.checked.mch +0.01%
FullOpts (+0.00% to +0.01%)
Collection PDIFF
benchmarks.run_tiered.linux.x64.checked.mch +0.01%
libraries.crossgen2.linux.x64.checked.mch +0.01%



Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.00% to +0.03%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.01%
libraries.crossgen2.linux.arm64.checked.mch +0.03%
libraries.pmi.linux.arm64.checked.mch +0.02%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.02%
realworld.run.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.02%
FullOpts (+0.01% to +0.03%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.01%
benchmarks.run_pgo.linux.arm64.checked.mch +0.02%
benchmarks.run_tiered.linux.arm64.checked.mch +0.02%
coreclr_tests.run.linux.arm64.checked.mch +0.02%
libraries.crossgen2.linux.arm64.checked.mch +0.03%
libraries.pmi.linux.arm64.checked.mch +0.02%
libraries_tests.run.linux.arm64.Release.mch +0.02%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.02%
realworld.run.linux.arm64.checked.mch +0.01%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.02%

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.00% to +0.03%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.01%
libraries.crossgen2.linux.x64.checked.mch +0.03%
libraries.pmi.linux.x64.checked.mch +0.03%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.01%
realworld.run.linux.x64.checked.mch +0.01%
smoke_tests.nativeaot.linux.x64.checked.mch +0.01%
FullOpts (+0.01% to +0.03%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.01%
benchmarks.run_pgo.linux.x64.checked.mch +0.02%
benchmarks.run_tiered.linux.x64.checked.mch +0.02%
coreclr_tests.run.linux.x64.checked.mch +0.01%
libraries.crossgen2.linux.x64.checked.mch +0.03%
libraries.pmi.linux.x64.checked.mch +0.03%
libraries_tests.run.linux.x64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.01%
realworld.run.linux.x64.checked.mch +0.01%
smoke_tests.nativeaot.linux.x64.checked.mch +0.01%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.00% to +0.03%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
libraries.crossgen2.osx.arm64.checked.mch +0.03%
libraries.pmi.osx.arm64.checked.mch +0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.02%
realworld.run.osx.arm64.checked.mch +0.01%
FullOpts (+0.01% to +0.03%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.01%
benchmarks.run_pgo.osx.arm64.checked.mch +0.02%
benchmarks.run_tiered.osx.arm64.checked.mch +0.02%
coreclr_tests.run.osx.arm64.checked.mch +0.02%
libraries.crossgen2.osx.arm64.checked.mch +0.03%
libraries.pmi.osx.arm64.checked.mch +0.01%
libraries_tests.run.osx.arm64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.02%
realworld.run.osx.arm64.checked.mch +0.01%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.00% to +0.02%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
libraries.crossgen2.windows.arm64.checked.mch +0.02%
libraries.pmi.windows.arm64.checked.mch +0.02%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.02%
realworld.run.windows.arm64.checked.mch +0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.02%
FullOpts (+0.01% to +0.02%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.01%
benchmarks.run_pgo.windows.arm64.checked.mch +0.02%
benchmarks.run_tiered.windows.arm64.checked.mch +0.02%
coreclr_tests.run.windows.arm64.checked.mch +0.02%
libraries.crossgen2.windows.arm64.checked.mch +0.02%
libraries.pmi.windows.arm64.checked.mch +0.02%
libraries_tests.run.windows.arm64.Release.mch +0.02%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.02%
realworld.run.windows.arm64.checked.mch +0.01%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.02%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+0.00% to +0.03%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +0.01%
libraries.crossgen2.windows.x64.checked.mch +0.03%
libraries.pmi.windows.x64.checked.mch +0.03%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.01%
realworld.run.windows.x64.checked.mch +0.01%
smoke_tests.nativeaot.windows.x64.checked.mch +0.01%
FullOpts (+0.01% to +0.03%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +0.01%
benchmarks.run_pgo.windows.x64.checked.mch +0.02%
benchmarks.run_tiered.windows.x64.checked.mch +0.02%
coreclr_tests.run.windows.x64.checked.mch +0.01%
libraries.crossgen2.windows.x64.checked.mch +0.03%
libraries.pmi.windows.x64.checked.mch +0.03%
libraries_tests.run.windows.x64.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.01%
realworld.run.windows.x64.checked.mch +0.01%
smoke_tests.nativeaot.windows.x64.checked.mch +0.01%



Throughput diffs for linux/arm ran on windows/x86

Overall (+0.00% to +0.02%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.01%
benchmarks.run_tiered.linux.arm.checked.mch +0.01%
libraries.crossgen2.linux.arm.checked.mch +0.02%
libraries.pmi.linux.arm.checked.mch +0.02%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.01%
realworld.run.linux.arm.checked.mch +0.01%
FullOpts (+0.01% to +0.02%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.01%
benchmarks.run_pgo.linux.arm.checked.mch +0.02%
benchmarks.run_tiered.linux.arm.checked.mch +0.02%
coreclr_tests.run.linux.arm.checked.mch +0.01%
libraries.crossgen2.linux.arm.checked.mch +0.02%
libraries.pmi.linux.arm.checked.mch +0.02%
libraries_tests.run.linux.arm.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.01%
realworld.run.linux.arm.checked.mch +0.01%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.00% to +0.02%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
libraries.crossgen2.windows.x86.checked.mch +0.02%
libraries.pmi.windows.x86.checked.mch +0.02%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%
FullOpts (+0.01% to +0.02%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.01%
benchmarks.run_pgo.windows.x86.checked.mch +0.01%
benchmarks.run_tiered.windows.x86.checked.mch +0.01%
coreclr_tests.run.windows.x86.checked.mch +0.01%
libraries.crossgen2.windows.x86.checked.mch +0.02%
libraries.pmi.windows.x86.checked.mch +0.02%
libraries_tests.run.windows.x86.Release.mch +0.01%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.01%
realworld.run.windows.x86.checked.mch +0.01%



@ryujit-bot

Diff results for #98776

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,549,519 contexts (1,019,526 MinOpts, 1,529,993 FullOpts).

MISSED contexts: base: 172 (0.01%), diff: 5,238 (0.21%)

Overall (-1,000,028 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm64.checked.mch 18,177,936 +73,684 +0.47%
benchmarks.run_pgo.linux.arm64.checked.mch 76,659,216 -68,772 -1.81%
benchmarks.run_tiered.linux.arm64.checked.mch 22,368,864 -3,280 -0.15%
coreclr_tests.run.linux.arm64.checked.mch 521,932,512 -1,452,504 -0.32%
libraries.crossgen2.linux.arm64.checked.mch 64,066,684 -189,792 +0.63%
libraries.pmi.linux.arm64.checked.mch 76,561,528 +46,740 -0.50%
libraries_tests.run.linux.arm64.Release.mch 379,466,540 +65,236 -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 161,433,920 +527,392 +0.01%
realworld.run.linux.arm64.checked.mch 15,705,696 +7,800 -0.16%
smoke_tests.nativeaot.linux.arm64.checked.mch 2,971,464 -6,532 -0.04%
FullOpts (-1,000,028 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm64.checked.mch 17,684,900 +73,684 +0.47%
benchmarks.run_pgo.linux.arm64.checked.mch 53,524,948 -68,772 -1.81%
benchmarks.run_tiered.linux.arm64.checked.mch 4,728,640 -3,280 -0.15%
coreclr_tests.run.linux.arm64.checked.mch 164,343,324 -1,452,504 -0.32%
libraries.crossgen2.linux.arm64.checked.mch 64,065,048 -189,792 +0.63%
libraries.pmi.linux.arm64.checked.mch 76,441,544 +46,740 -0.50%
libraries_tests.run.linux.arm64.Release.mch 163,929,444 +65,236 -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 148,030,812 +527,392 +0.01%
realworld.run.linux.arm64.checked.mch 15,141,388 +7,800 -0.16%
smoke_tests.nativeaot.linux.arm64.checked.mch 2,970,476 -6,532 -0.04%
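
As a sanity check on the linux/arm64 table above, the per-collection "Diff size (bytes)" figures can be summed and compared with the total in the section heading. The sketch below is only an illustration: the values are copied from the table, and the shortened collection names and the `parse_signed` helper are hypothetical names introduced here, not part of the bot's tooling.

```python
# Per-collection "Diff size (bytes)" values copied from the linux/arm64 table above.
# Keys are shortened collection names (platform and build suffixes dropped).
DIFF_BYTES = {
    "benchmarks.run": "+73,684",
    "benchmarks.run_pgo": "-68,772",
    "benchmarks.run_tiered": "-3,280",
    "coreclr_tests.run": "-1,452,504",
    "libraries.crossgen2": "-189,792",
    "libraries.pmi": "+46,740",
    "libraries_tests.run": "+65,236",
    "libraries_tests_no_tiered_compilation.run": "+527,392",
    "realworld.run": "+7,800",
    "smoke_tests.nativeaot": "-6,532",
}


def parse_signed(text: str) -> int:
    """Convert a signed, comma-grouped figure such as '-1,452,504' to an int."""
    return int(text.replace(",", ""))


# Sum the individual diffs; this should match the "FullOpts (... bytes)" heading.
total = sum(parse_signed(value) for value in DIFF_BYTES.values())
print(f"{total:+,} bytes")  # prints "-1,000,028 bytes"
```

The sum agrees with the reported total of -1,000,028 bytes, so the net size reduction on this target comes mostly from the large `coreclr_tests` decrease outweighing the `libraries_tests_no_tiered_compilation` increase.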

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,542,303 contexts (988,245 MinOpts, 1,554,058 FullOpts).

MISSED contexts: base: 177 (0.01%), diff: 1,098 (0.04%)

Overall (+2,009,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.x64.checked.mch 12,075,262 +5,093 -2.47%
benchmarks.run_pgo.linux.x64.checked.mch 67,899,843 +15,032 -1.75%
benchmarks.run_tiered.linux.x64.checked.mch 20,398,221 -14,279 -3.42%
coreclr_tests.run.linux.x64.checked.mch 414,872,683 +518,690 -1.74%
libraries.crossgen2.linux.x64.checked.mch 44,737,894 +45,288 -0.49%
libraries.pmi.linux.x64.checked.mch 60,901,638 +160,898 -1.49%
libraries_tests.run.linux.x64.Release.mch 328,378,411 +946,848 -1.16%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,100,449 +286,308 -1.92%
realworld.run.linux.x64.checked.mch 13,158,743 +33,481 -1.10%
smoke_tests.nativeaot.linux.x64.checked.mch 4,240,043 +11,938 -1.03%
FullOpts (+2,009,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.x64.checked.mch 11,901,522 +5,093 -2.47%
benchmarks.run_pgo.linux.x64.checked.mch 48,116,284 +15,032 -1.75%
benchmarks.run_tiered.linux.x64.checked.mch 3,685,943 -14,279 -3.42%
coreclr_tests.run.linux.x64.checked.mch 126,925,785 +518,690 -1.74%
libraries.crossgen2.linux.x64.checked.mch 44,736,696 +45,288 -0.49%
libraries.pmi.linux.x64.checked.mch 60,788,808 +160,898 -1.49%
libraries_tests.run.linux.x64.Release.mch 145,576,425 +946,848 -1.16%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,528,353 +286,308 -1.92%
realworld.run.linux.x64.checked.mch 12,773,258 +33,481 -1.10%
smoke_tests.nativeaot.linux.x64.checked.mch 4,239,094 +11,938 -1.03%

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,312,793 contexts (945,402 MinOpts, 1,367,391 FullOpts).

MISSED contexts: base: 170 (0.01%), diff: 4,920 (0.21%)

Overall (-984,144 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.osx.arm64.checked.mch 11,082,424 +5,184 -0.04%
benchmarks.run_pgo.osx.arm64.checked.mch 34,604,740 +8,120 -0.24%
benchmarks.run_tiered.osx.arm64.checked.mch 15,625,516 -4,276 -0.11%
coreclr_tests.run.osx.arm64.checked.mch 506,293,844 -1,298,788 -0.34%
libraries.crossgen2.osx.arm64.checked.mch 63,948,516 -192,940 +0.61%
libraries.pmi.osx.arm64.checked.mch 80,552,460 +60,928 -0.41%
libraries_tests.run.osx.arm64.Release.mch 311,987,272 -2,908 -0.31%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 159,439,412 +437,936 -0.01%
realworld.run.osx.arm64.checked.mch 15,016,556 +2,600 -0.23%
FullOpts (-984,144 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.osx.arm64.checked.mch 11,081,888 +5,184 -0.04%
benchmarks.run_pgo.osx.arm64.checked.mch 18,111,384 +8,120 -0.24%
benchmarks.run_tiered.osx.arm64.checked.mch 3,993,192 -4,276 -0.11%
coreclr_tests.run.osx.arm64.checked.mch 154,849,448 -1,298,788 -0.34%
libraries.crossgen2.osx.arm64.checked.mch 63,946,888 -192,940 +0.61%
libraries.pmi.osx.arm64.checked.mch 80,431,396 +60,928 -0.41%
libraries_tests.run.osx.arm64.Release.mch 110,585,536 -2,908 -0.31%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 146,391,192 +437,936 -0.01%
realworld.run.osx.arm64.checked.mch 14,459,656 +2,600 -0.23%

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,397,782 contexts (955,693 MinOpts, 1,442,089 FullOpts).

MISSED contexts: base: 174 (0.01%), diff: 5,300 (0.22%)

Overall (-1,050,688 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.arm64.checked.mch 10,879,288 +5,028 -0.03%
benchmarks.run_pgo.windows.arm64.checked.mch 47,896,244 -51,784 -0.81%
benchmarks.run_tiered.windows.arm64.checked.mch 15,418,760 -4,136 -0.17%
coreclr_tests.run.windows.arm64.checked.mch 508,096,416 -1,451,404 -0.29%
libraries.crossgen2.windows.arm64.checked.mch 67,298,384 -203,856 +0.63%
libraries.pmi.windows.arm64.checked.mch 80,041,792 +49,080 -0.49%
libraries_tests.run.windows.arm64.Release.mch 327,777,900 +11,220 -0.25%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 167,608,372 +606,696 +0.07%
realworld.run.windows.arm64.checked.mch 15,837,772 -1,844 -0.17%
smoke_tests.nativeaot.windows.arm64.checked.mch 3,980,864 -9,688 -0.15%
FullOpts (-1,050,688 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.arm64.checked.mch 10,878,752 +5,028 -0.03%
benchmarks.run_pgo.windows.arm64.checked.mch 31,605,364 -51,784 -0.81%
benchmarks.run_tiered.windows.arm64.checked.mch 4,125,432 -4,136 -0.17%
coreclr_tests.run.windows.arm64.checked.mch 160,799,344 -1,451,404 -0.29%
libraries.crossgen2.windows.arm64.checked.mch 67,296,748 -203,856 +0.63%
libraries.pmi.windows.arm64.checked.mch 79,921,808 +49,080 -0.49%
libraries_tests.run.windows.arm64.Release.mch 122,685,228 +11,220 -0.25%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 154,549,116 +606,696 +0.07%
realworld.run.windows.arm64.checked.mch 15,280,848 -1,844 -0.17%
smoke_tests.nativeaot.windows.arm64.checked.mch 3,979,852 -9,688 -0.15%

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,429,920 contexts (941,815 MinOpts, 1,488,105 FullOpts).

MISSED contexts: base: 176 (0.01%), diff: 891 (0.04%)

Overall (+2,536,694 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch 8,740,616 +9,967 -2.67%
benchmarks.run_pgo.windows.x64.checked.mch 35,540,192 +92,790 -0.88%
benchmarks.run_tiered.windows.x64.checked.mch 12,429,625 -5,403 -4.21%
coreclr_tests.run.windows.x64.checked.mch 404,049,115 +1,350,863 -1.77%
libraries.crossgen2.windows.x64.checked.mch 45,123,608 +10,664 -0.48%
libraries.pmi.windows.x64.checked.mch 62,129,258 +179,422 -1.45%
libraries_tests.run.windows.x64.Release.mch 282,300,135 +493,184 -1.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 136,037,588 +374,118 -1.94%
realworld.run.windows.x64.checked.mch 14,153,477 +24,813 -1.22%
smoke_tests.nativeaot.windows.x64.checked.mch 5,114,999 +6,276 -0.74%
FullOpts (+2,536,694 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch 8,740,256 +9,967 -2.67%
benchmarks.run_pgo.windows.x64.checked.mch 21,479,875 +92,790 -0.88%
benchmarks.run_tiered.windows.x64.checked.mch 3,195,176 -5,403 -4.21%
coreclr_tests.run.windows.x64.checked.mch 123,481,509 +1,350,863 -1.77%
libraries.crossgen2.windows.x64.checked.mch 45,122,421 +10,664 -0.48%
libraries.pmi.windows.x64.checked.mch 62,015,764 +179,422 -1.45%
libraries_tests.run.windows.x64.Release.mch 105,925,985 +493,184 -1.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 125,744,709 +374,118 -1.94%
realworld.run.windows.x64.checked.mch 13,767,309 +24,813 -1.22%
smoke_tests.nativeaot.windows.x64.checked.mch 5,114,052 +6,276 -0.74%
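
The table above gives absolute sizes only. To compare collections of very different size, one option is to express each diff as a percentage of its base size. The following is a minimal sketch of that arithmetic using the windows/x64 FullOpts figures copied from the table; the relative-change column is not something the bot reports, and `relative_change` and the shortened collection names are hypothetical names used here for illustration.

```python
# (base size, diff size) in bytes, copied from the windows/x64 FullOpts table above.
ROWS = {
    "benchmarks.run": (8_740_256, 9_967),
    "benchmarks.run_pgo": (21_479_875, 92_790),
    "benchmarks.run_tiered": (3_195_176, -5_403),
    "coreclr_tests.run": (123_481_509, 1_350_863),
    "libraries.crossgen2": (45_122_421, 10_664),
    "libraries.pmi": (62_015_764, 179_422),
    "libraries_tests.run": (105_925_985, 493_184),
    "libraries_tests_no_tiered_compilation.run": (125_744_709, 374_118),
    "realworld.run": (13_767_309, 24_813),
    "smoke_tests.nativeaot": (5_114_052, 6_276),
}


def relative_change(base: int, diff: int) -> float:
    """Size change as a percentage of the base size."""
    return 100.0 * diff / base


# One line per collection, then the aggregate over all collections.
for name, (base, diff) in ROWS.items():
    print(f"{name:<45} {relative_change(base, diff):+.3f}%")

total_base = sum(base for base, _ in ROWS.values())
total_diff = sum(diff for _, diff in ROWS.values())
print(f"{'overall':<45} {relative_change(total_base, total_diff):+.3f}%")
```

Read this way, the code size growth on windows/x64 can be set directly against the PerfScore improvements reported in the same rows.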



Assembly diffs for windows/x86 ran on linux/x86

Diffs are based on 2,339,430 contexts (847,225 MinOpts, 1,492,205 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 8,833 (0.38%)

Overall (+1,228,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x86.checked.mch 6,969,637 +20,893 -1.24%
benchmarks.run_pgo.windows.x86.checked.mch 47,456,230 +36,835 -2.52%
benchmarks.run_tiered.windows.x86.checked.mch 9,351,120 +18,029 -1.12%
coreclr_tests.run.windows.x86.checked.mch 315,999,892 +454,403 -1.69%
libraries.crossgen2.windows.x86.checked.mch 35,570,616 +11,739 -0.68%
libraries.pmi.windows.x86.checked.mch 48,824,461 +165,393 -1.49%
libraries_tests.run.windows.x86.Release.mch 181,401,826 +408,071 -1.33%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 102,225,812 +81,602 -1.34%
realworld.run.windows.x86.checked.mch 11,028,236 +31,332 -1.07%
FullOpts (+1,228,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x86.checked.mch 6,969,356 +20,893 -1.24%
benchmarks.run_pgo.windows.x86.checked.mch 40,784,255 +36,835 -2.52%
benchmarks.run_tiered.windows.x86.checked.mch 5,043,353 +18,029 -1.12%
coreclr_tests.run.windows.x86.checked.mch 108,989,043 +454,403 -1.69%
libraries.crossgen2.windows.x86.checked.mch 35,569,556 +11,739 -0.68%
libraries.pmi.windows.x86.checked.mch 48,729,231 +165,393 -1.49%
libraries_tests.run.windows.x86.Release.mch 83,337,517 +408,071 -1.33%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 93,550,744 +81,602 -1.34%
realworld.run.windows.x86.checked.mch 10,732,966 +31,332 -1.07%



@ryujit-bot

Diff results for #98776

Assembly diffs

Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,262,678 contexts (832,863 MinOpts, 1,429,815 FullOpts).

MISSED contexts: base: 75,600 (3.23%), diff: 76,390 (3.26%)

Overall (-3,300,590 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm.checked.mch 16,543,852 -199,676 -0.63%
benchmarks.run_pgo.linux.arm.checked.mch 68,542,524 -243,856 -0.11%
benchmarks.run_tiered.linux.arm.checked.mch 19,429,376 -138,392 -0.77%
coreclr_tests.run.linux.arm.checked.mch 321,435,778 -1,975,302 +0.16%
libraries.crossgen2.linux.arm.checked.mch 37,762,784 +3,738 +0.88%
libraries.pmi.linux.arm.checked.mch 49,767,906 -93,410 +0.05%
libraries_tests.run.linux.arm.Release.mch 237,219,854 -242,640 +0.06%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,098,472 -389,964 -0.08%
realworld.run.linux.arm.checked.mch 13,575,744 -21,088 +0.21%
FullOpts (-3,300,590 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm.checked.mch 16,084,556 -199,676 -0.63%
benchmarks.run_pgo.linux.arm.checked.mch 56,557,950 -243,856 -0.11%
benchmarks.run_tiered.linux.arm.checked.mch 11,504,824 -138,392 -0.77%
coreclr_tests.run.linux.arm.checked.mch 108,984,412 -1,975,302 +0.16%
libraries.crossgen2.linux.arm.checked.mch 37,761,554 +3,738 +0.88%
libraries.pmi.linux.arm.checked.mch 49,661,682 -93,410 +0.05%
libraries_tests.run.linux.arm.Release.mch 115,169,888 -242,640 +0.06%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 83,068,678 -389,964 -0.08%
realworld.run.linux.arm.checked.mch 13,140,820 -21,088 +0.21%



Throughput diffs

Throughput diffs for linux/arm64 ran on windows/x64

Overall (+0.15% to +1.37%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.60%
benchmarks.run_pgo.linux.arm64.checked.mch +0.51%
benchmarks.run_tiered.linux.arm64.checked.mch +0.52%
coreclr_tests.run.linux.arm64.checked.mch +1.37%
libraries.crossgen2.linux.arm64.checked.mch +0.15%
libraries.pmi.linux.arm64.checked.mch +0.75%
libraries_tests.run.linux.arm64.Release.mch +0.73%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.34%
realworld.run.linux.arm64.checked.mch +0.75%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.93%
FullOpts (+0.15% to +2.35%)
Collection PDIFF
benchmarks.run.linux.arm64.checked.mch +0.61%
benchmarks.run_pgo.linux.arm64.checked.mch +0.57%
benchmarks.run_tiered.linux.arm64.checked.mch +1.02%
coreclr_tests.run.linux.arm64.checked.mch +2.35%
libraries.crossgen2.linux.arm64.checked.mch +0.15%
libraries.pmi.linux.arm64.checked.mch +0.75%
libraries_tests.run.linux.arm64.Release.mch +0.97%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.35%
realworld.run.linux.arm64.checked.mch +0.75%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.93%
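
The range in each throughput heading appears to be the smallest and largest PDIFF among the collections (that is an assumption from reading the tables, not something the report states). A short sketch that recomputes the range from the linux/arm64 FullOpts rows above; `ROW` and `pdiff_range` are hypothetical names introduced for this example.

```python
import re

# FullOpts throughput rows for linux/arm64, copied from the table above.
TABLE = """\
benchmarks.run.linux.arm64.checked.mch +0.61%
benchmarks.run_pgo.linux.arm64.checked.mch +0.57%
benchmarks.run_tiered.linux.arm64.checked.mch +1.02%
coreclr_tests.run.linux.arm64.checked.mch +2.35%
libraries.crossgen2.linux.arm64.checked.mch +0.15%
libraries.pmi.linux.arm64.checked.mch +0.75%
libraries_tests.run.linux.arm64.Release.mch +0.97%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.35%
realworld.run.linux.arm64.checked.mch +0.75%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.93%
"""

# A row is a collection name followed by a signed percentage.
ROW = re.compile(r"^(\S+)\s+([+-]\d+\.\d+)%$")


def pdiff_range(table: str) -> tuple[float, float]:
    """Return (min, max) of the PDIFF column, ignoring lines that are not rows."""
    values = [
        float(match.group(2))
        for line in table.splitlines()
        if (match := ROW.match(line.strip()))
    ]
    return min(values), max(values)


low, high = pdiff_range(TABLE)
print(f"FullOpts ({low:+.2f}% to {high:+.2f}%)")  # prints "FullOpts (+0.15% to +2.35%)"
```

The recomputed range matches the heading here. In the earlier, smaller throughput reports the lower bound (+0.00%) does not appear in any listed row, which suggests that collections with negligible changes are omitted from those tables.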

Throughput diffs for linux/x64 ran on windows/x64

Overall (+0.48% to +1.58%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.76%
benchmarks.run_pgo.linux.x64.checked.mch +0.48%
benchmarks.run_tiered.linux.x64.checked.mch +0.52%
coreclr_tests.run.linux.x64.checked.mch +1.58%
libraries.crossgen2.linux.x64.checked.mch +0.79%
libraries.pmi.linux.x64.checked.mch +1.04%
libraries_tests.run.linux.x64.Release.mch +0.90%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.69%
realworld.run.linux.x64.checked.mch +0.92%
smoke_tests.nativeaot.linux.x64.checked.mch +0.82%
FullOpts (+0.53% to +2.69%)
Collection PDIFF
benchmarks.run.linux.x64.checked.mch +0.76%
benchmarks.run_pgo.linux.x64.checked.mch +0.53%
benchmarks.run_tiered.linux.x64.checked.mch +1.03%
coreclr_tests.run.linux.x64.checked.mch +2.69%
libraries.crossgen2.linux.x64.checked.mch +0.79%
libraries.pmi.linux.x64.checked.mch +1.04%
libraries_tests.run.linux.x64.Release.mch +1.16%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.71%
realworld.run.linux.x64.checked.mch +0.93%
smoke_tests.nativeaot.linux.x64.checked.mch +0.82%

Throughput diffs for osx/arm64 ran on windows/x64

Overall (+0.15% to +1.38%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.76%
benchmarks.run_pgo.osx.arm64.checked.mch +0.76%
benchmarks.run_tiered.osx.arm64.checked.mch +0.66%
coreclr_tests.run.osx.arm64.checked.mch +1.38%
libraries.crossgen2.osx.arm64.checked.mch +0.15%
libraries.pmi.osx.arm64.checked.mch +0.72%
libraries_tests.run.osx.arm64.Release.mch +0.66%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.35%
realworld.run.osx.arm64.checked.mch +0.78%
MinOpts (-0.00% to +0.01%)
Collection PDIFF
libraries.pmi.osx.arm64.checked.mch +0.01%
FullOpts (+0.15% to +2.40%)
Collection PDIFF
benchmarks.run.osx.arm64.checked.mch +0.76%
benchmarks.run_pgo.osx.arm64.checked.mch +0.94%
benchmarks.run_tiered.osx.arm64.checked.mch +1.14%
coreclr_tests.run.osx.arm64.checked.mch +2.40%
libraries.crossgen2.osx.arm64.checked.mch +0.15%
libraries.pmi.osx.arm64.checked.mch +0.72%
libraries_tests.run.osx.arm64.Release.mch +0.96%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch +0.35%
realworld.run.osx.arm64.checked.mch +0.78%

Throughput diffs for windows/arm64 ran on windows/x64

Overall (+0.18% to +1.37%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.76%
benchmarks.run_pgo.windows.arm64.checked.mch +0.53%
benchmarks.run_tiered.windows.arm64.checked.mch +0.67%
coreclr_tests.run.windows.arm64.checked.mch +1.37%
libraries.crossgen2.windows.arm64.checked.mch +0.18%
libraries.pmi.windows.arm64.checked.mch +0.75%
libraries_tests.run.windows.arm64.Release.mch +0.63%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.31%
realworld.run.windows.arm64.checked.mch +0.57%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.94%
MinOpts (-0.01% to +0.00%)
Collection PDIFF
libraries.pmi.windows.arm64.checked.mch -0.01%
FullOpts (+0.18% to +2.35%)
Collection PDIFF
benchmarks.run.windows.arm64.checked.mch +0.76%
benchmarks.run_pgo.windows.arm64.checked.mch +0.60%
benchmarks.run_tiered.windows.arm64.checked.mch +1.13%
coreclr_tests.run.windows.arm64.checked.mch +2.35%
libraries.crossgen2.windows.arm64.checked.mch +0.18%
libraries.pmi.windows.arm64.checked.mch +0.75%
libraries_tests.run.windows.arm64.Release.mch +0.89%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch +0.32%
realworld.run.windows.arm64.checked.mch +0.57%
smoke_tests.nativeaot.windows.arm64.checked.mch +0.94%

Throughput diffs for windows/x64 ran on windows/x64

Overall (+0.56% to +1.77%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +0.85%
benchmarks.run_pgo.windows.x64.checked.mch +0.73%
benchmarks.run_tiered.windows.x64.checked.mch +0.62%
coreclr_tests.run.windows.x64.checked.mch +1.77%
libraries.crossgen2.windows.x64.checked.mch +0.73%
libraries.pmi.windows.x64.checked.mch +1.00%
libraries_tests.run.windows.x64.Release.mch +0.79%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.56%
realworld.run.windows.x64.checked.mch +0.69%
smoke_tests.nativeaot.windows.x64.checked.mch +0.71%
FullOpts (+0.57% to +2.99%)
Collection PDIFF
benchmarks.run.windows.x64.checked.mch +0.85%
benchmarks.run_pgo.windows.x64.checked.mch +0.84%
benchmarks.run_tiered.windows.x64.checked.mch +1.01%
coreclr_tests.run.windows.x64.checked.mch +2.99%
libraries.crossgen2.windows.x64.checked.mch +0.73%
libraries.pmi.windows.x64.checked.mch +1.00%
libraries_tests.run.windows.x64.Release.mch +1.08%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.57%
realworld.run.windows.x64.checked.mch +0.70%
smoke_tests.nativeaot.windows.x64.checked.mch +0.71%



@ryujit-bot

Diff results for #98776

Assembly diffs

Assembly diffs for linux/arm64 ran on windows/x64

Diffs are based on 2,549,519 contexts (1,019,526 MinOpts, 1,529,993 FullOpts).

MISSED contexts: base: 172 (0.01%), diff: 5,238 (0.21%)

Overall (-1,000,028 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm64.checked.mch 18,177,936 +73,684 +0.47%
benchmarks.run_pgo.linux.arm64.checked.mch 76,659,216 -68,772 -1.81%
benchmarks.run_tiered.linux.arm64.checked.mch 22,368,864 -3,280 -0.15%
coreclr_tests.run.linux.arm64.checked.mch 521,932,512 -1,452,504 -0.32%
libraries.crossgen2.linux.arm64.checked.mch 64,066,684 -189,792 +0.63%
libraries.pmi.linux.arm64.checked.mch 76,561,528 +46,740 -0.50%
libraries_tests.run.linux.arm64.Release.mch 379,466,540 +65,236 -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 161,433,920 +527,392 +0.01%
realworld.run.linux.arm64.checked.mch 15,705,696 +7,800 -0.16%
smoke_tests.nativeaot.linux.arm64.checked.mch 2,971,464 -6,532 -0.04%
FullOpts (-1,000,028 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm64.checked.mch 17,684,900 +73,684 +0.47%
benchmarks.run_pgo.linux.arm64.checked.mch 53,524,948 -68,772 -1.81%
benchmarks.run_tiered.linux.arm64.checked.mch 4,728,640 -3,280 -0.15%
coreclr_tests.run.linux.arm64.checked.mch 164,343,324 -1,452,504 -0.32%
libraries.crossgen2.linux.arm64.checked.mch 64,065,048 -189,792 +0.63%
libraries.pmi.linux.arm64.checked.mch 76,441,544 +46,740 -0.50%
libraries_tests.run.linux.arm64.Release.mch 163,929,444 +65,236 -0.25%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch 148,030,812 +527,392 +0.01%
realworld.run.linux.arm64.checked.mch 15,141,388 +7,800 -0.16%
smoke_tests.nativeaot.linux.arm64.checked.mch 2,970,476 -6,532 -0.04%

Assembly diffs for linux/x64 ran on windows/x64

Diffs are based on 2,542,303 contexts (988,245 MinOpts, 1,554,058 FullOpts).

MISSED contexts: base: 177 (0.01%), diff: 1,098 (0.04%)

Overall (+2,009,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.x64.checked.mch 12,075,262 +5,093 -2.47%
benchmarks.run_pgo.linux.x64.checked.mch 67,899,843 +15,032 -1.75%
benchmarks.run_tiered.linux.x64.checked.mch 20,398,221 -14,279 -3.42%
coreclr_tests.run.linux.x64.checked.mch 414,872,683 +518,690 -1.74%
libraries.crossgen2.linux.x64.checked.mch 44,737,894 +45,288 -0.49%
libraries.pmi.linux.x64.checked.mch 60,901,638 +160,898 -1.49%
libraries_tests.run.linux.x64.Release.mch 328,378,411 +946,848 -1.16%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 132,100,449 +286,308 -1.92%
realworld.run.linux.x64.checked.mch 13,158,743 +33,481 -1.10%
smoke_tests.nativeaot.linux.x64.checked.mch 4,240,043 +11,938 -1.03%
FullOpts (+2,009,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.x64.checked.mch 11,901,522 +5,093 -2.47%
benchmarks.run_pgo.linux.x64.checked.mch 48,116,284 +15,032 -1.75%
benchmarks.run_tiered.linux.x64.checked.mch 3,685,943 -14,279 -3.42%
coreclr_tests.run.linux.x64.checked.mch 126,925,785 +518,690 -1.74%
libraries.crossgen2.linux.x64.checked.mch 44,736,696 +45,288 -0.49%
libraries.pmi.linux.x64.checked.mch 60,788,808 +160,898 -1.49%
libraries_tests.run.linux.x64.Release.mch 145,576,425 +946,848 -1.16%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch 121,528,353 +286,308 -1.92%
realworld.run.linux.x64.checked.mch 12,773,258 +33,481 -1.10%
smoke_tests.nativeaot.linux.x64.checked.mch 4,239,094 +11,938 -1.03%

Assembly diffs for osx/arm64 ran on windows/x64

Diffs are based on 2,312,793 contexts (945,402 MinOpts, 1,367,391 FullOpts).

MISSED contexts: base: 170 (0.01%), diff: 4,920 (0.21%)

Overall (-984,144 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.osx.arm64.checked.mch 11,082,424 +5,184 -0.04%
benchmarks.run_pgo.osx.arm64.checked.mch 34,604,740 +8,120 -0.24%
benchmarks.run_tiered.osx.arm64.checked.mch 15,625,516 -4,276 -0.11%
coreclr_tests.run.osx.arm64.checked.mch 506,293,844 -1,298,788 -0.34%
libraries.crossgen2.osx.arm64.checked.mch 63,948,516 -192,940 +0.61%
libraries.pmi.osx.arm64.checked.mch 80,552,460 +60,928 -0.41%
libraries_tests.run.osx.arm64.Release.mch 311,987,272 -2,908 -0.31%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 159,439,412 +437,936 -0.01%
realworld.run.osx.arm64.checked.mch 15,016,556 +2,600 -0.23%
FullOpts (-984,144 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.osx.arm64.checked.mch 11,081,888 +5,184 -0.04%
benchmarks.run_pgo.osx.arm64.checked.mch 18,111,384 +8,120 -0.24%
benchmarks.run_tiered.osx.arm64.checked.mch 3,993,192 -4,276 -0.11%
coreclr_tests.run.osx.arm64.checked.mch 154,849,448 -1,298,788 -0.34%
libraries.crossgen2.osx.arm64.checked.mch 63,946,888 -192,940 +0.61%
libraries.pmi.osx.arm64.checked.mch 80,431,396 +60,928 -0.41%
libraries_tests.run.osx.arm64.Release.mch 110,585,536 -2,908 -0.31%
libraries_tests_no_tiered_compilation.run.osx.arm64.Release.mch 146,391,192 +437,936 -0.01%
realworld.run.osx.arm64.checked.mch 14,459,656 +2,600 -0.23%

Assembly diffs for windows/arm64 ran on windows/x64

Diffs are based on 2,397,782 contexts (955,693 MinOpts, 1,442,089 FullOpts).

MISSED contexts: base: 174 (0.01%), diff: 5,300 (0.22%)

Overall (-1,050,688 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.arm64.checked.mch 10,879,288 +5,028 -0.03%
benchmarks.run_pgo.windows.arm64.checked.mch 47,896,244 -51,784 -0.81%
benchmarks.run_tiered.windows.arm64.checked.mch 15,418,760 -4,136 -0.17%
coreclr_tests.run.windows.arm64.checked.mch 508,096,416 -1,451,404 -0.29%
libraries.crossgen2.windows.arm64.checked.mch 67,298,384 -203,856 +0.63%
libraries.pmi.windows.arm64.checked.mch 80,041,792 +49,080 -0.49%
libraries_tests.run.windows.arm64.Release.mch 327,777,900 +11,220 -0.25%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 167,608,372 +606,696 +0.07%
realworld.run.windows.arm64.checked.mch 15,837,772 -1,844 -0.17%
smoke_tests.nativeaot.windows.arm64.checked.mch 3,980,864 -9,688 -0.15%
FullOpts (-1,050,688 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.arm64.checked.mch 10,878,752 +5,028 -0.03%
benchmarks.run_pgo.windows.arm64.checked.mch 31,605,364 -51,784 -0.81%
benchmarks.run_tiered.windows.arm64.checked.mch 4,125,432 -4,136 -0.17%
coreclr_tests.run.windows.arm64.checked.mch 160,799,344 -1,451,404 -0.29%
libraries.crossgen2.windows.arm64.checked.mch 67,296,748 -203,856 +0.63%
libraries.pmi.windows.arm64.checked.mch 79,921,808 +49,080 -0.49%
libraries_tests.run.windows.arm64.Release.mch 122,685,228 +11,220 -0.25%
libraries_tests_no_tiered_compilation.run.windows.arm64.Release.mch 154,549,116 +606,696 +0.07%
realworld.run.windows.arm64.checked.mch 15,280,848 -1,844 -0.17%
smoke_tests.nativeaot.windows.arm64.checked.mch 3,979,852 -9,688 -0.15%

Assembly diffs for windows/x64 ran on windows/x64

Diffs are based on 2,429,920 contexts (941,815 MinOpts, 1,488,105 FullOpts).

MISSED contexts: base: 176 (0.01%), diff: 891 (0.04%)

Overall (+2,536,694 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch 8,740,616 +9,967 -2.67%
benchmarks.run_pgo.windows.x64.checked.mch 35,540,192 +92,790 -0.88%
benchmarks.run_tiered.windows.x64.checked.mch 12,429,625 -5,403 -4.21%
coreclr_tests.run.windows.x64.checked.mch 404,049,115 +1,350,863 -1.77%
libraries.crossgen2.windows.x64.checked.mch 45,123,608 +10,664 -0.48%
libraries.pmi.windows.x64.checked.mch 62,129,258 +179,422 -1.45%
libraries_tests.run.windows.x64.Release.mch 282,300,135 +493,184 -1.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 136,037,588 +374,118 -1.94%
realworld.run.windows.x64.checked.mch 14,153,477 +24,813 -1.22%
smoke_tests.nativeaot.windows.x64.checked.mch 5,114,999 +6,276 -0.74%
FullOpts (+2,536,694 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x64.checked.mch 8,740,256 +9,967 -2.67%
benchmarks.run_pgo.windows.x64.checked.mch 21,479,875 +92,790 -0.88%
benchmarks.run_tiered.windows.x64.checked.mch 3,195,176 -5,403 -4.21%
coreclr_tests.run.windows.x64.checked.mch 123,481,509 +1,350,863 -1.77%
libraries.crossgen2.windows.x64.checked.mch 45,122,421 +10,664 -0.48%
libraries.pmi.windows.x64.checked.mch 62,015,764 +179,422 -1.45%
libraries_tests.run.windows.x64.Release.mch 105,925,985 +493,184 -1.37%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch 125,744,709 +374,118 -1.94%
realworld.run.windows.x64.checked.mch 13,767,309 +24,813 -1.22%
smoke_tests.nativeaot.windows.x64.checked.mch 5,114,052 +6,276 -0.74%

Details here


Assembly diffs for linux/arm ran on windows/x86

Diffs are based on 2,262,678 contexts (832,863 MinOpts, 1,429,815 FullOpts).

MISSED contexts: base: 75,600 (3.23%), diff: 76,390 (3.26%)

Overall (-3,300,590 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm.checked.mch 16,543,852 -199,676 -0.63%
benchmarks.run_pgo.linux.arm.checked.mch 68,542,524 -243,856 -0.11%
benchmarks.run_tiered.linux.arm.checked.mch 19,429,376 -138,392 -0.77%
coreclr_tests.run.linux.arm.checked.mch 321,435,778 -1,975,302 +0.16%
libraries.crossgen2.linux.arm.checked.mch 37,762,784 +3,738 +0.88%
libraries.pmi.linux.arm.checked.mch 49,767,906 -93,410 +0.05%
libraries_tests.run.linux.arm.Release.mch 237,219,854 -242,640 +0.06%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 93,098,472 -389,964 -0.08%
realworld.run.linux.arm.checked.mch 13,575,744 -21,088 +0.21%
FullOpts (-3,300,590 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.linux.arm.checked.mch 16,084,556 -199,676 -0.63%
benchmarks.run_pgo.linux.arm.checked.mch 56,557,950 -243,856 -0.11%
benchmarks.run_tiered.linux.arm.checked.mch 11,504,824 -138,392 -0.77%
coreclr_tests.run.linux.arm.checked.mch 108,984,412 -1,975,302 +0.16%
libraries.crossgen2.linux.arm.checked.mch 37,761,554 +3,738 +0.88%
libraries.pmi.linux.arm.checked.mch 49,661,682 -93,410 +0.05%
libraries_tests.run.linux.arm.Release.mch 115,169,888 -242,640 +0.06%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch 83,068,678 -389,964 -0.08%
realworld.run.linux.arm.checked.mch 13,140,820 -21,088 +0.21%

Assembly diffs for windows/x86 ran on windows/x86

Diffs are based on 2,339,430 contexts (847,225 MinOpts, 1,492,205 FullOpts).

MISSED contexts: base: 1 (0.00%), diff: 8,833 (0.38%)

Overall (+1,228,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x86.checked.mch 6,969,637 +20,893 -1.24%
benchmarks.run_pgo.windows.x86.checked.mch 47,456,230 +36,835 -2.52%
benchmarks.run_tiered.windows.x86.checked.mch 9,351,120 +18,029 -1.12%
coreclr_tests.run.windows.x86.checked.mch 315,999,892 +454,403 -1.69%
libraries.crossgen2.windows.x86.checked.mch 35,570,616 +11,739 -0.68%
libraries.pmi.windows.x86.checked.mch 48,824,461 +165,393 -1.49%
libraries_tests.run.windows.x86.Release.mch 181,401,826 +408,071 -1.33%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 102,225,812 +81,602 -1.34%
realworld.run.windows.x86.checked.mch 11,028,236 +31,332 -1.07%
FullOpts (+1,228,297 bytes)
Collection Base size (bytes) Diff size (bytes) PerfScore in Diffs
benchmarks.run.windows.x86.checked.mch 6,969,356 +20,893 -1.24%
benchmarks.run_pgo.windows.x86.checked.mch 40,784,255 +36,835 -2.52%
benchmarks.run_tiered.windows.x86.checked.mch 5,043,353 +18,029 -1.12%
coreclr_tests.run.windows.x86.checked.mch 108,989,043 +454,403 -1.69%
libraries.crossgen2.windows.x86.checked.mch 35,569,556 +11,739 -0.68%
libraries.pmi.windows.x86.checked.mch 48,729,231 +165,393 -1.49%
libraries_tests.run.windows.x86.Release.mch 83,337,517 +408,071 -1.33%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch 93,550,744 +81,602 -1.34%
realworld.run.windows.x86.checked.mch 10,732,966 +31,332 -1.07%

Details here


Throughput diffs

Throughput diffs for linux/arm ran on windows/x86

Overall (+0.30% to +2.08%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.61%
benchmarks.run_pgo.linux.arm.checked.mch +0.30%
benchmarks.run_tiered.linux.arm.checked.mch +0.52%
coreclr_tests.run.linux.arm.checked.mch +2.08%
libraries.crossgen2.linux.arm.checked.mch +0.46%
libraries.pmi.linux.arm.checked.mch +0.95%
libraries_tests.run.linux.arm.Release.mch +0.81%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +0.99%
realworld.run.linux.arm.checked.mch +0.66%
FullOpts (+0.31% to +3.40%)
Collection PDIFF
benchmarks.run.linux.arm.checked.mch +0.61%
benchmarks.run_pgo.linux.arm.checked.mch +0.31%
benchmarks.run_tiered.linux.arm.checked.mch +0.63%
coreclr_tests.run.linux.arm.checked.mch +3.40%
libraries.crossgen2.linux.arm.checked.mch +0.46%
libraries.pmi.linux.arm.checked.mch +0.95%
libraries_tests.run.linux.arm.Release.mch +1.04%
libraries_tests_no_tiered_compilation.run.linux.arm.Release.mch +1.02%
realworld.run.linux.arm.checked.mch +0.67%

Throughput diffs for windows/x86 ran on windows/x86

Overall (+0.48% to +1.66%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.81%
benchmarks.run_pgo.windows.x86.checked.mch +0.48%
benchmarks.run_tiered.windows.x86.checked.mch +0.74%
coreclr_tests.run.windows.x86.checked.mch +1.66%
libraries.crossgen2.windows.x86.checked.mch +0.78%
libraries.pmi.windows.x86.checked.mch +0.90%
libraries_tests.run.windows.x86.Release.mch +0.66%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.56%
realworld.run.windows.x86.checked.mch +0.72%
FullOpts (+0.50% to +2.50%)
Collection PDIFF
benchmarks.run.windows.x86.checked.mch +0.81%
benchmarks.run_pgo.windows.x86.checked.mch +0.50%
benchmarks.run_tiered.windows.x86.checked.mch +0.88%
coreclr_tests.run.windows.x86.checked.mch +2.50%
libraries.crossgen2.windows.x86.checked.mch +0.78%
libraries.pmi.windows.x86.checked.mch +0.90%
libraries_tests.run.windows.x86.Release.mch +0.83%
libraries_tests_no_tiered_compilation.run.windows.x86.Release.mch +0.57%
realworld.run.windows.x86.checked.mch +0.73%

Details here


Throughput diffs for linux/arm64 ran on linux/x64

Overall (+0.09% to +1.28%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch +0.47%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.90%
benchmarks.run.linux.arm64.checked.mch +0.56%
coreclr_tests.run.linux.arm64.checked.mch +1.28%
realworld.run.linux.arm64.checked.mch +0.71%
libraries.pmi.linux.arm64.checked.mch +0.71%
libraries.crossgen2.linux.arm64.checked.mch +0.09%
libraries_tests.run.linux.arm64.Release.mch +0.68%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.29%
benchmarks.run_tiered.linux.arm64.checked.mch +0.50%
FullOpts (+0.09% to +2.26%)
Collection PDIFF
benchmarks.run_pgo.linux.arm64.checked.mch +0.53%
smoke_tests.nativeaot.linux.arm64.checked.mch +0.90%
benchmarks.run.linux.arm64.checked.mch +0.57%
coreclr_tests.run.linux.arm64.checked.mch +2.26%
realworld.run.linux.arm64.checked.mch +0.71%
libraries.pmi.linux.arm64.checked.mch +0.71%
libraries.crossgen2.linux.arm64.checked.mch +0.09%
libraries_tests.run.linux.arm64.Release.mch +0.91%
libraries_tests_no_tiered_compilation.run.linux.arm64.Release.mch +0.30%
benchmarks.run_tiered.linux.arm64.checked.mch +0.97%

Throughput diffs for linux/x64 ran on linux/x64

Overall (+0.45% to +1.48%)
Collection PDIFF
coreclr_tests.run.linux.x64.checked.mch +1.48%
libraries_tests.run.linux.x64.Release.mch +0.87%
benchmarks.run.linux.x64.checked.mch +0.72%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.66%
libraries.pmi.linux.x64.checked.mch +1.02%
libraries.crossgen2.linux.x64.checked.mch +0.77%
smoke_tests.nativeaot.linux.x64.checked.mch +0.79%
benchmarks.run_pgo.linux.x64.checked.mch +0.45%
benchmarks.run_tiered.linux.x64.checked.mch +0.50%
realworld.run.linux.x64.checked.mch +0.90%
FullOpts (+0.50% to +2.62%)
Collection PDIFF
coreclr_tests.run.linux.x64.checked.mch +2.62%
libraries_tests.run.linux.x64.Release.mch +1.12%
benchmarks.run.linux.x64.checked.mch +0.73%
libraries_tests_no_tiered_compilation.run.linux.x64.Release.mch +0.68%
libraries.pmi.linux.x64.checked.mch +1.02%
libraries.crossgen2.linux.x64.checked.mch +0.77%
smoke_tests.nativeaot.linux.x64.checked.mch +0.79%
benchmarks.run_pgo.linux.x64.checked.mch +0.50%
benchmarks.run_tiered.linux.x64.checked.mch +0.99%
realworld.run.linux.x64.checked.mch +0.90%

Details here


@AndyAyersMS
Member Author

Overall Impression

It appears it has done a decent job of reducing perf scores, especially on Win x64, which was the only os/arch I used for training.

image

Perf Score

My training runs projected around a 0.4% improvement in perf scores, but this was just for methods with CSEs, so it is a bit hard to project it to aggregate diffs across entire collections or just methods w/diffs, since methods with CSEs but no diffs and methods w/o CSEs will confound things. I will fix my local metric to at least collect the w/o diff data in the future.

Despite that, looking at the detailed CI data the perf score aggregate across all methods shows improvements in all win x64 collections (sadly asp.net seems to be out of data again, will have to fix that). The policy was trained on a 100 method sample from an older asp.net MCH so few or perhaps none of the methods in the diffs above were used as part of training. So there does not seem to be evidence of overfitting; the learned policy seems to handle methods it has never seen passably well.

We don't have enough experience with perf score diffs to judge if these results are significant. Perhaps these sorts of diffs are easily obtainable.

Code Size

Code size impact on x64 is not great. I have looked at this some and the current algorithm is biased against the 10-byte constant class handle CSEs for some reason (that is, the CostSz feature, parameter 4, has a "downvote" weight of -0.2363) and these simple CSEs often do not have enough other positive attributes (CostEx, local uses, etc.) to overcome that. Will post more details below.

Oddly there are some good code size reductions on arm64. No idea why.

One thought is that perhaps we should simply train the policy on code size and not perf score, as it is also likely well correlated with perf, and less sensitive to profile weight shenanigans. Doing this is conceptually simple but I need to add metric tracking for code size into the driver program.

Throughput

Also not great. This is a little surprising as the CSE algorithm should not be that costly. Will have to dig in deeper there. At first blush it might just be the additional code size, but we have some surprising code size reductions on arm64 and throughput is no better off there.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 23, 2024
@ghost

ghost commented Feb 23, 2024

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@AndyAyersMS
Member Author

benchmarks.run 7786

Here's one size regression analysis. I need to find ways to do these faster. @jakobbotsch you mentioned something about automated regression analysis?

BASE Total bytes of code 1435, prolog size 57, PerfScore 596.08, seq 5,3,12,0
DIFF Total bytes of code 1811, prolog size 66, PerfScore 634.33, seq 7,16,10,14,17,3,2,15,4,13,11,12,1,18,5,0

So not only a code size regression, but also a perf score regression. Clearly we do a lot more CSEs now... assuming order doesn't matter, this is

BASE seq     3,  5,        12,                  0
DIFF seq 1,2,3,4,5,7,10,11,12,13,14,15,16,17,18,0

So it is a superset. Using MCMC to gauge the space of opportunities here we see we can do better than BASE

INDEX   N      BEST       BASE      WORST      NOCSE     RATIO    RANK 
  7786 18    574.58     596.08     671.83     603.58     1.037   46/512 r best [17,2,10,12,15,18,13,16,3,14,11,4,7]/1 base [5,3,12]

and sorting

BASE seq     3,  5,        12,                  0   596.08
DIFF seq 1,2,3,4,5,7,10,11,12,13,14,15,16,17,18,0   634.33
BEST seq   2,3,4,  7,10,11,12,13,14,15,16,17,18,0   574.58
SBST seq   2,3,4,       11,12,13,14,15,   17        574.58

where SBST is the shortest MCMC sequence that gets the best score. So it appears candidates 1 and 5 are the troublemakers.
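
The set comparison above can be checked mechanically. This is a minimal sketch in Python, using the candidate numbers quoted in the BASE, DIFF and BEST rows; the trailing 0 in each sequence is left out on the assumption that it marks the stop decision, and the variable names are illustrative only.

```python
# Candidate numbers copied from the BASE / DIFF / BEST rows above.
# The trailing 0 of each sequence is omitted (assumed to be the stop marker).
base = {3, 5, 12}
diff = {1, 2, 3, 4, 5, 7, 10, 11, 12, 13, 14, 15, 16, 17, 18}
best = {2, 3, 4, 7, 10, 11, 12, 13, 14, 15, 16, 17, 18}

# DIFF performs every CSE that BASE performs, plus more.
diff_is_superset = base <= diff

# Candidates DIFF performs that the best MCMC sequence leaves alone.
suspects = sorted(diff - best)
print(diff_is_superset, suspects)
```

The difference set only says which candidates separate DIFF from the best sequence found; it does not by itself show that each one is harmful in isolation.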

In the initial ranking these CSEs rank pretty low, but rank above QUIT (note the table below is listed in no particular order; we are ranking by preference)

Greedy candidate evaluation
   0: QUIT    preference -2.1955020 likelihood  0.0000000
   1: CSE #18 preference  1.1131287 likelihood  0.0000000
   2: CSE #17 preference  7.4340287 likelihood  0.0000000
   3: CSE #16 preference  7.5343287 likelihood  0.0000000
   4: CSE #15 preference  6.0426787 likelihood  0.0000000
   5: CSE #14 preference  7.4340287 likelihood  0.0000000
   6: CSE #13 preference  5.8063787 likelihood  0.0000000
   7: CSE #12 preference  4.5298649 likelihood  0.0000000
   8: CSE #11 preference  5.7200624 likelihood  0.0000000
   9: CSE #10 preference  7.4340287 likelihood  0.0000000
=>10: CSE #07 preference  7.5343287 likelihood  0.0000000
  11: CSE #06 preference -4.2687148 likelihood  0.0000000
  12: CSE #05 preference  1.4416852 likelihood  0.0000000
  13: CSE #04 preference  5.8063787 likelihood  0.0000000
  14: CSE #03 preference  6.8237956 likelihood  0.0000000
  15: CSE #02 preference  6.1154352 likelihood  0.0000000
  16: CSE #01 preference  2.5778905 likelihood  0.0000000

and by the time we've gone through and CSE'd all the higher-ranked candidates things haven't changed much, other than the QUIT preference being even lower:

Greedy candidate evaluation
   0: QUIT    preference -2.3224229 likelihood  0.0000000
   1: CSE #18 preference  1.1131287 likelihood  0.0000000
   2: CSE #06 preference -4.2687148 likelihood  0.0000000
   3: CSE #05 preference  1.4416852 likelihood  0.0000000
=> 4: CSE #01 preference  2.5778905 likelihood  0.0000000
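
As a rough illustration of the selection loop implied by these tables, the sketch below repeatedly takes the highest-preference entry and stops once QUIT is on top. It is a simplification in Python, not the JIT code: preferences are held fixed here, whereas the dumps above show them being re-evaluated between steps, so the order it produces need not match the logged sequence.

```python
QUIT = "QUIT"


def greedy_order(preferences):
    """Return candidates in the order a fixed-preference greedy pass picks them.

    `preferences` maps a name to its preference and must contain QUIT.
    """
    remaining = dict(preferences)
    order = []
    while True:
        choice = max(remaining, key=remaining.get)
        if choice == QUIT:
            return order
        order.append(choice)
        del remaining[choice]


# Values copied from the second "Greedy candidate evaluation" table.
prefs = {
    QUIT: -2.3224229,
    "CSE #18": 1.1131287,
    "CSE #06": -4.2687148,
    "CSE #05": 1.4416852,
    "CSE #01": 2.5778905,
}
picked = greedy_order(prefs)
print(picked)
```

Under this model every candidate ranked above QUIT is eventually performed, which is why a low but positive preference for CSE #01 and CSE #05 is enough for them to go through.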

What are these CSEs?

CSE #05, {$503, $2c3} useCnt=2: [def=400.000000, use=400.000000, cost=  4, call]
CSE #01, {$18c, $2c3} useCnt=4: [def=100.000000, use=800.000000, cost=  3, call]

N008 (  4,  5) CSE #05 (use)[000507] ---X-----U-                         *  CAST      long <- ulong <- uint $504
N007 (  3,  3) CSE #01 (use)[000506] ---X-------                         \--*  ARR_LENGTH int    $481
N006 (  1,  1)              [000505] -----------                            \--*  LCL_VAR   ref    V01 arg1         u:1 $101

So low cost, "containable" (though not marked as such), 2 defs and 2/4 uses, live across call. What goes wrong? Doing these CSEs causes V16 (T01 in DIFF) to be spilled

BASE ;  V01 arg1         [V01,T01] ( 15, 27.50)     ref  ->  rbx         class-hnd single-def <System.String>
BASE ;  V16 tmp6         [V16,T03] (  7, 28   )     ref  ->  rdi         class-hnd "impAppendStmt" <<unknown class>>

DIFF ;  V01 arg1         [V01,T02] ( 13, 23.50)     ref  ->  rbx         class-hnd single-def <System.String>
DIFF ;  V16 tmp6         [V16,T01] (  7, 28   )     ref  ->  [rbp+0x50]  class-hnd spill-single-def "impAppendStmt" <<unknown class>>

Some thoughts on what might be mis-modelled here:

  • stopping preference should reflect pressure, pressure should be increasing, but it decreases as the pressure param is -0.23.
    Perhaps we need to study what data available at CSE time can better predict pressure? Do we have a spill cost metric?
  • the local pressure estimate figured things were at the point where T03 might be spilled (which it was, as V16 was T03 at the time of CSE)
  • if we CSE a tree with a local and only replace a small fraction of its uses, that seems possibly interesting.
  • doubly so if the cse "distance" metric is large: we're likely creating a conflict between the CSE temp and the local.
    Here distance is 0.74 (distance is fraction of blocks appearing in an RPO between the earliest and latest appearance of the CSE).
    So maybe track this and an interaction term of distance * fraction?
  • mark these opcodes as "containable"?
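
To make the distance idea concrete, here is a small Python sketch of the two quantities mentioned in the list. The exact formula (for example whether the endpoints are counted) and which "fraction" enters the interaction term are assumptions; the block positions and the 0.25 fraction are made up so that the distance comes out at the 0.74 quoted above.

```python
def cse_distance(rpo_positions, num_blocks):
    """Share of the RPO spanned between a CSE's earliest and latest appearance.

    `rpo_positions` are the RPO indices of the blocks where the CSE appears.
    """
    return (max(rpo_positions) - min(rpo_positions)) / num_blocks


def interaction(distance, fraction):
    """Hypothetical interaction feature: plain product of the two terms."""
    return distance * fraction


# Illustrative layout: 100 blocks, CSE appears in blocks 10, 40 and 84.
distance = cse_distance([10, 40, 84], 100)
feature = interaction(distance, 0.25)
print(distance, feature)
```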

CSE 01's def is to a local V04, which is a popular single-def local, so that might indicate a manual CSE. Perhaps in such cases we're better off deferring to copy prop? CSE 01 is not properly nested with CSE 05, so there's probably an order dependence here (doing 05 first will reduce the viability of 01). We don't check for nesting in our modelling today.

***** BB03 [0017]
STMT00010 ( 0x040[E-] ... 0x046 )
N003 (  3,  3)              [000032] DA-X-------                         *  STORE_LCL_VAR int    V04 loc2         d:2 $2c4
N002 (  3,  3) CSE #01 (def)[000031] ---X-------                         \--*  ARR_LENGTH int    $481
N001 (  1,  1)              [000030] -----------                            \--*  LCL_VAR   ref    V01 arg1         u:1 $101

after CSE

CSE #01 def at [000031] replaced in BB03 with def of V204
optValnumCSE morphed tree:
N006 (  4,  4)              [000032] DA-X-------                         *  STORE_LCL_VAR int    V04 loc2         d:2 $2c4
N005 (  4,  4)              [001955] -A-X-------                         \--*  COMMA     int    $481
N003 (  3,  3) CSE #01 (def)[001953] DA-X-------                            +--*  STORE_LCL_VAR int    V204 cse0        d:1 $VN.Void
N002 (  3,  3)              [000031] ---X-------                            |  \--*  ARR_LENGTH int    $481
N001 (  1,  1)              [000030] -----------                            |     \--*  LCL_VAR   ref    V01 arg1         u:1 $101
N004 (  1,  1)              [001954] -----------                            \--*  LCL_VAR   int    V204 cse0        u:1 $481

@AndyAyersMS
Member Author

AndyAyersMS commented Feb 23, 2024

benchmarks.run 18102

One CSE: base does it, diff doesn't.

Parameterized CSE Heuristic parameters 0.242500,0.247900,0.108900,-0.236300,0.247200,-0.055900,-0.841800,-0.058500,-0.277300,0.000000,0.021300,-0.411600,0.000000,-0.092200,0.259300,-0.031500,-0.074500,0.260700,0.347500,-0.059000,-0.317700,-0.688300,-0.499800,-0.322000,-0.226800
Local weight table...
RL using greedy policy
features,18102,CSE #01,  3.0000, 11.5129, 11.5129, 10.0000,  2.0000,  1.0000,  5.0000,  5.0000,  5.0000,  0.0000,  0.0000,  5.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000, 12.2061,  0.0000,  2.5000,  0.0000,  0.0000,  5.0000,  0.0000
wtfeat,18102,CSE #00,  0.7275,  2.8541,  1.2538, -2.3630,  0.4944, -0.0559, -4.2090, -0.2925, -1.3865,  0.0000,  0.0000, -2.0580,  0.0000, -0.0000,  0.0000, -0.0000, -0.0000,  0.0000,  4.2416, -0.0000, -0.7942, -0.0000, -0.0000, -1.6100, -0.0000
Pressure count 15, pressure weight 0.001
features,18102,CSE #00,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000,  0.0000
wtfeat,18102,CSE #00,  0.0000,  0.0000,  0.0000, -0.0000,  0.0000, -0.0000, -0.0000, -0.0000, -0.0000,  0.0000,  0.0000, -0.0000,  0.0000, -0.0000,  0.0000, -0.0000, -0.0000,  0.0000,  0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000, -0.0000
Greedy candidate evaluation
=> 0: QUIT    preference  0.0000000 likelihood  0.0000000
   1: CSE #01 preference -3.1978279 likelihood  0.0000000

Here we have "downvotes" from CostSz (-2.3), LsraLA (-1.6), LA (-4.2), Const (-1.3), Const+LA (-2.058)

But no pressure whatsoever, perf score is better... just the large constants that don't get CSEd.
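
The logged preference for CSE #01 can be reproduced as a dot product of the parameter vector and the feature vector. The Python sketch below copies both vectors from the dump above; that the preference is exactly this weighted sum is an inference from the `wtfeat` row (which matches the elementwise products), not something taken from the JIT sources.

```python
# Parameters from the "Parameterized CSE Heuristic parameters" line.
params = [
    0.2425, 0.2479, 0.1089, -0.2363, 0.2472, -0.0559, -0.8418, -0.0585,
    -0.2773, 0.0, 0.0213, -0.4116, 0.0, -0.0922, 0.2593, -0.0315,
    -0.0745, 0.2607, 0.3475, -0.0590, -0.3177, -0.6883, -0.4998, -0.3220,
    -0.2268,
]
# Features from the "features,18102,CSE #01" line (rounded to 4 decimals there).
features = [
    3.0, 11.5129, 11.5129, 10.0, 2.0, 1.0, 5.0, 5.0,
    5.0, 0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 12.2061, 0.0, 2.5, 0.0, 0.0, 5.0,
    0.0,
]

weighted = [p * f for p, f in zip(params, features)]
preference = sum(weighted)

# QUIT has preference 0.0 in this dump, so it outranks the candidate.
quit_preference = 0.0
do_cse = preference > quit_preference
print(round(preference, 4), do_cse)
```

The four negative terms listed above plus the smaller ones outweigh the positive contributions, so the sum lands near the logged -3.1978 and the candidate loses to QUIT.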

Opt Goal was "Small code" as this is a cctor -- we don't have a heuristic for this (yet). Seems like we ought to make one, similar to what we're doing for PerfScore.


On that note I have started adding support for code size as an optimization objective. In the initial cut I've just added it to MCMC, to get a rough feeling for how often optimizing for score and optimizing for size coincide or are at odds with one another. Here is some sample data (200 randomly chosen methods).

  ---baseline heuristic had optimal perf score in 125 of 200 methods 62.50%; 
     best/base 0.984 base/none 0.940 best/none 0.925 best(size)/base 1.018 (8358 runs in 39836ms)
  ---baseline heuristic had optimal code size  in  91 of 200 methods 45.50%; 
     best/base 0.981 base/none 0.964 best/none 0.945 best(score)/base 1.000

Parsing this ... with a score-optimal CSE policy we can improve perf scores by about 1.6% (but increase code size by 1.8%); with a size-optimal policy we can reduce code size by about 1.9% and keep perf scores about the same as they are now.
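
For reference, the percentages in this reading come straight from the ratios in the sample data. A small Python sketch of the conversion (the helper name is illustrative):

```python
def pct_change(ratio):
    """Convert a new/old ratio into a signed percentage change."""
    return round((ratio - 1.0) * 100.0, 1)


# Ratios copied from the two summary lines above.
score_when_score_optimal = pct_change(0.984)  # best/base, perf score
size_when_score_optimal = pct_change(1.018)   # best(size)/base
size_when_size_optimal = pct_change(0.981)    # best/base, code size
score_when_size_optimal = pct_change(1.000)   # best(score)/base
print(score_when_score_optimal, size_when_score_optimal,
      size_when_size_optimal, score_when_size_optimal)
```

Negative values are improvements for both metrics, since lower perf score and smaller code are both better.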

Seems to suggest that if we want to improve perf scores via CSE over our current heuristic, we're going to have to accept some size increase.

Caveat: I don't yet surface the optimization objective so comparisons vs baseline are a bit tricky, as it changes its heuristic.

This is using the average cross-result, however... that is to compute the size impact for best perf score, I find all experiments with the best perf score, and compute the average size (and likewise for score). With sufficiently clever training perhaps we could do better than these averages? I will add yet another metric to find the "best/best" and see if it materially differs from "best/average".

I don't have the ability to optimize a mixed objective (yet?) so can't say what the actual tradeoff curve of size and score might look like or whether something like best/best is achievable (given that we're still working on finding a policy that can get the first "best").

Follow-up experiments:

  • look at the best/best data
  • see if there's some way to more fully describe the limiting shape of the score/size tradeoff curve. Seems like we have the data in MCMC to do this.
  • surface the optimization objective in the method metrics and filter the above so we're only looking at baseline data that matches our objectives (that is, don't use SIZE_OPT baselines when looking at score improvements, and vice versa). I don't think there are that many optimize-for-size cases (mainly .cctors) so this may not have much impact on the results above.
  • build a size-optimizing policy and adjust param sets in the greedy heuristic based on objective?

Using a best/best MCMC estimator (that is, for each method, find the best perf score, then among those, find the best code size) we get data like

  ---baseline heuristic had optimal perf score in 125 of 200 methods 62.50%; 
     best/base 0.984 base/none 0.940 best/none 0.925 best(size)/base 1.015 (8358 runs in 14713ms)
  ---baseline heuristic had optimal code size  in 91 of 200 methods 45.50%; 
     best/base 0.981 base/none 0.964 best/none 0.945 best(score)/base 0.995

So a policy that can magically get the best perf score (1.6%) and then the best code size has about a 1.5% size increase, and a policy that can magically get the best code size (1.9%) and then the best perf score will see about a 0.5% perf score decrease. So a bit less pessimistic than the above.
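
The difference between the "best/average" and "best/best" estimators can be sketched as follows. This is Python with made-up (score, size) pairs for a single method; the function names are illustrative and the real MCMC driver may organize this differently.

```python
def best_average(experiments):
    """Best perf score, and the average size over all runs that reach it."""
    best_score = min(score for score, _ in experiments)
    sizes = [size for score, size in experiments if score == best_score]
    return best_score, sum(sizes) / len(sizes)


def best_best(experiments):
    """Best perf score, and the smallest size among runs that reach it."""
    best_score = min(score for score, _ in experiments)
    return best_score, min(size for score, size in experiments
                           if score == best_score)


# Illustrative runs for one method: (perf score, code size in bytes).
runs = [(100.0, 220), (100.0, 200), (104.0, 190), (110.0, 240)]
avg_result = best_average(runs)
min_result = best_best(runs)
print(avg_result, min_result)
```

Because best/best takes a minimum where best/average takes a mean, its size figure can never be worse, which is consistent with the 1.015 versus 1.018 ratios reported above.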

I would like to grow this into a full-fledged Pareto Frontier; what we have here are the endpoints. But it is not immediately obvious to me how to do that in aggregate; each method comes with its own set of tradeoffs. Some thoughts:

  • impose a per-method limit. This would likely need to be method relative. Say for each method, we find the best code size if we're willing to accept a perf score that is within N% of best; do this for varying Ns. Then aggregate across these to see what the overall code size impact is.
  • do something more ambitious, exploiting the different slopes of method tradeoff curves. Say we have two methods, and one can get a really good perf score but causes a large size increase, and the other's perf score is relatively insensitive to size: we optimize the first for score and the second for size... etc. Combinatorically this seems challenging.
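
The first idea (a per-method, method-relative limit) might look roughly like this. It is a Python sketch with invented data for two methods; the name `best_size_within` and the choice of slack values are assumptions, not part of the existing tooling.

```python
def best_size_within(experiments, slack_pct):
    """Smallest size among runs whose score is within slack_pct of the best."""
    best_score = min(score for score, _ in experiments)
    limit = best_score * (1.0 + slack_pct / 100.0)
    return min(size for score, size in experiments if score <= limit)


# Illustrative (perf score, code size) runs for two hypothetical methods.
methods = {
    "A": [(100.0, 220), (102.0, 200), (110.0, 180)],
    "B": [(50.0, 90), (50.4, 80)],
}

# Aggregate code size across methods for several values of N.
totals = {
    n: sum(best_size_within(runs, n) for runs in methods.values())
    for n in (0, 1, 5, 20)
}
print(totals)
```

Sweeping N this way gives one aggregate size per slack level, which traces an overall tradeoff curve without having to reason about combinations of methods.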

@AndyAyersMS
Member Author

I made some updates so that MCMC can track the Pareto frontiers for the methods it explores. Here's one such (score and size normalized by the current jit heuristic score/size).

image

Note these "curves" must pass through or below (1,1); here it passes through, meaning that we can either have smaller code or faster code but not both. The lines joining the points are fictional as the observations are discrete, but they help visualize the nature of the tradeoff. Also note we're at the mercy of MCMC's exploration strategy; it may be we should be doing more extensive random sampling and a more thorough exploration would change the shape of the curve in interesting ways. Will have to experiment some here.
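
For readers unfamiliar with the construction, a Pareto frontier over discrete (score, size) observations can be extracted with a sort and a single pass. This Python sketch uses invented normalized points and is not the code used to produce the plot.

```python
def pareto_frontier(points):
    """Return the non-dominated (score, size) points; lower is better on both."""
    frontier = []
    # Sorting by score (then size) means a point survives only if it is
    # strictly smaller than every point with an equal or better score.
    for score, size in sorted(set(points)):
        if not frontier or size < frontier[-1][1]:
            frontier.append((score, size))
    return frontier


# Illustrative observations, normalized so the current heuristic is (1.0, 1.0).
observations = [(1.0, 1.0), (0.97, 1.04), (1.03, 0.96), (1.02, 1.01), (0.99, 1.05)]
frontier = pareto_frontier(observations)
print(frontier)
```

In this example the frontier passes through (1.0, 1.0), so the method is one where speed and size can be traded but not improved together.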

Broadening this out to a bigger set of methods, here are 200 randomly chosen method pareto frontiers:

image

Roughly speaking we see 3 classes of methods here:

  • ones like the above where we can trade speed for size, but can't do better on both
  • ones where the current behavior is already speed and size optimal (these "curves" are single points at 1,1)
  • ones where we can improve both speed and size. These are methods where the curve lies entirely in the magic quadrant (lower left).
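
Given a normalized frontier, the three classes can be told apart after the fact with a few comparisons. This Python sketch is only a post hoc labelling of frontiers (the labels and function name are illustrative); it does not predict the class from information available when CSEs are chosen.

```python
def classify(frontier):
    """Label a frontier of (score, size) points normalized to the current JIT."""
    if frontier == [(1.0, 1.0)]:
        return "already optimal"
    if all(score <= 1.0 and size <= 1.0 for score, size in frontier):
        return "improve both"
    return "tradeoff"


labels = [
    classify([(0.97, 1.04), (1.0, 1.0), (1.03, 0.96)]),
    classify([(1.0, 1.0)]),
    classify([(0.95, 0.99), (0.98, 0.97)]),
]
print(labels)
```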

It would be interesting to see if there's some simple way to classify a method (based on what is known at the point where we do CSEs).

@AndyAyersMS
Member Author

Looking at TP diff


?GetFeatures@CSE_HeuristicParameterized@@QEAAXPEAUCSEdsc@@PEAN@Z                                                          : 64893783  : NA           : 13.74% : +0.1574%
?gtSetEvalOrder@Compiler@@QEAAIPEAUGenTree@@@Z                                                                            : 34320493  : +4.42%       : 7.26%  : +0.0832%
?fgMorphSmpOp@Compiler@@AEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@PEA_N@Z                                             : 33806774  : +5.24%       : 7.16%  : +0.0820%
?BuildChoices@CSE_HeuristicParameterized@@QEAAXAEAV?$ArrayStack@UChoice@CSE_HeuristicParameterized@@@@@Z                  : 31379648  : NA           : 6.64%  : +0.0761%
GenTreeVisitor<`Compiler::fgSetTreeSeq'::`2'::SetTreeSeqVisitor>::WalkTree                                                : 16013075  : +4.48%       : 3.39%  : +0.0388%
?push_back@?$vector@PEAUFlowEdge@@V?$allocator@PEAUFlowEdge@@@jitstd@@@jitstd@@QEAAXAEBQEAUFlowEdge@@@Z                   : 15835128  : +246.82%     : 3.35%  : +0.0384%
?fgMorphTree@Compiler@@QEAAPEAUGenTree@@PEAU2@PEAUMorphAddrContext@1@@Z                                                   : 13466755  : +3.30%       : 2.85%  : +0.0327%
?ConsiderCandidates@CSE_HeuristicParameterized@@UEAAXXZ                                                                   : 12455859  : NA           : 2.64%  : +0.0302%
GenTreeVisitor<`Compiler::gtFindLink'::`2'::FindLinkWalker>::WalkTree                                                     : 8460164   : +24.16%      : 1.79%  : +0.0205%
?fgMorphSmpOpOptional@Compiler@@AEAAPEAUGenTree@@PEAUGenTreeOp@@PEA_N@Z                                                   : 7817511   : +5.15%       : 1.65%  : +0.0190%
?StoppingPreference@CSE_HeuristicParameterized@@QEAANXZ                                                                   : 7024516   : NA           : 1.49%  : +0.0170%
?PerformCSE@CSE_HeuristicCommon@@UEAAXPEAVCSE_Candidate@@@Z                                                               : 6870688   : +19.69%      : 1.45%  : +0.0167%
?gtUpdateNodeOperSideEffects@Compiler@@QEAAXPEAUGenTree@@@Z                                                               : 6729731   : +5.32%       : 1.42%  : +0.0163%
?GreedyPolicy@CSE_HeuristicParameterized@@QEAAXXZ                                                                         : 5490710   : NA           : 1.16%  : +0.0133%

Both the cost of the heuristic and the cost of the extra CSEs show up here. Will focus on the heuristic cost as the actual policy will fluctuate as we tune the parameters.

@AndyAyersMS
Member Author

Also likely running the greedy heuristic on optimize for size cases is a contributor. I am trying to create a size-optimized parameter set now; depending on how that goes I might decide to use that or initially disable the greedy heuristic for size cases.

Profiling showed that `GetFeatures` was a major factor in throughput. For the
most part the features of CSE candidates don't change as we perform CSEs, so
build in some logic to avoid recomputing the feature set unless there is some
evidence features have changed.

To avoid having to remove already performed candidates from the candidate vector
we now tag them as `m_performed`; these get ignored during subsequent processing,
and discarded if we ever recompute features.

This should cut the TP impact roughly in half, the remaining part seems to
largely be from doing more CSEs (which we hope will show some perf benefit).
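
The caching scheme described in this commit message can be illustrated outside the JIT. The following is a Python sketch of the idea only: the class names, the `compute_features` callback and the `features_changed` flag are hypothetical stand-ins, and the actual change is in the JIT's C++ code.

```python
class Candidate:
    def __init__(self, name):
        self.name = name
        self.performed = False  # plays the role of the `m_performed` tag
        self.features = None


class CandidateSet:
    def __init__(self, names, compute_features):
        self.candidates = [Candidate(n) for n in names]
        self.compute_features = compute_features  # hypothetical callback
        self.stale = True

    def perform(self, candidate, features_changed=False):
        candidate.performed = True  # tag it instead of erasing it from the list
        self.stale = self.stale or features_changed

    def choices(self):
        if self.stale:
            # Recomputing is the one point where performed entries are dropped.
            self.candidates = [c for c in self.candidates if not c.performed]
            for c in self.candidates:
                c.features = self.compute_features(c.name)
            self.stale = False
        return [c for c in self.candidates if not c.performed]


calls = []
cands = CandidateSet(["CSE #01", "CSE #05", "CSE #18"],
                     lambda name: calls.append(name) or [1.0])
first = cands.choices()    # computes features for all three candidates
cands.perform(first[0])    # no evidence that features changed
second = cands.choices()   # reuses the cached features
print(len(calls), [c.name for c in second])
```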

Contributes to dotnet#92915.
@AndyAyersMS
Member Author

AndyAyersMS commented Feb 26, 2024

TP did improve a bit...

Throughput diffs for windows/x64 ran on windows/x64

Overall

Collection Old PDIFF New PDIFF
benchmarks.run.windows.x64.checked.mch +0.85% +0.68%
benchmarks.run_pgo.windows.x64.checked.mch +0.73% +0.65%
benchmarks.run_tiered.windows.x64.checked.mch +0.62% +0.50%
coreclr_tests.run.windows.x64.checked.mch +1.77% +1.47%
libraries.crossgen2.windows.x64.checked.mch +0.73% .. etc ...
libraries.pmi.windows.x64.checked.mch +1.00%
libraries_tests.run.windows.x64.Release.mch +0.79%
libraries_tests_no_tiered_compilation.run.windows.x64.Release.mch +0.56%
realworld.run.windows.x64.checked.mch +0.69%
smoke_tests.nativeaot.windows.x64.checked.mch +0.71%

@AndyAyersMS
Member Author

Perf lab experiment is now running... here are some good examples
newplot - 2024-03-08T101720 835
newplot - 2024-03-08T103447 465
newplot - 2024-03-08T103639 729
newplot - 2024-03-08T104333 452

The cumulative score plot of (exp/base) shows a slight leftward shift. So seems like the net impact is an improvement. Need a bit more data though.

image

@AndyAyersMS
Member Author

Some regressions too
newplot - 2024-03-08T133314 575

newplot - 2024-03-08T133149 251

@AndyAyersMS
Member Author

Tail counts seem to be pretty even... (recall this is diff/base ratio, so < 1 is an improvement, > 1 is a regression):

criteria count
< 0.80 13
< 0.90 42
< 0.95 146
> 1.05 150
> 1.10 45
> 1.20 13

Ideally, we'd see some leftward skew here, more big improvements than big regressions.
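
A table like the one above can be produced from a list of diff/base ratios with a few counts. This Python sketch uses invented ratios and assumes the rows are cumulative (a ratio below 0.80 is also counted under 0.90 and 0.95).

```python
def tail_counts(ratios):
    """Count ratios beyond each improvement and regression threshold."""
    low = {t: sum(r < t for r in ratios) for t in (0.80, 0.90, 0.95)}
    high = {t: sum(r > t for r in ratios) for t in (1.05, 1.10, 1.20)}
    return low, high


# Illustrative diff/base ratios, one per benchmark.
ratios = [0.78, 0.88, 0.93, 0.99, 1.00, 1.02, 1.07, 1.12, 1.25]
low, high = tail_counts(ratios)
print(low, high)
```

A leftward skew would show up as the `low` counts being clearly larger than the matching `high` counts.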

So in some respects, data is still looking mostly like noise... but again we only have 3-4 days worth of numbers right now, so let's look again next week when there's more.

@AndyAyersMS
Member Author

With two weeks of data, plotting the ratio of 10th percentiles, new/old (so lower is better)

image

There is still a slight leftward skew, though again the tails seem fairly evenly matched. Geomean across all is 1.00014 so again no clear persistent improvement.
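
For context on why evenly matched tails give a geomean near 1, here is a short Python sketch of the geometric mean of ratios; the sample ratios are invented.

```python
import math


def geomean(ratios):
    """Geometric mean, computed in log space to avoid overflow on long lists."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# A 10% regression and the reciprocal improvement cancel exactly.
balanced = geomean([1.10, 1.0 / 1.10, 1.0])
# A few extra improvements pull the result below 1.
skewed = geomean([0.90, 0.95, 1.0, 1.05])
print(balanced, skewed)
```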

Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators May 23, 2024