Goals
The main goal of my study was to ensure that we ship .NET 5.0 without any performance regressions and validate whether in the near future we can fully rely on the regressions auto-filing bot written by @DrewScoggins.
My other goal was to get .NET Library Team members involved and keep on growing the performance culture.
#tl;dr The bot is doing a great job in detecting regressions. Most serious regressions have been already fixed, however a few investigations are still in progress.
Methodology (and how it evolved)
In 2018 I had the pleasure of reviewing @AndreyAkinshin's "Pro .NET Benchmarking" book. The "Statistics for Performance Engineers" and "Performance Analysis and Performance Testing" chapters inspired me to implement a small tool called Results Comparer. The tool uses the Mann-Whitney U statistical test to detect performance regressions in results exported by BenchmarkDotNet. It's being used (or at least it should be) as part of our benchmarking workflow to prevent introducing regressions to .NET.
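To make the idea concrete, here is a minimal sketch of a Mann-Whitney U comparison of two sets of measurements. It is not the Results Comparer source: the function names, the 5% threshold, the 0.05 significance level, and the use of the normal approximation are all assumptions chosen for illustration.

```python
import math


def mann_whitney_u(base, diff):
    """Return (U for `diff`, z-score), using midranks for ties and the
    normal approximation. A positive z means `diff` tends to be larger."""
    n1, n2 = len(base), len(diff)
    combined = sorted(
        (value, group)
        for group, sample in enumerate((base, diff))
        for value in sample
    )
    ranks = [0.0] * len(combined)
    tie_term = 0
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    rank_sum_diff = sum(r for r, (_, g) in zip(ranks, combined) if g == 1)
    u_diff = rank_sum_diff - n2 * (n2 + 1) / 2
    n = n1 + n2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_diff - mean_u) / math.sqrt(var_u) if var_u > 0 else 0.0
    return u_diff, z


def upper_tail_p(z):
    """One-sided p-value for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def compare(base, diff, threshold=0.05, alpha=0.05):
    """Classify `diff` timings against `base` timings (hypothetical helper).

    `threshold` is an assumed tolerance: a change only counts when it is
    bigger than that fraction of the other sample."""
    _, z = mann_whitney_u([b * (1 + threshold) for b in base], diff)
    if upper_tail_p(z) < alpha:
        return "Slower"
    _, z = mann_whitney_u([d * (1 + threshold) for d in diff], base)
    if upper_tail_p(z) < alpha:
        return "Faster"
    return "Same"
```

For example, `compare(base_times, diff_times)` would take the per-iteration timings of one benchmark from two builds and return one of the three conclusions that appear in the tables of this report.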
In 2019 I was asked by @danmosemsft to verify .NET Core 3.0 performance. Initially, I ran all the microbenchmarks from the dotnet/performance repository on a single machine with dual boot for Windows 10 and Ubuntu 18.04 x64 and used the Results Comparer to find regressions. It very quickly turned out that such a sample was way too small to make sure that we don’t have any regressions. Some benchmarks were simply unstable, and some architectures like ARM and ARM64 were not covered at all. Other Linux distros and CPU families were also not covered.
Then I ran the benchmarks on all the PCs, laptops, and VMs that I could access. But I was still missing AMD and ARM results, so I asked @tannergooding and @BruceForstall for help. @tannergooding ran the benchmarks on all his AMD machines. @BruceForstall gave me access to a document that explains how to use the ARM machines owned by the JIT Team. This turned out to be an invaluable help, as I've used these machines many, many times, including this year during the 5.0 investigation.
After having enough samples to cover our matrix of supported OSes and architectures, I built a simple console app on top of ResultsComparer (source code available here). The tool uses the very same statistical test to detect regressions, aggregates the results from all different configurations, and sorts them from the biggest regression to the biggest improvement.
Such an approach allows for very quick identification of regressions of all kinds:
- affecting every configuration
System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)
| Result | Base | Diff | Ratio | Operating System | Bit |
|---|---|---|---|---|---|
| Slower | 570.88 | 3069.76 | 0.19 | Windows 10.0.19041.388 | X64 |
| Slower | 610.20 | 3674.19 | 0.17 | Windows 10.0.18363.959 | X64 |
| Slower | 598.37 | 3519.26 | 0.17 | Windows 10.0.18363.959 | X64 |
| Slower | 700.86 | 4238.85 | 0.17 | Windows 10.0.19041.450 | X64 |
| Slower | 583.19 | 3538.60 | 0.16 | Windows 10.0.19041.450 | X64 |
| Slower | 546.58 | 3015.23 | 0.18 | Windows 10.0.19042 | X64 |
| Slower | 665.53 | 3776.10 | 0.18 | Windows 10.0.19041.450 | X64 |
| Slower | 515.15 | 3162.05 | 0.16 | Windows 10.0.19041.450 | X64 |
| Slower | 626.94 | 3928.55 | 0.16 | ubuntu 18.04 | X64 |
| Slower | 630.90 | 4196.01 | 0.15 | manjaro | X64 |
| Slower | 813.80 | 4605.57 | 0.18 | pop 20.04 | X64 |
| Slower | 608.59 | 3587.44 | 0.17 | alpine 3.11 | X64 |
| Slower | 615.67 | 3390.01 | 0.18 | ubuntu 18.04 | X64 |
| Slower | 2148.33 | 10335.71 | 0.21 | ubuntu 16.04 | Arm64 |
| Slower | 2183.77 | 10620.53 | 0.21 | ubuntu 16.04 | Arm64 |
| Slower | 2163.67 | 10815.16 | 0.20 | ubuntu 16.04 | Arm64 |
| Slower | 1176.33 | 11641.04 | 0.10 | ubuntu 18.04 | Arm64 |
| Slower | 1550.48 | 5183.74 | 0.30 | ubuntu 20.04 | Arm64 |
| Slower | 568.67 | 3637.59 | 0.16 | Windows 10.0.18363.959 | X86 |
| Slower | 664.86 | 4576.24 | 0.15 | Windows 10.0.19041.450 | X86 |
| Slower | 972.74 | 8054.46 | 0.12 | Windows 10.0.18363.1016 | Arm |
| Slower | 790.15 | 5171.92 | 0.15 | macOS Catalina 10.15.6 | X64 |
| Slower | 668.62 | 4153.54 | 0.16 | macOS Catalina 10.15.6 | X64 |
| Slower | 743.69 | 4727.58 | 0.16 | macOS Mojave 10.14.5 | X64 |
- affecting specific OS families (Windows, Unix)
System.Globalization.Tests.StringSearch.IsPrefix_DifferentFirstChar(Options: (en-US, IgnoreSymbols, False))
| Result | Base | Diff | Ratio | Operating System | Bit |
|---|---|---|---|---|---|
| Slower | 53.24 | 26589.31 | 0.00 | Windows 10.0.19041.388 | X64 |
| Slower | 65.47 | 28371.93 | 0.00 | Windows 10.0.18363.959 | X64 |
| Slower | 63.89 | 27952.39 | 0.00 | Windows 10.0.18363.959 | X64 |
| Slower | 75.24 | 35910.74 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 67.29 | 55198.94 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 58.36 | 31008.73 | 0.00 | Windows 10.0.19042 | X64 |
| Slower | 70.38 | 34632.87 | 0.00 | Windows 10.0.19041.450 | X64 |
| Slower | 58.92 | 27533.16 | 0.00 | Windows 10.0.19041.450 | X64 |
| Same | 24197.26 | 24316.40 | 1.00 | ubuntu 18.04 | X64 |
| Same | 23317.93 | 23585.42 | 0.99 | manjaro | X64 |
| Same | 30855.66 | 30176.99 | 1.02 | pop 20.04 | X64 |
| Same | 29081.88 | 28590.29 | 1.02 | alpine 3.11 | X64 |
| Same | 23929.07 | 23728.33 | 1.01 | ubuntu 18.04 | X64 |
| Same | 51918.86 | 51256.87 | 1.01 | ubuntu 16.04 | Arm64 |
| Same | 51674.77 | 51693.86 | 1.00 | ubuntu 16.04 | Arm64 |
| Same | 51690.93 | 52015.88 | 0.99 | ubuntu 16.04 | Arm64 |
| Same | 61071.92 | 43711.17 | 1.40 | ubuntu 18.04 | Arm64 |
| Faster | 43870.66 | 26020.13 | 1.69 | ubuntu 20.04 | Arm64 |
| Slower | 78.42 | 36208.27 | 0.00 | Windows 10.0.18363.959 | X86 |
| Slower | 88.01 | 42312.37 | 0.00 | Windows 10.0.19041.450 | X86 |
| Slower | 104.29 | 57622.86 | 0.00 | Windows 10.0.18363.1016 | Arm |
| Same | 38089.02 | 40079.68 | 0.95 | macOS Catalina 10.15.6 | X64 |
| Same | 32208.09 | 32537.00 | 0.99 | macOS Catalina 10.15.6 | X64 |
| Same | 32575.17 | 32782.69 | 0.99 | macOS Mojave 10.14.5 | X64 |
- affecting specific Linux distros
System.Threading.Tests.Perf_CancellationToken.Cancel
| Result | Base | Diff | Ratio | Operating System | Bit |
|---|---|---|---|---|---|
| Same | 116.42 | 120.28 | 0.97 | Windows 10.0.19041.388 | X64 |
| Same | 148.25 | 146.53 | 1.01 | Windows 10.0.18363.959 | X64 |
| Same | 144.37 | 144.09 | 1.00 | Windows 10.0.18363.959 | X64 |
| Same | 154.82 | 151.57 | 1.02 | Windows 10.0.19041.450 | X64 |
| Same | 134.57 | 133.40 | 1.01 | Windows 10.0.19041.450 | X64 |
| Same | 122.52 | 119.39 | 1.03 | Windows 10.0.19042 | X64 |
| Same | 154.48 | 150.92 | 1.02 | Windows 10.0.19041.450 | X64 |
| Same | 128.87 | 122.90 | 1.05 | Windows 10.0.19041.450 | X64 |
| Same | 169.50 | 168.46 | 1.01 | ubuntu 18.04 | X64 |
| Faster | 171.67 | 155.11 | 1.11 | manjaro | X64 |
| Same | 179.54 | 175.17 | 1.02 | pop 20.04 | X64 |
| Slower | 146.39 | 203.94 | 0.72 | alpine 3.11 | X64 |
| Same | 179.39 | 180.75 | 0.99 | ubuntu 18.04 | X64 |
| Same | 1068.08 | 1029.35 | 1.04 | ubuntu 16.04 | Arm64 |
| Same | 1066.73 | 1056.79 | 1.01 | ubuntu 16.04 | Arm64 |
| Same | 1111.72 | 1037.54 | 1.07 | ubuntu 16.04 | Arm64 |
| Same | 751.74 | 622.83 | 1.21 | ubuntu 18.04 | Arm64 |
| Faster | 675.51 | 318.18 | 2.12 | ubuntu 20.04 | Arm64 |
| Same | 258.80 | 257.15 | 1.01 | Windows 10.0.18363.959 | X86 |
| Same | 194.61 | 192.96 | 1.01 | Windows 10.0.19041.450 | X86 |
| Same | 486.93 | 508.05 | 0.96 | Windows 10.0.18363.1016 | Arm |
| Same | 200.25 | 203.78 | 0.98 | macOS Catalina 10.15.6 | X64 |
| Same | 168.62 | 163.47 | 1.03 | macOS Catalina 10.15.6 | X64 |
| Same | 174.95 | 177.88 | 0.98 | macOS Mojave 10.14.5 | X64 |
- affecting specific CPU families
System.Buffers.Text.Tests.Base64EncodeDecodeInPlaceTests.Base64EncodeInPlace(NumberOfBytes: 200000000)
| Result | Base | Diff | Ratio | Operating System | Bit | Processor Name |
|---|---|---|---|---|---|---|
| Same | 125616750.00 | 125476550.00 | 1.00 | Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X |
| Same | 161388400.00 | 156493500.00 | 1.03 | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 154933500.00 | 154730800.00 | 1.00 | Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 180481800.00 | 180129900.00 | 1.00 | Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
| Slower | 161742300.00 | 211160300.00 | 0.77 | Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) |
| Same | 152928600.00 | 150232700.00 | 1.02 | Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 206708750.00 | 206860050.00 | 1.00 | Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) |
| Slower | 140924300.00 | 185228400.00 | 0.76 | Windows 10.0.19041.450 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) |
| Same | 154948321.00 | 154788579.50 | 1.00 | ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 175860282.50 | 163007313.50 | 1.08 | manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) |
| Slower | 199713880.00 | 255270486.50 | 0.78 | pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) |
| Same | 151256100.00 | 168661900.00 | 0.90 | alpine 3.11 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 171229200.00 | 165843050.00 | 1.03 | ubuntu 18.04 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) |
| Same | 503785101.00 | 505992400.50 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 503901205.00 | 506190175.00 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 504131772.50 | 506220395.00 | 1.00 | ubuntu 16.04 | Arm64 | Unknown processor |
| Same | 473629200.00 | 541631800.00 | 0.87 | ubuntu 18.04 | Arm64 | Unknown processor |
| Same | 331381500.00 | 333779500.00 | 0.99 | ubuntu 20.04 | Arm64 | Unknown processor |
| Same | 246876150.00 | 247010200.00 | 1.00 | Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz |
| Same | 290036150.00 | 289409500.00 | 1.00 | Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
| Same | 418007450.00 | 415404450.00 | 1.01 | Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz |
| Same | 204196936.50 | 204410652.50 | 1.00 | macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) |
| Same | 176763730.00 | 175647563.50 | 1.01 | macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) |
| Same | 180812724.00 | 184849205.00 | 0.98 | macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) |
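The way the tables above are read can be sketched in a few lines. The Ratio column is Base divided by Diff, so values below 1.0 mean the new build is slower. The snippet below is only an illustration of how the scope of a regression could be derived from such rows; the `os_family` field, the dictionary layout, and the helper names are assumptions and are not part of the tool.

```python
from collections import defaultdict


def ratio(base, diff):
    """Base / Diff, so values below 1.0 mean the new build is slower."""
    return base / diff


def conclusions_by(rows, key):
    """Map each value of `key` (e.g. OS family) to the set of results seen."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[key]].add(row["result"])
    return dict(groups)


def describe_scope(rows):
    """Say whether a regression hits every configuration or only one group."""
    if {row["result"] for row in rows} == {"Slower"}:
        return "every configuration"
    for key in ("os_family", "arch"):
        groups = conclusions_by(rows, key)
        slower = sorted(k for k, r in groups.items() if r == {"Slower"})
        others_clean = all(
            "Slower" not in r for k, r in groups.items() if k not in slower
        )
        if slower and others_clean:
            return f"{key}: {', '.join(slower)}"
    return "mixed"


# A few rows taken from the StringSearch table above; `os_family` is a
# hypothetical field derived by hand from the "Operating System" column.
string_search = [
    {"os_family": "Windows", "arch": "X64", "result": "Slower"},
    {"os_family": "Windows", "arch": "X86", "result": "Slower"},
    {"os_family": "Windows", "arch": "Arm", "result": "Slower"},
    {"os_family": "Linux", "arch": "X64", "result": "Same"},
    {"os_family": "Linux", "arch": "Arm64", "result": "Faster"},
    {"os_family": "macOS", "arch": "X64", "result": "Same"},
]

print(describe_scope(string_search))
```

Grouping by conclusion instead of by raw ratio matters here: some rows with a ratio far from 1.0 are still reported as Same, because the statistical test did not find the difference significant.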
Using the tool had one major flaw: it was not automated and hence we were finding out about the regressions only when we searched for them.
This has been recognized and a new project has been started. In 2020 @DrewScoggins started implementing a GitHub bot that uses the data gathered from performance lab (a set of machines owned by the .NET Performance Team) microbenchmark runs to detect and auto-file regressions. So far the bot has been reporting new issues in a dedicated repository, and once a week the workgroup led by @DrewScoggins, consisting of @AndyAyersMS, @kunalspathak, @tannergooding and myself, has been going through the list and triaging the issues. Issues that were deemed actual regressions were labeled as Needs Transfer and were later moved by @DrewScoggins to the runtime repo.
A few weeks ago we were getting close to "code freeze" for .NET 5 and I asked myself a question: are we sure that the bot has reported all possible regressions for all the supported OS versions?
The bot is using different statistical methods to detect regressions and so far it has been enabled only for Windows 10 x64, Ubuntu 18.04 x64, and Windows 10 x86. So I've decided to spend some time and use the old tool that I wrote to verify it. To increase the sample size and get other .NET Libraries Team members involved, I've simply asked the Team to run the benchmarks and share the results with me.
Running the performance repo microbenchmarks against the latest .NET Core SDK is super easy thanks to a python script implemented by @jorive. The script downloads the right SDK and starts benchmarking with cleared environment variables.
git clone https://github.com/dotnet/performance.git
python3 ./performance/scripts/benchmarks_ci.py -f netcoreapp3.1 netcoreapp5.0 --filter '*'
Data
The data I've received from the .NET Libraries Team members allowed me to cover a big part of the entire matrix of the supported configurations:
| Operating System | Arch | Processor Name | Provided by |
|---|---|---|---|
| Windows 10.0.19041.388 | X64 | AMD Ryzen 9 3900X | @tannergooding |
| Windows 10.0.18363.959 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| Windows 10.0.19041.450 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
| Windows 10.0.19041.450 | X64 | Intel Core i7-6700 CPU 3.40GHz (Skylake) | @GrabYourPitchforks |
| Windows 10.0.19042 | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| Windows 10.0.19041.450 | X64 | Intel Core i7-8650U CPU 1.90GHz (Kaby Lake R) | @jeffhandley |
| Windows 10.0.19041.450 | X64 | Intel Core i7-8700 CPU 3.20GHz (Coffee Lake) | @jeffhandley |
| ubuntu 18.04 | X64 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| manjaro | X64 | Intel Core i7-4771 CPU 3.50GHz (Haswell) | @ManickaP |
| pop 20.04 | X64 | Intel Core i7-6600U CPU 2.60GHz (Skylake) | @carlossanlop |
| alpine 3.11 (WSL2) | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| ubuntu 18.04 (WSL2) | X64 | Intel Core i7-7700 CPU 3.60GHz (Kaby Lake) | @danmosemsft |
| ubuntu 16.04 | Arm64 | Qualcomm Centriq | @adamsitnik |
| ubuntu 18.04 (WSL2) | Arm64 | Microsoft SQ1 3.0 GHz (Surface Pro X) | @carlossanlop |
| ubuntu 20.04 (WSL2) | Arm64 | Microsoft SQ1 3.0 GHz (Surface Pro X) | @pgovind |
| Windows 10.0.18363.959 | X86 | Intel Xeon CPU E5-1650 v4 3.60GHz | @adamsitnik |
| Windows 10.0.19041.450 | X86 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
| Windows 10.0.18363.1016 | Arm | Microsoft SQ1 3.0 GHz (Surface Pro X) | @adamsitnik |
| macOS Catalina 10.15.6 | X64 | Intel Core i5-4278U CPU 2.60GHz (Haswell) | @jeffhandley |
| macOS Catalina 10.15.6 | X64 | Intel Core i7-4870HQ CPU 2.50GHz (Haswell) | @carlossanlop |
| macOS Mojave 10.14.5 | X64 | Intel Core i7-5557U CPU 3.10GHz (Broadwell) | @adamsitnik |
Everyone interested can download the data from here. The full report generated by the tool is available here.
Moreover, the full historical data turned out to be extremely useful. I've used it every time I was not sure whether something was a regression or just an unstable or multimodal benchmark:
- Windows 10 x64
- Ubuntu 18.04 x64
- Ubuntu 18.04 ARM64 (added yesterday)
- Windows 10 x86
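One way to picture why the history helps: a real regression moves the measurements to a new level once and they stay there, while a multimodal benchmark keeps jumping between levels. The heuristic below is a rough sketch of that intuition only. It is not what the bot or the perf lab reports do, and the function names, the 5% noise tolerance, and the sample values are all made up for illustration.

```python
def count_level_switches(history):
    """Count how often consecutive measurements flip between the low
    and the high level, split at the midpoint of the observed range."""
    midpoint = (min(history) + max(history)) / 2
    levels = [value > midpoint for value in history]
    return sum(1 for prev, cur in zip(levels, levels[1:]) if prev != cur)


def classify_history(history, noise=0.05):
    """Label a chronological series of timings (hypothetical helper)."""
    if max(history) - min(history) <= noise * min(history):
        return "stable"
    if count_level_switches(history) == 1:
        return "step change"
    return "multimodal or unstable"


# Illustrative values only, not taken from the perf lab.
print(classify_history([10.0, 10.1, 10.0, 20.2, 20.1, 20.0]))  # one jump
print(classify_history([10.0, 20.1, 10.1, 20.0, 10.0, 20.2]))  # keeps jumping
```

A short history cannot separate the two cases, which is why a single before/after comparison was not enough for some of the benchmarks listed later in this report.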
Regressions
Already fixed
- System.Collections.Contains*, System.Memory.SequenceReader.TryReadTo, System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN
  - was a 32 bit issue only (both x86 and ARM)
  - detected by the bot, reported in [Perf -196%] System.Collections.ContainsTrue<Int32> (6) DrewScoggins/performance-2#910 (comment)
  - confirmed: [Perf -196%] System.Collections.ContainsTrue<Int32> (6) DrewScoggins/performance-2#910 (comment)
  - transferred to runtime repo: [Perf -196%] System.Collections.ContainsTrue<Int32> (6) #41167
  - fixed in Fix perf regression in IntPtr operators on 32-bit platforms #41198
  - backported to 5.0 in [release/5.0] Fix perf regression in IntPtr operators on 32-bit platforms #41254
- System.Collections.CtorGivenSize<Int32>.Array(Size: 512)
  - specific to Alpine only
  - created an issue Performance regression: 6x slower array allocation on Alpine #41398
  - confirmed by @jkotas to be not WSL specific, but a much bigger Alpine perf problem
  - it has shown that an increased number of Gen 0 collections is a valuable metric to detect regressions
  - fixed in Fix reading cpu cache size for Alpine(musl) #41532
  - backported to 5.0 in [release/5.0] Fix reading cpu cache size for Alpine(musl) #41547
  - created Create a unit test for PAL_GetLogicalProcessorCacheSizeFromOS #41708 to add unit tests that ensure that this problem is not coming back
- System.Numerics.Tests.Perf_Quaternion.Conjugate and System.Numerics.Tests.Perf_Quaternion.Negat*
  - not reported by the bot because it's a brand new benchmark and we did not have historical data at the time of my investigation
  - issue created: Performance regressions in Quaternion.Conjugate and Quaternion.Negate #41738
  - fixed in Marking Matrix3x2, Matrix4x4, Plane, and Quaternion as Intrinsic #41829
  - backported to 5.0-rc2 in [release/5.0-rc2] Marking Matrix3x2, Matrix4x4, Plane, and Quaternion as Intrinsic #41885
- Directory.EnumerateFiles
  - not reported by the bot, most probably because it was a very fresh regression
  - issue created: [Unix] Potential performance regression in Directory.EnumerateFiles #41739
  - fixed in [Unix] Potential performance regression in Directory.EnumerateFiles #41739
  - backported to 5.0-rc2 in [release/5.0-rc2] Revert #40641 #41820
- ByteMark.BenchIDEAEncryption
  - not reported by the bot, most probably because it was a very fresh regression
  - issue created: Performance regression in ByteMark.BenchIDEAEncryption #41677
  - fixed in Alternative fix for folding of *(typ*)&lclVar for small types #40607 #40871
  - backported to 5.0-rc2 in [release/5.0-rc2] Fix for folding of *(typ*)&lclVar for small types #41838
- System.Text.Perf_Utf8Encoding
  - not detected by the bot because it was not enabled for ARM yet
  - issue created: [ARM64] Performance regression: Utf8Encoding #41699
  - fixed in Temporarily disable arm64 intrinsics in UTF-16 validation code paths #42052
  - backported to 5.0-rc2 in [release/5.0-rc2] Disable arm64 intrinsics in UTF-16 validation code paths #42064
Investigation in progress
- System.Memory.Slice
  - not detected by the bot because it was not enabled for ARM yet
  - seems to be ARM64-specific, created an issue [ARM64] Possible perf regression: slicing #41704
  - investigation is in progress
- PerfLabTests.CastingPerf2.CastingPerf.IntObj
  - not detected by the bot because it was not enabled for ARM yet
  - seems to be ARM64-specific, created an issue [ARM64] Performance regression: PerfLabTests.CastingPerf2.CastingPerf.IntObj #41706
  - investigation is in progress
By design or Acceptable
- ICU-related regressions
  - System.Globalization.Tests.StringSearch: detected by the bot, reported in [Perf -1,796%] System.Globalization.Tests.StringSearch (33) #37819
  - System.Memory.ReadOnlySpan.IndexOfString: detected by the bot, reported in [Perf -14%] System.Memory.ReadOnlySpan.IndexOfString (2) #39724
  - System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse(culturestring: ja): detected by the bot, reported in [Perf - 10-20x regression] System.Globalization.Tests.Perf_DateTimeCultureInfo.Parse in ja #37807
  - System.Globalization.Tests.StringEquality: detected by the bot, reported in [Perf -97%] System.Globalization.Tests.StringEquality (8) #39038
  - I've created one uber issue to track all of them in one place: List of performance regressions caused by switching to ICU #40942
  - OrdinalIgnoreCase has been optimized in Port Ordinal Ignore Case Optimization changes #40962
  - TODO: doc update still required
- System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches(input: IOrderedEnumerable)
  - detected by the bot, reported in [Perf -492%] System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches #39032
  - closed, by design: removed the O(N log N) cost of the OrderBy [Perf -492%] System.Linq.Tests.Perf_Enumerable.FirstWithPredicate_LastElementMatches #39032 (comment)
- System.Collections.Tests.Perf_BitArray.*(Size: 4)
  - detected by the bot, reported in [Perf -118%] System.Collections.Tests.Perf_BitArray for small inputs (3) #37813
  - closed, by design: introduction of vectorization has increased the cost of operations for small inputs: [Perf -118%] System.Collections.Tests.Perf_BitArray for small inputs (3) #37813 (comment)
- System.Threading.Tests.Perf_Thread.GetCurrentProcessorId
  - detected by the bot, reported in [Perf -35%] System.Threading.Tests.Perf_Thread.GetCurrentProcessorId #37804
  - closed, by design: precision was improved at a cost of acceptable minor perf regression: [Perf -35%] System.Threading.Tests.Perf_Thread.GetCurrentProcessorId #37804 (comment)
- PerfLabTests.CastingPerf.CheckIsInstAnyIsInterfaceNo, PerfLabTests.CastingPerf.CheckObjIsInterfaceNo
  - detected by the bot, reported in [Perf -29%] PerfLabTests.CastingPerf (2) #37803
  - closed, by design: known tradeoff: [Perf -29%] PerfLabTests.CastingPerf (2) #37803 (comment)
- System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort
  - detected by the bot, reported in [Perf -19%] System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort #39720
  - closed, acceptable for improved code reuse [Perf -19%] System.Net.NetworkInformation.Tests.PhysicalAddressTests.PAShort #39720 (comment)
  - benchmark for 1 byte removed, added 6 bytes in remove PAShort benchmark that uses 1 byte long input and add a "Medim" that consists of 6 bytes performance#1490
- System.Numerics.Tests.Perf_Vector*.GetHashCodeBenchmark
  - detected by the bot, reported in [Perf -98%] System.Numerics.Tests.Perf_Vector2.GetHashCodeBenchmark #39035 and [Perf -53%] System.Numerics.Tests.Perf_Vector4.GetHashCodeBenchmark #39029
  - closed, "it should not be used" [Perf -53%] System.Numerics.Tests.Perf_Vector4.GetHashCodeBenchmark #39029 (comment)
- System.Net.Primitives.Tests.CredentialCacheTests.ForEach(uriCount: 0, hostPortCount: 0)
  - detected by the bot, reported in [Perf -17%] System.Net.Primitives.Tests.CredentialCacheTests (2) DrewScoggins/performance-2#510
  - confirmed: [Perf -17%] System.Net.Primitives.Tests.CredentialCacheTests (2) DrewScoggins/performance-2#510 (comment)
  - awaiting the transfer to runtime repo. Most probably a by-design regression.
Moved to 6.0
- System.Tests.Perf_Char.GetUnicodeCategory(c: '?')
  - detected and reported by the bot in [Perf -11%] System.Tests.Perf_Char.GetUnicodeCategory DrewScoggins/performance-2#574, I've created Minor regression in System.Tests.Perf_Char.GetUnicodeCategory for non-ascii characters #41107
  - minor regression for non-ascii characters, moved to 6.0
- PerfLabTests.StackWalk.Walk
  - detected by the bot and reported in [Perf -55%] PerfLabTests.StackWalk.Walk #39115
  - confirmed in [Perf -55%] PerfLabTests.StackWalk.Walk #39115 (comment)
  - specific to everything that is not Windows x64, rather not critical -> moved to 6.0: [Perf -55%] PerfLabTests.StackWalk.Walk #39115 (comment)
- System.Tests.Perf_String.Replace_Char(text: "Hello", oldChar: 'l', newChar: '!')
  - reported in [Perf -26%] System.Tests.Perf_String (4) #37816
  - confirmed in [Perf -26%] System.Tests.Perf_String (4) #37816 (comment)
  - moved to 6.0
- System.Text.Perf_Utf8String.IsAscii(Input: EnglishAllAscii)
  - not reported by the bot because it was a brand new benchmark and we did not have historical data at the time of my investigation
  - issue created: Performance regression: Utf8String.IsAscii (x86 only) #41388
  - moved to 6.0 as Utf8String is still only experimental
- System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf8
  - not reported by the bot because it was a brand new benchmark and we did not have historical data at the time of my investigation
  - issue created: Performance regression: System.Text.Encodings.Web.Tests.Perf_Encoders.EncodeUtf8 #41104
  - moved to 6.0
Unstable or multimodal benchmarks
There were of course more of them; here are the ones that I've noted to use as Contract Tests in the near future (to reduce the noise produced by the bot):
- System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer
  - detected by the bot, reported in [Perf -138%] System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer #39031
  - asked for historical data to verify if it's multimodal or not [Perf -138%] System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer #39031 (comment)
  - thanks to historical data provided it was possible to tell that it's unstable for x64 and bimodal for x86: [Perf -138%] System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer #39031 (comment)
- System.Memory.ReadOnlySequence.Slice_Repeat_StartPosition_And_EndPosition(Segment: Multiple)
  - quite unstable benchmark, I've verified that 5.0 codegen is better
- PerfLabTests.BlockCopyPerf.CallBlockCopy
  - detected by the bot, reported in [Perf -47%] PerfLabTests.BlockCopyPerf.CallBlockCopy #37808
  - copying 0 elements does not add value: [Perf -47%] PerfLabTests.BlockCopyPerf.CallBlockCopy #37808 (comment)
  - test case for copying 0 elements removed in measuring the performance of copying of 0 elements does not add value performance#1465
  - closed as unstable based on full historical data: [Perf -47%] PerfLabTests.BlockCopyPerf.CallBlockCopy #37808 (comment)
- System.Tests.Perf_String.Trim_CharArr(s: "Test", c: [' ', ' '])
  - multimodal benchmark, needs a rewrite as stated a long time ago: Performance regression: string.Trim #13135
- System.Threading.Tests.Perf_Interlocked.CompareExchange_long
  - the benchmark typically reports 10ns, but sometimes x100 that. Only for x86. I need logs to verify whether it's a BDN bug or not.
  - issue created CompareExchange_long benchmark sometimes reports very long execution time on x86 performance#1497
- System.Memory.Span<Int32>.IndexOfValue(Size: 512)
  - reported in [Perf -35%] System.Memory.Span<Char>.IndexOfValue #39722
  - confirmed that it was due to code alignment change in [Perf -35%] System.Memory.Span<Char>.IndexOfValue #39722 (comment)
- Benchstone.BenchI.Fib.Test
  - perfectly multimodal, great example for a contract test
Summary
- The bot has reported all major performance issues for the configurations that it was enabled for (Windows x64, x86, and Ubuntu x64). Great work @DrewScoggins!
- The full historical data turned out to be extremely useful to exclude all false positives for multimodal and unstable benchmarks.
- We have missed one important x86 bug during triaging (human error), but it got discovered during the study ([Perf -196%] System.Collections.ContainsTrue<Int32> (6) #41167 (comment)). To avoid such problems in the future and to enable the bot in the runtime repo, the noise of the bot needs to be reduced. Currently, it's quite high, mostly due to the multimodal nature of the benchmarks.
- The study has detected relatively many new ARM64 perf problems at a late stage of the release. The sooner we enable the bot for ARM64, the better. Moreover, we should be more frequently asking for ARM64 results when reviewing big changes that affect the performance of frequently used features (like sorting the arrays).
- The study has shown that measuring the performance of GNU libc based Linux distros like Ubuntu is not enough to detect musl libc specific regressions. We should consider adding Alpine runs to the perf lab.
- This time no important issues specific to macOS and different CPU families were discovered. It has proven that the perf lab has good hardware coverage.
- The Alpine regression has shown that an increased number of Gen 0 collections can be a very valuable metric to detect regressions. We should consider extending the bot to use it.
Big thanks to everyone involved!