Description
If the {Workload|Overhead}Action{No}Unroll methods are not optimized and the benchmark method allocates a lot, the Tier0 versions of these methods may retain much more heap data than the optimized versions. This can completely change benchmark results.
Current behavior for CLR is that these methods are always eagerly optimized, because they all contain loops and we either do not have tiered compilation (full framework, pre-3.0 core) or have COMPlus_TC_QuickJitForLoops=0 (.NET Core 3.0 through .NET 6).
For .NET 7 we plan to change the default to COMPlus_TC_QuickJitForLoops=1. With that change these methods start out unoptimized, which causes significant behavior changes in roughly 50 benchmarks on github.com/dotnet/performance.
To preserve current behavior, these methods should always be optimized when running against .NET Core versions that can change tiering strategy. This can be done by adding [MethodImpl(MethodImplOptions.AggressiveOptimization)], which is available in .NET Core 3.0 and up.
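A minimal sketch of what this could look like, not the actual generated code: the class name EngineSketch, the method signature, and the loop body are hypothetical stand-ins, and the conditional compilation guard assumes the code is built with an SDK that defines NETCOREAPP3_0_OR_GREATER.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for the generated benchmark harness.
public static class EngineSketch
{
    // The attribute only exists on .NET Core 3.0 and later, so it is guarded.
    // It asks the runtime to skip tiering and jit this method fully optimized.
#if NETCOREAPP3_0_OR_GREATER
    [MethodImpl(MethodImplOptions.AggressiveOptimization)]
#endif
    public static void WorkloadActionNoUnroll(Action workload, long invokeCount)
    {
        // The loop is what makes this method sensitive to the
        // QuickJitForLoops setting when the attribute is absent.
        for (long i = 0; i < invokeCount; i++)
        {
            workload();
        }
    }
}
```

With the attribute in place, the method should be compiled with full optimizations on first call regardless of the QuickJitForLoops default, so the harness behaves the same on .NET 7 as on earlier versions.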
Alternatively one could consider restructuring things so that the benchmark delegate is always called from a method that doesn't have loops. This seems more disruptive.
See dotnet/performance#2214 (comment) and following comments for context.
cc @adamsitnik