-
Notifications
You must be signed in to change notification settings - Fork 5k
JIT: Always assign edge likelihoods in fgIncorporateProfileData #98054
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsPart of #93020. Currently, if profile data isn't available or reliable, the JIT won't assign edge likelihoods. We have the ability to come up with reasonable edge likelihoods using a block's number of successors in profile synthesis, so do so when incorporating profile data. (This change had a lot of churn locally, and I think we can do this heuristic-driven likelihoods fixup more cheaply, i.e. skip the DFS computation and loop search, replace the
|
…ect behavior on overshift (dotnet#98001) * Ensure that constant folding for SIMD shifts on xarch follow the correct behavior on overshift * Ensure we test Sse2.IsSupported
Diff results for #98054Throughput diffsThroughput diffs for linux/arm64 ran on linux/x64Overall (+0.13% to +2.81%)
MinOpts (-0.31% to +5.68%)
FullOpts (+0.16% to +2.82%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.21% to +2.70%)
MinOpts (+0.49% to +9.74%)
FullOpts (+0.18% to +2.96%)
Throughput diffs for osx/arm64 ran on linux/x64Overall (+0.08% to +2.71%)
MinOpts (-0.25% to +5.64%)
FullOpts (+0.29% to +2.84%)
Throughput diffs for windows/arm64 ran on linux/x64Overall (+0.08% to +2.58%)
MinOpts (-0.71% to +5.68%)
FullOpts (+0.11% to +2.58%)
Throughput diffs for windows/x64 ran on linux/x64Overall (-0.65% to +2.65%)
MinOpts (+0.45% to +10.18%)
FullOpts (-0.85% to +2.65%)
Details here |
Diff results for #98054Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 1,536,730 contexts (368,644 MinOpts, 1,168,086 FullOpts). MISSED contexts: base: 316 (0.02%), diff: 76,964 (4.77%) Overall (+1,183,860 bytes)
MinOpts (-16 bytes)
FullOpts (+1,183,876 bytes)
Assembly diffs for linux/x64 ran on windows/x64Diffs are based on 1,546,466 contexts (360,162 MinOpts, 1,186,304 FullOpts). MISSED contexts: base: 265 (0.02%), diff: 77,380 (4.77%) Overall (+2,454,719 bytes)
MinOpts (-11 bytes)
FullOpts (+2,454,730 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 1,660,469 contexts (561,303 MinOpts, 1,099,166 FullOpts). MISSED contexts: base: 227 (0.01%), diff: 76,048 (4.38%) Overall (+1,273,248 bytes)
MinOpts (-168 bytes)
FullOpts (+1,273,416 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 1,401,964 contexts (263,527 MinOpts, 1,138,437 FullOpts). MISSED contexts: base: 247 (0.02%), diff: 78,791 (5.32%) Overall (+1,213,456 bytes)
MinOpts (-16 bytes)
FullOpts (+1,213,472 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 1,915,694 contexts (587,569 MinOpts, 1,328,125 FullOpts). MISSED contexts: base: 373 (0.02%), diff: 87,176 (4.35%) Overall (+2,646,963 bytes)
MinOpts (-23 bytes)
FullOpts (+2,646,986 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 1,383,559 contexts (345,734 MinOpts, 1,037,825 FullOpts). MISSED contexts: base: 54,343 (3.61%), diff: 121,291 (8.06%) Overall (+1,422,504 bytes)
MinOpts (-4 bytes)
FullOpts (+1,422,508 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 1,529,353 contexts (327,626 MinOpts, 1,201,727 FullOpts). MISSED contexts: base: 799 (0.05%), diff: 100,304 (6.15%) Overall (+2,640,345 bytes)
MinOpts (-15 bytes)
FullOpts (+2,640,360 bytes)
Details here Throughput diffsThroughput diffs for linux/arm ran on windows/x86Overall (-0.20% to +2.58%)
MinOpts (-0.22% to +7.90%)
FullOpts (-0.20% to +2.59%)
Throughput diffs for windows/x86 ran on windows/x86Overall (+0.93% to +2.79%)
MinOpts (+0.33% to +8.98%)
FullOpts (+0.96% to +2.79%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.12% to +2.83%)
MinOpts (-0.11% to +5.70%)
FullOpts (+0.13% to +2.83%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.19% to +2.74%)
MinOpts (+0.45% to +9.30%)
FullOpts (+0.16% to +2.97%)
Details here Throughput diffs for linux/arm64 ran on windows/x64Overall (+0.23% to +0.68%)
MinOpts (+0.40% to +6.79%)
FullOpts (+0.20% to +0.68%)
Throughput diffs for linux/x64 ran on windows/x64Overall (+0.23% to +0.77%)
MinOpts (+0.48% to +7.46%)
FullOpts (+0.20% to +0.77%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (+0.32% to +0.58%)
MinOpts (+0.35% to +6.74%)
FullOpts (+0.24% to +0.59%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (+0.25% to +0.69%)
MinOpts (+0.47% to +6.80%)
FullOpts (+0.21% to +0.69%)
Throughput diffs for windows/x64 ran on windows/x64Overall (+0.32% to +0.76%)
MinOpts (+0.46% to +7.80%)
FullOpts (+0.24% to +0.76%)
Details here Throughput diffs for linux/arm ran on linux/x86Overall (+0.21% to +0.57%)
MinOpts (+0.28% to +6.67%)
FullOpts (+0.21% to +0.54%)
Details here |
Diff results for #98054Assembly diffsAssembly diffs for linux/arm64 ran on windows/x64Diffs are based on 1,613,384 contexts (368,644 MinOpts, 1,244,740 FullOpts). MISSED contexts: 316 (0.02%) Overall (-16 bytes)
FullOpts (-16 bytes)
Assembly diffs for osx/arm64 ran on windows/x64Diffs are based on 1,736,294 contexts (561,303 MinOpts, 1,174,991 FullOpts). MISSED contexts: 227 (0.01%) Overall (-16 bytes)
FullOpts (-16 bytes)
Assembly diffs for windows/arm64 ran on windows/x64Diffs are based on 1,480,514 contexts (263,527 MinOpts, 1,216,987 FullOpts). MISSED contexts: 247 (0.02%) Overall (-16 bytes)
FullOpts (-16 bytes)
Assembly diffs for windows/x64 ran on windows/x64Diffs are based on 2,002,515 contexts (587,594 MinOpts, 1,414,921 FullOpts). MISSED contexts: 373 (0.02%) Overall (-7 bytes)
FullOpts (-7 bytes)
Details here Throughput diffsThroughput diffs for windows/x86 ran on windows/x86Overall (+0.35% to +0.67%)
MinOpts (+0.44% to +9.00%)
FullOpts (+0.34% to +0.66%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.22% to +0.68%)
MinOpts (+0.43% to +6.76%)
FullOpts (+0.18% to +0.68%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.21% to +0.77%)
MinOpts (+0.51% to +7.42%)
FullOpts (+0.18% to +0.77%)
Details here |
Extracting a piece of dotnet#87045 that I had to revert in that PR. Native linkers don't like when LSDA is in a COMDAT so fold these in the object writer instead. Seems to save about 1.2% in the Stage1 app. Obviously Unix only.
The split between `Delegate` and `MulticastDelegate` is a wart that likely has some history behind it. Types that inherit from `Delegate` directly would not be considered delegates by the runtime. They need to inherit `MulticastDelegate. I can't find a reason why we'd need some useless base implementation of these methods that immediately gets overriden in `MulticastDelegate`. This deletes the useless base implementation and moves the useful implementation from `MulticastDelegate` to `Delegate`. This along with dotnet#97951 saves ~40 bytes per delegate `MethodTable` because the virtual methods can now be devirtualized or placed into the sealed vtable. We might be able to do even more since technically sealed virtuals could be reshuffled after the codegen phase and slots eliminated then.
* STJ: Skip usage of indexer for KeyedCollection * Add deserialization test --------- Co-authored-by: Eirik Tsarpalis <eirik.tsarpalis@gmail.com>
Co-authored-by: Marek Fišera <mara@neptuo.com>
* [Mono] [Arm64] Add SIMD support for vector ctors * fix ctors for mini arm64
* [wasm][mt] Improve blocking wait detection and tests Add check to the `SemaphoreSlim.Wait`. The wait will eventually continue to the `Monitor.Wait`, it can spin wait first though and return, so catch it earlier to avoid non-deterministic behavior. Improve the blocking wait detection and extract it to the new method. Also introduce helper method to force a blocking wait in places we want it on the JS interop threads. * Feedback
…ence-packages build 20240207.1 (dotnet#98090) Microsoft.SourceBuild.Intermediate.source-build-reference-packages From Version 9.0.0-alpha.1.24101.2 -> To Version 9.0.0-alpha.1.24107.1 Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
* Update dependencies from https://github.com/dotnet/arcade build 20240206.2 Microsoft.SourceBuild.Intermediate.arcade , Microsoft.DotNet.Arcade.Sdk , Microsoft.DotNet.Build.Tasks.Archives , Microsoft.DotNet.Build.Tasks.Feed , Microsoft.DotNet.Build.Tasks.Installers , Microsoft.DotNet.Build.Tasks.Packaging , Microsoft.DotNet.Build.Tasks.TargetFramework , Microsoft.DotNet.Build.Tasks.Templating , Microsoft.DotNet.Build.Tasks.Workloads , Microsoft.DotNet.CodeAnalysis , Microsoft.DotNet.GenAPI , Microsoft.DotNet.GenFacades , Microsoft.DotNet.Helix.Sdk , Microsoft.DotNet.PackageTesting , Microsoft.DotNet.RemoteExecutor , Microsoft.DotNet.SharedFramework.Sdk , Microsoft.DotNet.VersionTools.Tasks , Microsoft.DotNet.XliffTasks , Microsoft.DotNet.XUnitAssert , Microsoft.DotNet.XUnitConsoleRunner , Microsoft.DotNet.XUnitExtensions From Version 9.0.0-beta.24081.4 -> To Version 9.0.0-beta.24106.2 * Update dependencies from https://github.com/dotnet/arcade build 20240206.2 Microsoft.SourceBuild.Intermediate.arcade , Microsoft.DotNet.Arcade.Sdk , Microsoft.DotNet.Build.Tasks.Archives , Microsoft.DotNet.Build.Tasks.Feed , Microsoft.DotNet.Build.Tasks.Installers , Microsoft.DotNet.Build.Tasks.Packaging , Microsoft.DotNet.Build.Tasks.TargetFramework , Microsoft.DotNet.Build.Tasks.Templating , Microsoft.DotNet.Build.Tasks.Workloads , Microsoft.DotNet.CodeAnalysis , Microsoft.DotNet.GenAPI , Microsoft.DotNet.GenFacades , Microsoft.DotNet.Helix.Sdk , Microsoft.DotNet.PackageTesting , Microsoft.DotNet.RemoteExecutor , Microsoft.DotNet.SharedFramework.Sdk , Microsoft.DotNet.VersionTools.Tasks , Microsoft.DotNet.XliffTasks , Microsoft.DotNet.XUnitAssert , Microsoft.DotNet.XUnitConsoleRunner , Microsoft.DotNet.XUnitExtensions From Version 9.0.0-beta.24081.4 -> To Version 9.0.0-beta.24106.2 * Update dependencies from https://github.com/dotnet/arcade build 20240206.2 Microsoft.SourceBuild.Intermediate.arcade , Microsoft.DotNet.Arcade.Sdk , Microsoft.DotNet.Build.Tasks.Archives , Microsoft.DotNet.Build.Tasks.Feed , Microsoft.DotNet.Build.Tasks.Installers , Microsoft.DotNet.Build.Tasks.Packaging , Microsoft.DotNet.Build.Tasks.TargetFramework , Microsoft.DotNet.Build.Tasks.Templating , Microsoft.DotNet.Build.Tasks.Workloads , Microsoft.DotNet.CodeAnalysis , Microsoft.DotNet.GenAPI , Microsoft.DotNet.GenFacades , Microsoft.DotNet.Helix.Sdk , Microsoft.DotNet.PackageTesting , Microsoft.DotNet.RemoteExecutor , Microsoft.DotNet.SharedFramework.Sdk , Microsoft.DotNet.VersionTools.Tasks , Microsoft.DotNet.XliffTasks , Microsoft.DotNet.XUnitAssert , Microsoft.DotNet.XUnitConsoleRunner , Microsoft.DotNet.XUnitExtensions From Version 9.0.0-beta.24081.4 -> To Version 9.0.0-beta.24106.2 * Update dependencies from https://github.com/dotnet/arcade build 20240206.2 Microsoft.SourceBuild.Intermediate.arcade , Microsoft.DotNet.Arcade.Sdk , Microsoft.DotNet.Build.Tasks.Archives , Microsoft.DotNet.Build.Tasks.Feed , Microsoft.DotNet.Build.Tasks.Installers , Microsoft.DotNet.Build.Tasks.Packaging , Microsoft.DotNet.Build.Tasks.TargetFramework , Microsoft.DotNet.Build.Tasks.Templating , Microsoft.DotNet.Build.Tasks.Workloads , Microsoft.DotNet.CodeAnalysis , Microsoft.DotNet.GenAPI , Microsoft.DotNet.GenFacades , Microsoft.DotNet.Helix.Sdk , Microsoft.DotNet.PackageTesting , Microsoft.DotNet.RemoteExecutor , Microsoft.DotNet.SharedFramework.Sdk , Microsoft.DotNet.VersionTools.Tasks , Microsoft.DotNet.XliffTasks , Microsoft.DotNet.XUnitAssert , Microsoft.DotNet.XUnitConsoleRunner , Microsoft.DotNet.XUnitExtensions From Version 9.0.0-beta.24081.4 -> To Version 9.0.0-beta.24106.2 --------- Co-authored-by: dotnet-maestro[bot] <dotnet-maestro[bot]@users.noreply.github.com>
* Add encoding for ARM64 IF_SVE_CJ_2A instruction group * fix merge conflicts * Fix conflicts --------- Co-authored-by: Kunal Pathak <Kunal.Pathak@microsoft.com>
* Remove StackFrameIterator.ShouldSkipRegularGcReporting * Adjust windows-x64 offsets
…read.browser-wasm is not found` (dotnet#98083) * Invert the logic: prepare MT nuget for ST and ST nuget for MT. * We have only 2 runtimes now. * Added clarification
…net#97713) * Take into account the local involved in GT_SWITCH for resolution * consolidate the logic, remove ifdef * Add handling for op2
…tnet#98084) link.exe hardcodes a list of [magic symbol names](https://learn.microsoft.com/en-us/cpp/error-messages/tool-errors/linker-tools-warning-lnk4104?view=msvc-170) that it thinks should not be generated into import library and emits warnings if they are. We don't have a mechanism to set visibility of `UnmanagedCallersOnly` methods and I don't think we'd want to make a public API for that. But this means that users get a warning that is not particularly actionable. We could either replicate this magic list of symbol names into compiler and generate these as `PRIVATE` (but then how do people opt out if they don't actually want this as `PRIVATE`?), or we can just suppress the warning. This PR suppresses the warning.
* Improved uninitialized object creation performance * Remove duplicate GenericCache getter call, revert to direct throw Co-authored-by: Jan Kotas <jkotas@microsoft.com> --------- Co-authored-by: Jan Kotas <jkotas@microsoft.com>
… runtime (dotnet#98094) Only trigger it for wasm-specific changes in PRs, not libraries/illink.
…orrectly (dotnet#98125) * Ensure that the various `Max*Number` and `Min*Number` APIs optimize correctly * Don't mark the test methods as AO * Add a missing using statement to the test
) * Update TensorPrimitives to vectorize for size(T) is 1 or 2 It was previously only 4 or 8. This does the work previously skipped to enable the "less than largest available vector" path to handle these smaller sizes as well. * Remove some unnecessary aggregations from small paths * Fix copy/paste comments
…Y_3B (dotnet#98136) * Implement IF_SVE_GU_3{A,B,C} * Implement IF_SVE_GX_3{A,B,C} * Implement IF_SVE_GY_3B * Implement IF_SVE_FF_3{A,B,C}
I think you can do this more cheaply by checking for the So something like: void ProfileSynthesis::Run(ProfileSynthesisOption option)
{
if (option == ProfileSynthesisOption::AssignLikelihoodsOnly)
{
AssignLikelihoods();
return;
}
m_dfsTree = m_comp->fgComputeDfs();
m_loops = FlowGraphNaturalLoops::Find(m_dfsTree);
// Retain or compute edge likelihood information
//
switch (option)
{
case ProfileSynthesisOption::AssignLikelihoods:
AssignLikelihoods();
break;
|
I tried that in the last commit, albeit somewhat sloppily: I moved the DFS/loop logic to below the switch statement, and commented out the loop-aware heuristics in By the way, do you know if there's a way to view which contexts in a collection had the largest TP diffs, kind of like what we can do with asmdiffs? I wonder if the largest TP regressions in MinOpts are driven by just a few outliers, considering the relatively low impact in other collections. |
Likely the cost is in all those calls to Unfortunately, our edge info representation is currently biased in favor of edge targets and not edge sources. The ultimate fix for this is something like
|
I see... I'm tempted to try out your second option. For that, I'm guessing we'd need to add a new member to In the meantime, to avoid the |
Using |
…/runtime into profile-synthesis
Thinking about this some more, if you want to drive it from the target side, then I suppose the simple equally-likely projection is better than nothing. So say in |
Sorry for the pings -- my branch got messed up locally. Gonna close this in lieu of a new PR. |
Diff results for #98054Assembly diffsAssembly diffs for linux/arm64 ran on linux/x64Diffs are based on 1,610,272 contexts (368,644 MinOpts, 1,241,628 FullOpts). MISSED contexts: 3,428 (0.21%) Overall (-16 bytes)
FullOpts (-16 bytes)
Assembly diffs for linux/x64 ran on linux/x64Diffs are based on 1,620,764 contexts (360,162 MinOpts, 1,260,602 FullOpts). MISSED contexts: 3,086 (0.19%) Overall (-15 bytes)
FullOpts (-15 bytes)
Assembly diffs for osx/arm64 ran on linux/x64Diffs are based on 1,733,061 contexts (561,303 MinOpts, 1,171,758 FullOpts). MISSED contexts: 3,460 (0.20%) Overall (-16 bytes)
FullOpts (-16 bytes)
Assembly diffs for windows/arm64 ran on linux/x64Diffs are based on 1,477,297 contexts (263,527 MinOpts, 1,213,770 FullOpts). MISSED contexts: 3,464 (0.23%) Overall (-16 bytes)
FullOpts (-16 bytes)
Details here Throughput diffsThroughput diffs for linux/arm64 ran on windows/x64Overall (+0.02% to +0.04%)
MinOpts (+0.02% to +0.08%)
FullOpts (+0.02% to +0.04%)
Throughput diffs for linux/x64 ran on windows/x64Overall (+0.02% to +0.04%)
MinOpts (+0.02% to +0.11%)
FullOpts (+0.02% to +0.04%)
Throughput diffs for osx/arm64 ran on windows/x64Overall (+0.02% to +0.04%)
MinOpts (+0.02% to +0.08%)
FullOpts (+0.02% to +0.04%)
Throughput diffs for windows/arm64 ran on windows/x64Overall (+0.02% to +0.04%)
MinOpts (+0.02% to +0.09%)
FullOpts (+0.02% to +0.04%)
Throughput diffs for windows/x64 ran on windows/x64Overall (+0.02% to +0.04%)
MinOpts (+0.02% to +0.11%)
FullOpts (+0.02% to +0.04%)
Details here Throughput diffs for windows/x86 ran on linux/x86Overall (+0.02% to +0.03%)
MinOpts (+0.01% to +0.11%)
FullOpts (+0.02% to +0.03%)
Details here |
Diff results for #98054Assembly diffsAssembly diffs for windows/x64 ran on windows/x64Diffs are based on 1,999,231 contexts (587,594 MinOpts, 1,411,637 FullOpts). MISSED contexts: 3,657 (0.18%) Overall (-2 bytes)
FullOpts (-2 bytes)
Details here Assembly diffs for linux/arm ran on windows/x86Diffs are based on 1,449,330 contexts (345,734 MinOpts, 1,103,596 FullOpts). MISSED contexts: 55,656 (3.70%) Overall (+0 bytes)
FullOpts (+0 bytes)
Assembly diffs for windows/x86 ran on windows/x86Diffs are based on 1,618,717 contexts (327,626 MinOpts, 1,291,091 FullOpts). MISSED contexts: base: 11,019 (0.68%), diff: 11,022 (0.68%) Overall (+38 bytes)
FullOpts (+38 bytes)
Details here Throughput diffsThroughput diffs for linux/arm ran on windows/x86Overall (+0.02% to +0.03%)
MinOpts (+0.02% to +0.07%)
FullOpts (+0.02% to +0.03%)
Details here Throughput diffs for linux/arm64 ran on linux/x64Overall (+0.02% to +0.04%)
MinOpts (+0.02% to +0.09%)
FullOpts (+0.02% to +0.04%)
Throughput diffs for linux/x64 ran on linux/x64Overall (+0.03% to +0.05%)
MinOpts (+0.02% to +0.12%)
FullOpts (+0.03% to +0.05%)
Details here |
Currently this requires a somewhat manual process (#85755 is about automatically including examples in the generated report). To do it manually you can pass |
Thanks for this! |
Part of #93020. Per conversation in #98054, I'm going to try to replace BasicBlock's block successor pointers (bbTarget, bbFalseTarget, etc) with FlowEdge pointers to simplify access to successor edges. To do this, each edge is going to need access to its destination block, so that access to successor blocks is still simple. As a first step, add a destination block member to FlowEdge.
Part of #93020. Currently, if profile data isn't available or reliable, the JIT won't assign edge likelihoods. We have the ability to come up with reasonable edge likelihoods using a block's number of successors in profile synthesis, so do so when incorporating profile data.
(This change had a lot of churn locally, and I think we can do this heuristic-driven likelihoods fixup more cheaply, i.e. skip the DFS computation if we can tolerate more naive likelihood guesses, don't compute block weights, etc.).