Description
Overview
There is now a fairly substantial body of literature on improving compiler optimization by leveraging machine learning (ML). A comprehensive survey compiled by Zheng Wang can be found here: https://github.com/zwang4/awesome-machine-learning-in-compilers.
Here's a closely relevant paper and GitHub repo: MLGO: a Machine Learning Guided Compiler Optimizations Framework. Like our efforts below, it leverages a policy gradient algorithm (reinforcement learning).
Several years ago we attempted to leverage ML to create better inline heuristics. That experiment was largely a failure, at least as far as using ML to predict whether an inline would improve performance. But one of the key speculations from that work was that the lack of PGO was to blame. Now that we have PGO, it is a good time to revisit this area to see if PGO was indeed the key missing ingredient.
Proposed Work
During the .NET 9 product cycle, we plan to investigate applying ML techniques to heuristics used by the JIT. The items below represent the initial steps of this investigation. This list will change and grow as the work progresses.
We also want to tackle a relatively simple problem, at least initially. Thus our initial effort will be to refine and improve the heuristic the JIT uses for Common Subexpression Elimination (aka CSE):
- Study the current CSE heuristic and try to understand what it is modelling. Identify apparent shortcomings and limitations.
- See initial notes below.
- Develop tooling for varying which CSEs the JIT can perform, either in concert with the current heuristic or independently of it. Measure the impact of this variation on key JIT metrics (performance, code size, JIT time, etc.).
- See how well PerfScores can be used as a proxy for measuring actual performance. Ideally, we can leverage PerfScores to avoid needing to run benchmarks repeatedly. But if not, perhaps we can still rely on PerfScores to limit or prioritize the runs needed.
- See notes below for some initial data.
- Try a number of different ML techniques for improving CSE heuristics, both "small step" (evaluating one CSE at a time) and "big step" (evaluate an entire set of CSEs). Identify the key predictive observations. Try both interpretable and black box models.
- I have been working on a Policy Gradient approach. The implementation is split with part of it in the JIT and the rest in a driver program. Documentation is lacking, so ping me if you want to try this at home.
- JIT part was introduced in JIT: initial support for reinforcement learning of CSE heuristic #96880
- Driver part in Initial version of a tool for ML on CSEs jitutils#389
- More work on stopping: JIT: stopping preference for ML CSE heuristic #98063
- More work on the driver: Fix RLCSE for new metrics format jitutils#391
- More features JIT: More CSE heuristics adjustments #98257
- More work on the driver: RLCSE: fix MCMC and GatherFeatures, overwrite dumps jitutils#395
- Streaming SPMI mode JIT: Streaming mode for SPMI #98440
- And client updates to use that mode: RLCSE: streaming SPMI mode jitutils#397
- Enable running in release with fixed parameter set: JIT: refactor CSE to allow running greedy ML heuristic in release #98729
- actually try enabling (draft): JIT: temporarily enable RLCSEGreedy to see how it fares in CI #98776
- See how feasible it is to have a single architecture-agnostic heuristic (perhaps parameterized, like the current heuristic, with register info) or whether different heuristics are warranted for, say, arm64 and x64.
- Develop a common ML "infrastructure" that we can apply to other similar heuristics in the JIT.
- Set up a CI job to run the Policy Gradient
- JIT: test the new cse policy in jit-experimental #98777 (running the policy, not the training)
- Set up a Perf lab experiment to run the current "best" greedy policy on benchmarks. This should be running now; will check on the data in a few days.
- Enable RLCSE experiment in the perf lab #98779
- Add setup for RLCSE experiment performance#3975
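The Policy Gradient approach mentioned above can be sketched as a simple REINFORCE-style loop. The sketch below is a toy stand-in, not the actual JIT/driver implementation: the feature vectors, reward, baseline, and learning rate are all made up for illustration, and a real driver would obtain the reward by re-jitting through SPMI and comparing perf scores.

```python
import numpy as np

rng = np.random.default_rng(0)

def cse_probability(theta, features):
    """Sigmoid policy: probability of performing a candidate CSE."""
    return 1.0 / (1.0 + np.exp(-(features @ theta)))

def rollout(theta, candidates):
    """Small-step rollout: sample a perform/skip decision per candidate."""
    decisions, grads = [], []
    for f in candidates:
        p = cse_probability(theta, f)
        a = rng.random() < p                 # True = perform this CSE
        decisions.append(a)
        grads.append((float(a) - p) * f)     # grad of log-likelihood (Bernoulli)
    return decisions, grads

def update(theta, grads, reward, baseline, lr=0.02):
    """REINFORCE: push parameters toward decisions that beat the baseline."""
    for g in grads:
        theta = theta + lr * (reward - baseline) * g
    return theta

# Toy training loop. In a real setup the reward would be something like
# -(perf score ratio vs. the default heuristic), measured via SPMI.
theta = np.zeros(4)
for _ in range(200):
    candidates = rng.normal(size=(5, 4))     # fake per-candidate features
    decisions, grads = rollout(theta, candidates)
    reward = float(sum(decisions))           # stand-in reward signal
    theta = update(theta, grads, reward, baseline=2.5)
```

The trained parameter vector plays the same role as the `Params` line in the results below: a fixed set of weights that a greedy release-mode policy can evaluate without any ML machinery at JIT time.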
Also worth noting is Lee Culver's work to adapt the JIT CSE problem into a more standard gymnasium setting: JIT CSE Optimization - Add a gymnasium environment for reinforcement learning #101856
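For flavor, an environment in the spirit of that gymnasium adaptation might walk a method's CSE candidates one step at a time. The class below is only a minimal stand-in that mirrors the Gymnasium reset/step protocol (it does not depend on the library), and its features and rewards are synthetic placeholders, not the actual environment from #101856.

```python
import numpy as np

class CseEnv:
    """One episode walks a method's CSE candidates; each step decides
    whether to perform the current candidate (action 1) or skip it (0).
    Observations, rewards, and episode shape are synthetic."""

    def __init__(self, num_candidates=5, num_features=4, seed=0):
        self.rng = np.random.default_rng(seed)
        self.num_candidates = num_candidates
        self.num_features = num_features

    def reset(self):
        self.candidates = self.rng.normal(
            size=(self.num_candidates, self.num_features))
        self.index = 0
        return self.candidates[0], {}

    def step(self, action):
        # A real environment would re-jit and diff perf scores here; this
        # toy reward just favors performing candidates whose first feature
        # (imagine "weighted use count") is positive.
        reward = float(action) * float(self.candidates[self.index][0])
        self.index += 1
        terminated = self.index >= self.num_candidates
        obs = None if terminated else self.candidates[self.index]
        return obs, reward, terminated, False, {}

env = CseEnv()
obs, info = env.reset()
total, done = 0.0, False
while not done:
    action = 1 if obs[0] > 0 else 0          # trivial greedy policy
    obs, reward, done, truncated, info = env.step(action)
    total += reward
```

Framing the problem this way lets standard RL libraries drive the decision loop instead of a bespoke driver program.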
Update May 2024.
Given the work above we have been able to produce heuristics that can improve the aggregate perf score for methods via CSE, with about a 0.4% geomean improvement across all methods with CSE candidates. So far those results haven't translated into widespread improvements in our perf lab data. Exactly why is unclear; the data below would suggest one or more of the following:
- poor correlation of perf score with actual perf
- few benchmarks whose scores critically depend on CSEs, or
- those benchmarks whose perf does critically depend on CSEs are all easy/obvious cases
- difficulty extracting small signal from noise (still lots of noisy benchmark results)
The training evaluations show that the best ML heuristic obtains about half of what an optimal heuristic could achieve. Some recent results:
```
Best/base: 0.9913 [optimizing for score]
vs Base 0.9957 Better 440 Same 1476 Worse 84
vs Best 1.0044 Better 0 Same 1607 Worse 393
Params 0.6093, 0.7750,-0.5701,-0.6827, 0.5060,-0.0514,-1.3563, 0.4515,-2.0999, 0.0000,-1.0623,-1.3723, 0.0000,-0.6143,-0.8286,-0.2956,-1.1433, 0.3418, 1.3964,-0.0043,-0.4237, 0.6097,-1.9773,-0.3684, 0.7246

Collecting greedy policy data via SPMI... done (27583 ms)
Greedy/Base (score): 34965 methods, 8375 better, 24792 same, 1797 worse, 0.9960 geomean
  Best:  76679 @ 0.7731
  Worst: 82000 @ 1.4512
Greedy/Base (size): 34965 methods, 4628 better, 24694 same, 5642 worse, 1.0005 geomean
  Best:  15489 @ 0.7000
  Worst: 82000 @ 1.4205
```
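For reference, the geomean figures above aggregate per-method score ratios (new/base), so values below 1.0 mean the heuristic improved the aggregate perf score. The example ratios below are illustrative, not taken from the runs above.

```python
import math

def geomean(ratios):
    """Geometric mean of per-method ratios (new/base)."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# e.g. three methods: two improved, one slightly regressed
print(round(geomean([0.90, 0.95, 1.02]), 4))  # → 0.9554
```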
We don't have any more active work planned on machine learning of heuristics for .NET 9. However, we expect to revisit this area as .NET 9 work winds down, so stay tuned for further development in the coming months.