[opt] load promotion in mandatory combine #30463

zoecarver
Contributor

This patch adds support for load promotion in mandatory combine. It is split out from #28536 and will be needed (in this or a similar form) to support all of the optimizations that closure cleanup performs in mandatory inlining.

For example:

store x to as
y = load as
use y

Turns into (minus copies):

store x to as
use x

I also plan to add copy/destroy and store elimination in future patches, which should help reduce the dead code left behind.
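
The store/load example above can be sketched on a toy straight-line IR. This is only an illustration, not the patch's implementation: the `Op`, `Inst`, and `promoteLoads` names are hypothetical, the sketch handles a single block, it assumes each value name is defined once, and it assumes two addresses alias only when their names are equal. Note that the store itself is left in place, which matches the comment that store elimination is a separate step.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical toy IR for illustration only; this is not SIL.
enum class Op { Store, Load, Use };

struct Inst {
  Op op;
  std::string value; // Store: value stored; Load: value defined; Use: value read
  std::string addr;  // Store/Load: address name; unused for Use
};

// Forward stored values to later loads of the same address within one block.
// A load with a known stored value is dropped and its uses are rewritten.
// Assumption: addresses alias only when their names are equal.
std::vector<Inst> promoteLoads(const std::vector<Inst> &block) {
  std::unordered_map<std::string, std::string> available; // addr -> last stored value
  std::unordered_map<std::string, std::string> replaced;  // load result -> forwarded value
  std::vector<Inst> out;

  // Map a value name to the value it was replaced with, if any.
  auto resolve = [&](const std::string &v) {
    auto it = replaced.find(v);
    return it == replaced.end() ? v : it->second;
  };

  for (const Inst &inst : block) {
    switch (inst.op) {
    case Op::Store: {
      std::string v = resolve(inst.value);
      available[inst.addr] = v; // remember what this address now holds
      out.push_back({Op::Store, v, inst.addr});
      break;
    }
    case Op::Load: {
      auto it = available.find(inst.addr);
      if (it != available.end())
        replaced[inst.value] = it->second; // promote: drop the load
      else
        out.push_back(inst); // nothing known about this address; keep the load
      break;
    }
    case Op::Use:
      out.push_back({Op::Use, resolve(inst.value), ""});
      break;
    }
  }
  return out;
}
```

Feeding this the three instructions `store x to as`, `y = load as`, `use y` yields `store x to as` followed by `use x`.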

This is one of the biggest single patches from #28536; the rest will be smaller (in size and scope). A lot of the logic here could also apply to mem2reg and #30308.

@@ -547,6 +547,8 @@ bool tryOptimizeApplyOfPartialApply(
PartialApplyInst *pai, SILBuilderContext &builderCtxt,
InstModCallbacks callbacks = InstModCallbacks());

bool dominatesAllUses(SILInstruction *a, SILValue b);
Contributor Author

Implemented in #30464.
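
The thread does not show the body of `dominatesAllUses`. Going only by its name, it presumably checks that an instruction dominates every use of a value. A minimal sketch of that idea on a toy control-flow graph follows. Everything here is an assumption for illustration: `Position`, `computeDominators`, and `toyDominatesAllUses` are hypothetical names, blocks are plain indices with block 0 as the entry, every block is assumed reachable, and none of this is the SIL API or the code from #30464.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG position; this is not the SIL API.
struct Position {
  std::size_t block; // index of the basic block
  std::size_t index; // instruction index inside that block
};

// Compute dom[b][d] == true when block d dominates block b, using the
// iterative data-flow formulation. preds[b] lists the predecessors of b.
// Assumption: block 0 is the entry and every block is reachable from it.
std::vector<std::vector<bool>>
computeDominators(const std::vector<std::vector<std::size_t>> &preds) {
  const std::size_t n = preds.size();
  std::vector<std::vector<bool>> dom(n, std::vector<bool>(n, true));
  dom[0].assign(n, false);
  dom[0][0] = true; // the entry block is dominated only by itself

  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 1; b < n; ++b) {
      // Intersect the dominator sets of all predecessors, then add b itself.
      std::vector<bool> next(n, true);
      for (std::size_t p : preds[b])
        for (std::size_t d = 0; d < n; ++d)
          next[d] = next[d] && dom[p][d];
      next[b] = true;
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// Returns true when the instruction at `def` dominates every position in `uses`.
// Within a single block the definition must come strictly before the use.
bool toyDominatesAllUses(const std::vector<std::vector<bool>> &dom,
                         Position def, const std::vector<Position> &uses) {
  for (const Position &use : uses) {
    if (use.block == def.block) {
      if (use.index <= def.index)
        return false;
    } else if (!dom[use.block][def.block]) {
      return false;
    }
  }
  return true;
}
```

On a diamond-shaped graph (0 branches to 1 and 2, which both jump to 3), an instruction in block 0 dominates uses in blocks 1 and 3, while an instruction in block 1 does not dominate a use in block 3 because block 3 can also be reached through block 2.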

@zoecarver
Contributor Author

@swift-ci please benchmark.

@swift-ci
Contributor

Performance: -O

Regression          OLD   NEW   DELTA   RATIO
FlattenListLoop     2899  3307  +14.1%  0.88x (?)
FlattenListFlatMap  5656  6234  +10.2%  0.91x (?)
PrefixArrayLazy     13    14    +7.7%   0.93x (?)

Code size: -O

Performance: -Osize

Code size: -Osize

Performance: -Onone

Code size: -swiftlibs

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@gottesmm
Contributor

I don't think it makes sense to do this at -Onone. It would make more sense to do a small bit of analysis to recognize the specific case. We need to be really careful at -Onone with compile time. @atrick your thoughts?

@zoecarver
Contributor Author

That's a good point. Two alternatives: a) only scan single blocks (which would greatly reduce the complexity here), or b) only do this for partial_applys, at the same time as the other optimizations. If we want to support everything from closure cleanup, we would have to do (at least) the second one.

@zoecarver
Contributor Author

@swift-ci please smoke test compiler performance.

@zoecarver
Contributor Author

@swift-ci please smoke test compiler performance.

@zoecarver
Contributor Author

@swift-ci please smoke test compiler performance

@swift-ci
Contributor

Summary for master smoketest

No regressions above thresholds

Debug

debug brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 1,019,638,163,967 1,020,095,119,703 456,955,736 0.04%
LLVM.NumLLVMBytesOutput 54,130,888 54,130,896 8 0.0%
time.swift-driver.wall 80.4s 81.1s 695.8ms 0.87%

debug detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (17)
name old new delta delta_pct
AST.NumLoadedModules 5,721 5,721 0 0.0%
AST.NumTotalClangImportedEntities 54,209 53,733 -476 -0.88%
IRModule.NumIRBasicBlocks 350,303 350,303 0 0.0%
IRModule.NumIRFunctions 99,881 99,881 0 0.0%
IRModule.NumIRGlobals 111,354 111,354 0 0.0%
IRModule.NumIRInsts 2,590,079 2,590,023 -56 -0.0%
IRModule.NumIRValueSymbols 192,568 192,568 0 0.0%
LLVM.NumLLVMBytesOutput 54,130,888 54,130,896 8 0.0%
SILModule.NumSILGenFunctions 49,539 49,539 0 0.0%
SILModule.NumSILOptFunctions 67,926 67,926 0 0.0%
Sema.NumConformancesDeserialized 190,801 188,982 -1,819 -0.95%
Sema.NumConstraintScopes 1,006,545 1,006,339 -206 -0.02%
Sema.NumDeclsDeserialized 1,504,955 1,489,847 -15,108 -1.0%
Sema.NumGenericSignatureBuilders 26,151 26,076 -75 -0.29%
Sema.NumLazyIterableDeclContexts 177,206 176,465 -741 -0.42%
Sema.NumTypesDeserialized 476,592 474,487 -2,105 -0.44%
Sema.NumTypesValidated 31,198 31,198 0 0.0%

Release

release brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 1,395,320,547,280 1,396,105,749,659 785,202,379 0.06%
LLVM.NumLLVMBytesOutput 55,688,200 55,688,020 -180 -0.0%
time.swift-driver.wall 141.2s 141.6s 382.9ms 0.27%

release detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (17)
name old new delta delta_pct
AST.NumLoadedModules 552 552 0 0.0%
AST.NumTotalClangImportedEntities 14,604 14,604 0 0.0%
IRModule.NumIRBasicBlocks 182,221 182,221 0 0.0%
IRModule.NumIRFunctions 76,522 76,522 0 0.0%
IRModule.NumIRGlobals 89,717 89,717 0 0.0%
IRModule.NumIRInsts 1,465,156 1,465,156 0 0.0%
IRModule.NumIRValueSymbols 155,325 155,325 0 0.0%
LLVM.NumLLVMBytesOutput 55,688,200 55,688,020 -180 -0.0%
SILModule.NumSILGenFunctions 29,229 29,229 0 0.0%
SILModule.NumSILOptFunctions 24,856 24,856 0 0.0%
Sema.NumConformancesDeserialized 78,928 78,928 0 0.0%
Sema.NumConstraintScopes 989,933 989,933 0 0.0%
Sema.NumDeclsDeserialized 198,327 198,327 0 0.0%
Sema.NumGenericSignatureBuilders 5,235 5,235 0 0.0%
Sema.NumLazyIterableDeclContexts 24,195 24,195 0 0.0%
Sema.NumTypesDeserialized 107,360 107,360 0 0.0%
Sema.NumTypesValidated 18,705 18,705 0 0.0%

@zoecarver
Contributor Author

Some todos:

  • use dominance analysis instead of dominance info.
  • move into another pass, not mandatory combine.

@shahmishal
Member

Please update the base branch to main by Oct 5th otherwise the pull request will be closed automatically.

  • How to change the base branch: (Link)
  • More detail about the branch update: (Link)

@zoecarver zoecarver closed this Oct 2, 2020