Skip to content

[semantic-arc-opts] Eliminate all copy_value from guaranteed argument… #19690

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

gottesmm
Copy link
Contributor

@gottesmm gottesmm commented Oct 3, 2018

…s that have all uses that can accept values with guaranteed ownership.

This takes advantage of my restricting Semantic ARC Opts to run only on the
stdlib since we know it passes the ownership verifier. Using that I
reimplemented this optimization in a more robust, elegant, general
way. Specifically, we now will eliminate any SILGen copy_value on a guaranteed
argument all of whose uses are either destroy_values or instructions that can
accept guaranteed parameters. To be clear: This means that the value must be
consumed in the same function only be destroy_value.

Since we know that the lifetime of the guaranteed parameter will be larger than
any owned value created/destroyed in the body, we do not need to check that the
copy_value's destroy_value set is joint post-dominated by the set of end_borrows.

That will have to be added to turn this into a general optimization.


I wrote this as a prototype over the weekend.

rdar://problem/43569988

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke benchmark

@gottesmm gottesmm requested review from eeckstein and atrick October 3, 2018 18:29
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci test compiler performance

@swift-ci
Copy link
Contributor

swift-ci commented Oct 3, 2018

Build comment file:

No performance and code size changes

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@gottesmm gottesmm force-pushed the pr-20fe767974f279f36f9ca7bfbec1133a41f07747 branch from edd6ccc to d2aef5c Compare October 3, 2018 20:12
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

Got curious and looked at some of the output. Made a small change that ups the number of rr that we remove from the stdlib from < 200 to > 7000. Lets see if that gives us some boost.

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke benchmark

1 similar comment
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke benchmark

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

(and I already see a bunch of rr traffic we can get rid of from objc code in the stdlib).

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Oct 3, 2018

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
DropLastAnySeqCRangeIterLazy 3388 2306 -31.9% 1.47x
DropLastAnySeqCRangeIter 3384 2326 -31.3% 1.45x
SuffixSequence 3689 2569 -30.4% 1.44x
SuffixSequenceLazy 3688 2569 -30.3% 1.44x
SuffixAnySeqCRangeIterLazy 3691 2610 -29.3% 1.41x
SuffixAnySeqCRangeIter 3688 2618 -29.0% 1.41x
ReversedBidirectional 14100 11006 -21.9% 1.28x
SuffixAnySequence 5034 3948 -21.6% 1.28x
DropLastAnySequence 5058 3968 -21.6% 1.27x
DropLastAnySequenceLazy 5139 4080 -20.6% 1.26x
SuffixAnySequenceLazy 5111 4062 -20.5% 1.26x
StringWordBuilderReservingCapacity 1132 990 -12.5% 1.14x (?)
PrefixAnyCollectionLazy 65078 57993 -10.9% 1.12x
DropLastAnyCollectionLazy 21404 19277 -9.9% 1.11x
IterateData 1780 1612 -9.4% 1.10x (?)
CStringLongAscii 3608 3291 -8.8% 1.10x
StringBuilder 490 448 -8.6% 1.09x
StringBuilderSmallReservingCapacity 500 458 -8.4% 1.09x
StringAdder 542 504 -7.0% 1.08x

Code size: -O

TEST OLD NEW DELTA RATIO
Regression
StringInterpolation.o 11386 11706 +2.8% 0.97x
DictTest4.o 24908 25308 +1.6% 0.98x

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
Array2D 11998 13194 +10.0% 0.91x
Improvement
PrefixWhileAnySeqCRangeIter 9689 6395 -34.0% 1.52x
PrefixWhileAnySequence 11082 7763 -29.9% 1.43x
DropLastAnySeqCRangeIterLazy 3737 2620 -29.9% 1.43x
DropLastAnySeqCRangeIter 3735 2671 -28.5% 1.40x
SuffixAnySeqCRangeIterLazy 4038 2910 -27.9% 1.39x
SuffixAnySeqCRangeIter 4038 2912 -27.9% 1.39x
SuffixSequence 4016 2919 -27.3% 1.38x
SuffixSequenceLazy 4001 2922 -27.0% 1.37x
DropLastAnySequenceLazy 5808 4620 -20.5% 1.26x
SuffixAnySequenceLazy 5702 4551 -20.2% 1.25x
SuffixAnySequence 5608 4493 -19.9% 1.25x
DropLastAnySequence 5627 4520 -19.7% 1.24x
ReversedBidirectional 21723 18379 -15.4% 1.18x
IterateData 1847 1580 -14.5% 1.17x (?)
StringWordBuilderReservingCapacity 1131 989 -12.6% 1.14x
SuffixAnyCollectionLazy 21862 19230 -12.0% 1.14x
DropFirstAnyCollectionLazy 65201 57783 -11.4% 1.13x
DropLastAnyCollectionLazy 21728 19377 -10.8% 1.12x
PrefixAnyCollectionLazy 65092 58319 -10.4% 1.12x
StringBuilderSmallReservingCapacity 500 457 -8.6% 1.09x
StringBuilder 489 447 -8.6% 1.09x
StringEnumRawValueInitialization 786 726 -7.6% 1.08x (?)
CStringLongAscii 3563 3295 -7.5% 1.08x
StringInterpolationSmall 4000 3735 -6.6% 1.07x

Code size: -Osize

TEST OLD NEW DELTA RATIO
Regression
DictionaryKeysContains.o 15003 15499 +3.3% 0.97x
DictTest4Legacy.o 22633 23097 +2.1% 0.98x
StringRemoveDupes.o 9037 9149 +1.2% 0.99x
DictTest4.o 20939 21195 +1.2% 0.99x
Improvement
DropLast.o 25443 25171 -1.1% 1.01x
Suffix.o 25905 25633 -1.0% 1.01x
DropFirst.o 24724 24468 -1.0% 1.01x

Performance: -Onone

TEST OLD NEW DELTA RATIO
Improvement
ClassArrayGetter2 9396 5618 -40.2% 1.67x
ReversedArray2 14538 9116 -37.3% 1.59x
CountAlgoArray 150501 95814 -36.3% 1.57x
SumUsingReduceInto 151098 96658 -36.0% 1.56x
SequenceAlgosArray 746627 484512 -35.1% 1.54x
Array2D 339475 221618 -34.7% 1.53x
StringComparison_ascii 8956 5947 -33.6% 1.51x
SumUsingReduce 155461 103523 -33.4% 1.50x
ReversedDictionary2 15332 10352 -32.5% 1.48x
MapReduceAnyCollection 26343 17861 -32.2% 1.47x
UTF8Decode 29986 20332 -32.2% 1.47x
MapReduce 26438 17942 -32.1% 1.47x
RemoveWhereSwapInts 4804 3280 -31.7% 1.46x
StringComparison_nonBMPSlowestPrenormal 4426 3072 -30.6% 1.44x
RandomShuffleLCG2 35003 24429 -30.2% 1.43x
StringComparison_emoji 2368 1668 -29.6% 1.42x
StringHashing_ascii 210 149 -29.0% 1.41x
LazilyFilteredArrayContains 778497 554768 -28.7% 1.40x
SubstringComparable 1611 1157 -28.2% 1.39x
MapReduceClass 30277 21769 -28.1% 1.39x
StringComparison_latin1 3840 2777 -27.7% 1.38x
RemoveWhereMoveInts 2731 1975 -27.7% 1.38x
RemoveWhereSwapStrings 5523 3998 -27.6% 1.38x
StringComparison_fastPrenormal 4926 3570 -27.5% 1.38x
DropWhileArrayLazy 14931 10871 -27.2% 1.37x
PrefixWhileArray 12022 8773 -27.0% 1.37x
StringBuilderSmallReservingCapacity 5294 3896 -26.4% 1.36x
StringBuilder 5258 3889 -26.0% 1.35x
SubstringEquatable 5973 4419 -26.0% 1.35x
PrefixWhileArrayLazy 13281 9850 -25.8% 1.35x
Chars2 39034 28994 -25.7% 1.35x
RandomShuffleDef2 6562 4916 -25.1% 1.33x
FilterEvenUsingReduceInto 1926 1449 -24.8% 1.33x
RemoveWhereFilterInts 2021 1530 -24.3% 1.32x
PrimsSplit 7762 5900 -24.0% 1.32x
SetIsSubsetInt100 1027 783 -23.8% 1.31x
MapReduceShort 39464 30106 -23.7% 1.31x
Prims 7615 5827 -23.5% 1.31x
ArrayAppendLazyMap 172079 132567 -23.0% 1.30x
ArrayAppendAscii 28124 21752 -22.7% 1.29x
ChainedFilterMap 235415 182119 -22.6% 1.29x
RemoveWhereMoveStrings 3260 2528 -22.5% 1.29x
SetIntersectionInt0 222 174 -21.6% 1.28x
MapReduceLazyCollection 22445 17633 -21.4% 1.27x
QueueConcrete 13825 10882 -21.3% 1.27x
SetIntersect 2223 1750 -21.3% 1.27x
ArrayValueProp2 15011 11860 -21.0% 1.27x
CharIteration_japanese_unicodeScalars 185904 147284 -20.8% 1.26x
SuperChars 80715 64161 -20.5% 1.26x
MapReduceAnyCollectionShort 39028 31084 -20.4% 1.26x
CharIteration_ascii_unicodeScalars 155442 123828 -20.3% 1.26x
CharIteration_chinese_unicodeScalars 117406 93559 -20.3% 1.25x
CharIteration_punctuated_unicodeScalars 34604 27583 -20.3% 1.25x
CharIteration_tweet_unicodeScalars 306885 244650 -20.3% 1.25x
CharIteration_korean_unicodeScalars 150449 120105 -20.2% 1.25x
MapReduceString 1990 1590 -20.1% 1.25x
CharIteration_russian_unicodeScalars 129085 103158 -20.1% 1.25x
RemoveWhereFilterStrings 2651 2124 -19.9% 1.25x
CharIteration_punctuatedJapanese_unicodeScalars 27478 22079 -19.6% 1.24x
LazilyFilteredArrays2 95545 76919 -19.5% 1.24x
WordCountUniqueASCII 7233 5828 -19.4% 1.24x
Dictionary 2179 1757 -19.4% 1.24x
RemoveWhereQuadraticString 2255 1825 -19.1% 1.24x
MapReduceClassShort 41719 33822 -18.9% 1.23x
SubstringEqualString 1924 1563 -18.8% 1.23x
SetIsSubsetInt50 547 449 -17.9% 1.22x
CharIteration_utf16_unicodeScalars 139810 115033 -17.7% 1.22x
ObserverForwarderStruct 3867 3194 -17.4% 1.21x
QueueGeneric 18505 15289 -17.4% 1.21x
RemoveWhereFilterString 1412 1167 -17.4% 1.21x
PrefixWhileSequence 25465 21139 -17.0% 1.20x
PrefixWhileAnySequence 26126 21694 -17.0% 1.20x
SuffixSequenceLazy 21461 17827 -16.9% 1.20x
SuffixAnySequence 21718 18061 -16.8% 1.20x
WordCountUniqueUTF16 10622 8844 -16.7% 1.20x
PopFrontArray 4377 3650 -16.6% 1.20x
SuffixSequence 21397 17860 -16.5% 1.20x
SuffixAnySequenceLazy 21772 18192 -16.4% 1.20x
RemoveWhereQuadraticInts 7739 6475 -16.3% 1.20x
DropLastSequence 27933 23379 -16.3% 1.19x
DropLastAnySequenceLazy 28203 23616 -16.3% 1.19x
DropLastAnySequence 28232 23648 -16.2% 1.19x
DropLastSequenceLazy 27913 23381 -16.2% 1.19x
ArrayPlusEqualFiveElementCollection 237742 200031 -15.9% 1.19x
MapReduceSequence 30746 25988 -15.5% 1.18x
MapReduceLazyCollectionShort 33611 28921 -14.0% 1.16x
MapReduceShortString 256 221 -13.7% 1.16x
SetIsSubsetInt25 275 239 -13.1% 1.15x
PrefixWhileAnySeqCRangeIter 34977 30473 -12.9% 1.15x
RemoveWhereQuadraticStrings 9393 8201 -12.7% 1.15x
DropLastAnySeqCRangeIterLazy 40293 35241 -12.5% 1.14x
CharIteration_chinese_unicodeScalars_Backwards 201681 176546 -12.5% 1.14x
CharIteration_korean_unicodeScalars_Backwards 258977 227708 -12.1% 1.14x
CharIteration_punctuated_unicodeScalars_Backwards 58728 51638 -12.1% 1.14x
CharIteration_japanese_unicodeScalars_Backwards 320330 281687 -12.1% 1.14x
Histogram 6032 5307 -12.0% 1.14x
CharIteration_ascii_unicodeScalars_Backwards 266753 234769 -12.0% 1.14x
ObserverClosure 5939 5245 -11.7% 1.13x
DropLastAnySeqCRangeIter 39756 35229 -11.4% 1.13x
SetIntersectionInt25 406 360 -11.3% 1.13x
StringAdder 746 662 -11.3% 1.13x
MapReduceLazySequence 21484 19079 -11.2% 1.13x
ReversedBidirectional 44904 39956 -11.0% 1.12x
SuffixAnySeqCRangeIterLazy 33479 29814 -10.9% 1.12x
SuffixAnySeqCRangeIter 33267 29646 -10.9% 1.12x
ObjectAllocation 1227 1097 -10.6% 1.12x
StringWordBuilderReservingCapacity 1349 1207 -10.5% 1.12x (?)
ArrayOfGenericPOD2 1180 1067 -9.6% 1.11x (?)
ObserverPartiallyAppliedMethod 7255 6566 -9.5% 1.10x
ArrayOfPOD 859 782 -9.0% 1.10x
ObserverUnappliedMethod 7581 6945 -8.4% 1.09x
StringWordBuilder 1993 1829 -8.2% 1.09x (?)
CStringLongAscii 3667 3375 -8.0% 1.09x
Hanoi 21720 20008 -7.9% 1.09x
PrefixArrayLazy 31419 29131 -7.3% 1.08x

Code size: Swift libraries

TEST OLD NEW DELTA RATIO
Improvement
libswiftAppKit.dylib 86016 81920 -4.8% 1.05x
libswiftNetwork.dylib 172032 167936 -2.4% 1.02x
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false alarms. Unexpected regressions which are marked with '(?)' are probably noise. If you see regressions which you cannot explain you can try to run the benchmarks again. If regressions still show up, please consult with the performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

@gottesmm gottesmm changed the title [WIP][semantic-arc-opts] Eliminate all copy_value from guaranteed argument… [semantic-arc-opts] Eliminate all copy_value from guaranteed argument… Oct 3, 2018
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

I think this is good enough. I am going to merge what I have.

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@atrick can I get a review?

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

Or actually, I need to clean it up first.

@gottesmm gottesmm force-pushed the pr-20fe767974f279f36f9ca7bfbec1133a41f07747 branch from d2aef5c to 637d7eb Compare October 3, 2018 22:04
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke test

5 similar comments
@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke test

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 3, 2018

@swift-ci smoke test

@atrick
Copy link
Contributor

atrick commented Oct 3, 2018

Thanks. I haven't reviewed it yet but it's more than promising.

@swift-ci
Copy link
Contributor

swift-ci commented Oct 4, 2018

Build comment file:

Summary for master full

Unexpected test results, excluded stats for RxSwift, Fluent, Wordy, ReactiveSwift

No regressions above thresholds

Debug-batch

debug-batch brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 14,355,756,998,038 14,358,422,984,656 2,665,986,618 0.02%
LLVM.NumLLVMBytesOutput 669,690,348 669,689,130 -1,218 -0.0%
time.swift-driver.wall 1658.3s 1647.9s -10.4s -0.63%

debug-batch detailed

Regressed (0)
name old new delta delta_pct
Improved (2)
name old new delta delta_pct
Driver.NumDriverPipePolls 211,519 206,362 -5,157 -2.44% ✅
Driver.NumDriverPipeReads 234,002 227,839 -6,163 -2.63% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (92)
name old new delta delta_pct
AST.NumASTBytesAllocated 19,657,268,424 19,653,332,012 -3,936,412 -0.02%
AST.NumDecls 46,267 46,267 0 0.0%
AST.NumDependencies 132,810 132,820 10 0.01%
AST.NumImportedExternalDefinitions 918,389 918,389 0 0.0%
AST.NumInfixOperators 19,744 19,744 0 0.0%
AST.NumLinkLibraries 0 0 0 0.0%
AST.NumLoadedModules 150,702 150,702 0 0.0%
AST.NumLocalTypeDecls 111 111 0 0.0%
AST.NumObjCMethods 13,059 13,059 0 0.0%
AST.NumPostfixOperators 14 14 0 0.0%
AST.NumPrecedenceGroups 9,238 9,238 0 0.0%
AST.NumPrefixOperators 61 61 0 0.0%
AST.NumReferencedDynamicNames 101 101 0 0.0%
AST.NumReferencedMemberNames 2,545,860 2,545,860 0 0.0%
AST.NumReferencedTopLevelNames 164,905 164,905 0 0.0%
AST.NumSourceBuffers 202,721 202,722 1 0.0%
AST.NumSourceLines 1,543,248 1,543,248 0 0.0%
AST.NumSourceLinesPerSecond 919,720 915,598 -4,122 -0.45%
AST.NumTotalClangImportedEntities 3,166,106 3,166,106 0 0.0%
AST.NumUsedConformances 145,051 145,051 0 0.0%
Driver.ChildrenMaxRSS 50,764,836,864 50,717,052,928 -47,783,936 -0.09%
Driver.DriverDepCascadingDynamic 0 0 0 0.0%
Driver.DriverDepCascadingExternal 0 0 0 0.0%
Driver.DriverDepCascadingMember 0 0 0 0.0%
Driver.DriverDepCascadingNominal 0 0 0 0.0%
Driver.DriverDepCascadingTopLevel 0 0 0 0.0%
Driver.DriverDepDynamic 0 0 0 0.0%
Driver.DriverDepExternal 0 0 0 0.0%
Driver.DriverDepMember 0 0 0 0.0%
Driver.DriverDepNominal 0 0 0 0.0%
Driver.DriverDepTopLevel 0 0 0 0.0%
Driver.NumDriverJobsRun 9,752 9,752 0 0.0%
Driver.NumDriverJobsSkipped 0 0 0 0.0%
Driver.NumProcessFailures 0 0 0 0.0%
Frontend.MaxMallocUsage 246,340,561,016 246,566,215,920 225,654,904 0.09%
Frontend.NumInstructionsExecuted 14,355,756,998,038 14,358,422,984,656 2,665,986,618 0.02%
Frontend.NumProcessFailures 0 0 0 0.0%
IRModule.NumIRAliases 74,661 74,661 0 0.0%
IRModule.NumIRBasicBlocks 2,506,993 2,506,993 0 0.0%
IRModule.NumIRComdatSymbols 0 0 0 0.0%
IRModule.NumIRFunctions 1,270,131 1,270,131 0 0.0%
IRModule.NumIRGlobals 1,475,477 1,475,477 0 0.0%
IRModule.NumIRIFuncs 0 0 0 0.0%
IRModule.NumIRInsts 27,143,973 27,143,973 0 0.0%
IRModule.NumIRNamedMetaData 47,563 47,563 0 0.0%
IRModule.NumIRValueSymbols 2,448,580 2,448,580 0 0.0%
LLVM.NumLLVMBytesOutput 669,690,348 669,689,130 -1,218 -0.0%
Parse.NumFunctionsParsed 1,514,222 1,514,222 0 0.0%
Parse.NumIterableDeclContextParsed 535,274 535,274 0 0.0%
SILModule.NumSILGenDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILGenFunctions 1,081,643 1,081,643 0 0.0%
SILModule.NumSILGenGlobalVariables 24,671 24,671 0 0.0%
SILModule.NumSILGenVtables 4,636 4,636 0 0.0%
SILModule.NumSILGenWitnessTables 27,502 27,502 0 0.0%
SILModule.NumSILOptDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILOptFunctions 921,723 921,768 45 0.0%
SILModule.NumSILOptGlobalVariables 25,136 25,136 0 0.0%
SILModule.NumSILOptVtables 8,850 8,850 0 0.0%
SILModule.NumSILOptWitnessTables 55,795 55,795 0 0.0%
Sema.AccessLevelRequest 1,294,882 1,294,882 0 0.0%
Sema.DefaultAndMaxAccessLevelRequest 31,166 31,166 0 0.0%
Sema.EnumRawTypeRequest 10,424 10,424 0 0.0%
Sema.ExtendedNominalRequest 1,943,500 1,938,415 -5,085 -0.26%
Sema.InheritedDeclsReferencedRequest 64,383,102 64,383,231 129 0.0%
Sema.InheritedTypeRequest 354,694 354,694 0 0.0%
Sema.IsDynamicRequest 1,192,455 1,192,455 0 0.0%
Sema.IsObjCRequest 997,842 997,842 0 0.0%
Sema.NamedLazyMemberLoadFailureCount 17,527 17,527 0 0.0%
Sema.NamedLazyMemberLoadSuccessCount 14,275,384 14,249,346 -26,038 -0.18%
Sema.NominalTypeLookupDirectCount 22,703,591 22,703,606 15 0.0%
Sema.NumConformancesDeserialized 1,983,629 1,976,900 -6,729 -0.34%
Sema.NumConstraintScopes 10,292,376 10,292,376 0 0.0%
Sema.NumConstraintsConsideredForEdgeContraction 19,307,022 19,307,022 0 0.0%
Sema.NumDeclsDeserialized 18,949,082 18,938,023 -11,059 -0.06%
Sema.NumDeclsValidated 1,202,395 1,202,395 0 0.0%
Sema.NumFunctionsTypechecked 793,535 793,535 0 0.0%
Sema.NumGenericSignatureBuilders 659,554 659,560 6 0.0%
Sema.NumLazyGenericEnvironments 4,101,880 4,095,153 -6,727 -0.16%
Sema.NumLazyGenericEnvironmentsLoaded 94,021 94,021 0 0.0%
Sema.NumLazyIterableDeclContexts 3,907,615 3,901,267 -6,348 -0.16%
Sema.NumTypesDeserialized 7,857,270 7,837,400 -19,870 -0.25%
Sema.NumTypesValidated 779,422 779,422 0 0.0%
Sema.NumUnloadedLazyIterableDeclContexts 3,151,288 3,145,639 -5,649 -0.18%
Sema.OverriddenDeclsRequest 1,051,406 1,051,406 0 0.0%
Sema.RequirementRequest 22,731 22,731 0 0.0%
Sema.SelfBoundsFromWhereClauseRequest 63,119 63,119 0 0.0%
Sema.SetterAccessLevelRequest 77,899 77,899 0 0.0%
Sema.SuperclassDeclRequest 53,301,875 53,301,875 0 0.0%
Sema.SuperclassTypeRequest 17,741 17,741 0 0.0%
Sema.TypeDeclsFromWhereClauseRequest 14,456 14,456 0 0.0%
Sema.USRGenerationRequest 234,940 234,940 0 0.0%
Sema.UnderlyingTypeDeclsReferencedRequest 1,791,925 1,791,925 0 0.0%

Release

release brief

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (3)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 17,830,463,615,792 17,842,067,875,490 11,604,259,698 0.07%
LLVM.NumLLVMBytesOutput 563,142,708 563,167,336 24,628 0.0%
time.swift-driver.wall 3362.0s 3365.0s 3.0s 0.09%

release detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 170,056 170,056 0 0.0%
AST.NumLoadedModules 9,987 9,987 0 0.0%
AST.NumTotalClangImportedEntities 574,906 574,906 0 0.0%
AST.NumUsedConformances 148,171 148,171 0 0.0%
IRModule.NumIRBasicBlocks 2,455,238 2,455,238 0 0.0%
IRModule.NumIRFunctions 1,023,605 1,023,609 4 0.0%
IRModule.NumIRGlobals 1,146,311 1,146,311 0 0.0%
IRModule.NumIRInsts 19,797,367 19,797,339 -28 -0.0%
IRModule.NumIRValueSymbols 2,003,566 2,003,570 4 0.0%
LLVM.NumLLVMBytesOutput 563,142,708 563,167,336 24,628 0.0%
SILModule.NumSILGenFunctions 440,181 440,181 0 0.0%
SILModule.NumSILOptFunctions 629,802 633,846 4,044 0.64%
Sema.NumConformancesDeserialized 1,271,565 1,274,266 2,701 0.21%
Sema.NumConstraintScopes 10,040,159 10,040,159 0 0.0%
Sema.NumDeclsDeserialized 3,847,337 3,857,104 9,767 0.25%
Sema.NumDeclsValidated 565,342 565,342 0 0.0%
Sema.NumFunctionsTypechecked 341,792 341,792 0 0.0%
Sema.NumGenericSignatureBuilders 119,759 120,257 498 0.42%
Sema.NumLazyGenericEnvironments 811,128 813,654 2,526 0.31%
Sema.NumLazyGenericEnvironmentsLoaded 15,845 15,845 0 0.0%
Sema.NumLazyIterableDeclContexts 502,394 501,689 -705 -0.14%
Sema.NumTypesDeserialized 2,170,980 2,178,894 7,914 0.36%
Sema.NumTypesValidated 249,599 249,599 0 0.0%

Copy link
Contributor

@atrick atrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. It should unlock the benefits of simple copy propagation in ownership SIL.

If the intention here is to stage in this PR before changing the ownership map kind stuff, that's ok, but my review comments are always "as-is"...

bool swift::getUnderlyingBorrowIntroducers(SILValue inputValue,
SmallVectorImpl<SILValue> &out) {
SmallVector<SILValue, 32> worklist;
worklist.emplace_back(inputValue);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[comment] I think it's fine if you always use emplace_back, but some people might say it's silly because you're just calling the same SILValue copy constructor as push_back.

NumEliminatedInsts += 2;
} // end anonymous namespace

bool SemanticARCOptVisitor::visitBeginBorrowInst(BeginBorrowInst *bbi) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] I thought we were using all-caps for acronyms: BBI? I don't actually care, but if not, there will be quite a lot of code to rewrite.

SmallVector<EndBorrowInst *, 16> endBorrows;
for (auto *op : bbi->getUses()) {
auto *user = op->getUser();
switch (user->getKind()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit] I don't understand why a switch statement is used here.

default:
// Make sure that this operand can accept our arguments kind.
auto map = op->getOwnershipKindMap();
if (map.canAcceptKind(kind))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[design] I still don't understand what an OwnershipKindMap is for. I think it obscures the meaning of the code. Why not just:

op->canAcceptOwnership(kind)

Also, this check doesn't make sense to me for forwarding instructions, without teaching the optimization to look through all of those. If the forwarding instruction is not taking ownership before, we don't want to implicitly change it's meaning by changing the ownership of its incoming value.

This is what would make sense to me, instead of the map and "can accept" check.

// Is this a point-in-time use that cannot extend the lifetime if it's
// operand value?
if (op->isNonConsuming())
  continue;

return false

/// Stashes a value for an iterator for a scope. This instruction/any
/// instructions after instruction in the block/any instructions in
/// other blocks can all be deleted safely.
struct ForwardInstIteratorGuard {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[style] This is a substantial abstraction that I would only ever use to hack
around old code instead of just fixing those APIs.

I generally think that transformation utilities that delete instructions (or blocks)
should be responsible for returning a valid iterator. This is just a
style comment...

for (auto ii = begin(), nextI = ii, ie = end(); ii != ie; ii = nextI) {
  nextI = std::next(ii)

  // Ususally some large number of conditions...
  if (someBailout)
    continue

  // And any number of possible transformations...
  if (canTransform) {
    nextI = transformInst(ii);
    continue
  }
}

// Ok, this constraint can take something owned as live. Lets
// see if it can also take something that is guaranteed. If it
// can not, then we bail.
if (!map.canAcceptKind(ValueOwnershipKind::Guaranteed)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[design] I really can't mentally map the code that's using these physical maps to concepts that I normally reason about. i.e. If I wanted to know how this would work on a concrete SIL example, I would have to construct a test case by hand and run it through the code. (that's the problem with too much indirection). I think this is what the code is trying to do (eliding some details), but not sure:

switch (operand.ownershipEffect()):
case Consuming:
  if (isa<Destroy>(operand.getUser()))
    continue;
  return false;

case NonConsuming:
  // A point-in-time use can always handle +1 or +0.
  continue;

case Forwarding:
  // This instruction takes ownership of the copy. We don't currently
  // recurse through them.
  return false;
}

If we wanted to constrain some non-consuming users to only handle owned
values, then we could introduce a separate "use constraint" API for
that. But I suspect we don't want to allow that anyway.

@gottesmm
Copy link
Contributor Author

gottesmm commented Oct 4, 2018

Andy and I discussed about this offline. There are a few things here that we need before this can land:

  1. We need to start storing into forwarding instructions the value ownership kind they have when we compute in their constructor so that we do not dynamically change the ownership of forwarding instructions.

  2. I explained to him that in the begin borrow case we do not need to worry about forwarding since the ownership kind map will tell us if the non consuming use is instantaneous (i.e. we can accept owned) or not. From talking with Andy, we can add that as a separate bit and split that from returning whether something is consuming or not.

@atrick
Copy link
Contributor

atrick commented Oct 4, 2018

Sounds good. I think we should have a separate API to identify "borrow projections" (struct_extract, ref_element_addr), although they may be hidden behind other forwarding instructions (casts), so we probably can't just ignore forwarding instructions when we remove borrow scopes.

@gottesmm gottesmm force-pushed the pr-20fe767974f279f36f9ca7bfbec1133a41f07747 branch from 637d7eb to dbeb2d5 Compare November 8, 2018 03:41
@gottesmm
Copy link
Contributor Author

@atrick I landed the forwarding change in #20497. So now we shouldn't compute things dynamically anymore. I am back on this.

…s that have all uses that can accept values with guaranteed ownership.

This takes advantage of my restricting Semantic ARC Opts to run only on the
stdlib since we know it passes the ownership verifier. Using that I
reimplemented this optimization in a more robust, elegant, general
way. Specifically, we now will eliminate any SILGen copy_value on a guaranteed
argument all of whose uses are either destroy_values or instructions that can
accept guaranteed parameters. To be clear: This means that the value must be
consumed in the same function only be destroy_value.

Since we know that the lifetime of the guaranteed parameter will be larger than
any owned value created/destroyed in the body, we do not need to check that the
copy_value's destroy_value set is joint post-dominated by the set of end_borrows.

That will have to be added to turn this into a general optimization.
Specifically, now the optimizer can take the following code:

sil @foo: $@convention(thin) (@guaranteed NativeObjectPair) -> () {
bb0(%0 : @guaranteed $NativeObjectPair):
  %1 = struct_extract %0 : $NativeObjectPair, #NativeObjectPair.obj1
  %2 = copy_value %1 : $Builtin.NativeObject
  %3 = begin_borrow %2 : $Builtin.NativeObject
  %foo = function_ref ...
  apply %foo(%3) : $@convention(thin) (@guaranteed Builtin.NativeObject) -> ()
  end_borrow %3 : $Builtin.NativeObject
  destroy_value %2 : $Builtin.NativeObject
  ...
}

and eliminate all ownership instructions. I.e.:

sil @foo: $@convention(thin) (@guaranteed NativeObjectPair) -> () {
bb0(%0 : @guaranteed $NativeObjectPair):
  %1 = struct_extract %0 : $NativeObjectPair, #NativeObjectPair.obj1
  %foo = function_ref ....
  apply %foo(%1)
  ...
}

Before this the optimizer could only eliminate 200 insts in the stdlib. With
this change, we now can eliminate ~7000.
@gottesmm gottesmm force-pushed the pr-20fe767974f279f36f9ca7bfbec1133a41f07747 branch from dbeb2d5 to 099cae5 Compare November 13, 2018 20:07
@gottesmm
Copy link
Contributor Author

@swift-ci smoke test

@gottesmm gottesmm merged commit 1113bde into swiftlang:master Nov 13, 2018
@gottesmm gottesmm deleted the pr-20fe767974f279f36f9ca7bfbec1133a41f07747 branch November 13, 2018 20:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants