[stdlib] Make some more *Pointer operations _transparent #21126

jrose-apple
Contributor

Not only was this affecting performance when building from parseable interfaces, but we'd also want these to be inlined for any sort of bounds-checking diagnostics / static analysis we might get in the future.

@jrose-apple
Contributor Author

@swift-ci Please test

@jrose-apple
Contributor Author

@swift-ci Please benchmark

@jrose-apple
Contributor Author

@swift-ci Please test compiler performance

@swift-ci
Contributor

swift-ci commented Dec 7, 2018

Build comment file:

Performance: -O

TEST                                   OLD    NEW    DELTA    RATIO
Regression
CountAlgoString                        1800   1950   +8.3%    0.92x
Improvement
Breadcrumbs.CopyUTF16CodeUnits.Mixed   66     55     -16.7%   1.20x
IterateData                            1566   1399   -10.7%   1.12x

Code size: -O

TEST           OLD     NEW     DELTA   RATIO
Regression
RC4.o          4115    4203    +2.1%   0.98x
StringWalk.o   40610   41058   +1.1%   0.99x

Performance: -Osize

TEST                           OLD    NEW    DELTA    RATIO
Improvement
MapReduceLazyCollectionShort   85     41     -51.8%   2.07x
IterateData                    1547   1356   -12.3%   1.14x

Code size: -Osize

TEST                   OLD     NEW     DELTA   RATIO
Regression
RC4.o                  3825    3910    +2.2%   0.98x
ChainedFilterMap.o     3494    3566    +2.1%   0.98x
LazyFilter.o           8841    8993    +1.7%   0.98x
StringWalk.o           34754   35202   +1.3%   0.99x
SortLettersInPlace.o   8950    9054    +1.2%   0.99x
UTF8Decode.o           11057   11177   +1.1%   0.99x
NibbleSort.o           14506   14658   +1.0%   0.99x
Improvement
RandomShuffle.o        3807    3767    -1.1%   1.01x

Performance: -Onone

TEST                                    OLD      NEW      DELTA    RATIO
Improvement
PointerArithmetics                      317509   82943    -73.9%   3.83x
ArrayPlusEqualThreeElements             9180     7730     -15.8%   1.19x
Radix2CooleyTukeyf                      40893    34983    -14.5%   1.17x
PopFrontUnsafePointer                   11882    10197    -14.2%   1.17x
PopFrontArrayGeneric                    6044     5235     -13.4%   1.15x
Radix2CooleyTukey                       45331    40704    -10.2%   1.11x
ArrayOfPOD                              860      780      -9.3%    1.10x (?)
ArrayPlusEqualFiveElementCollection     196470   178932   -8.9%    1.10x (?)
ArrayPlusEqualSingleElementCollection   243742   222592   -8.7%    1.10x

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).
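
As a reading aid, here is a minimal sketch of how the DELTA and RATIO columns appear to relate to OLD and NEW. The formulas are inferred from the rows in the tables above (DELTA as the relative change from OLD, RATIO as OLD divided by NEW), not taken from the benchmark driver's source. `BenchmarkRow` and `exceedsThreshold` are hypothetical names introduced only for this illustration.

```swift
import Foundation

// Hypothetical model of one table row; not part of the Swift benchmark tooling.
struct BenchmarkRow {
    let name: String
    let old: Double
    let new: Double

    // Assumed formula: relative change from OLD to NEW, in percent.
    var deltaPercent: Double { (new - old) / old * 100 }

    // Assumed formula: speedup factor, so values above 1.0 are improvements.
    var ratio: Double { old / new }

    // True when the change is larger than the given reporting threshold
    // (8% for performance and 1% for code size, per the note above).
    func exceedsThreshold(_ percent: Double) -> Bool {
        return abs(deltaPercent) > percent
    }
}

// Values copied from the "-O" and "-Onone" performance tables.
let rows = [
    BenchmarkRow(name: "CountAlgoString", old: 1800, new: 1950),
    BenchmarkRow(name: "PointerArithmetics", old: 317509, new: 82943),
]

for row in rows where row.exceedsThreshold(8) {
    let summary = String(format: "%+.1f%% %.2fx", row.deltaPercent, row.ratio)
    print(row.name, summary)
}
```

Under these assumed formulas, a regression has a positive DELTA and a RATIO below 1.0, while an improvement has a negative DELTA and a RATIO above 1.0, which matches how the rows are grouped in the report.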

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@ravikandhadai
Contributor

Sounds good to me.

@swift-ci
Contributor

swift-ci commented Dec 8, 2018

Build comment file:

Summary for master full

Unexpected test results, excluded stats for NonEmpty, Tagged, Wordy, GRDB

Regressions found (see below)

Debug-batch

debug-batch brief

Regressed (0)
name old new delta delta_pct
Improved (1)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 479,307,765,880,485 305,511,194,990,602 -173,796,570,889,883 -36.26% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 855,724,644 855,621,700 -102,944 -0.01%
time.swift-driver.wall 2624.0s 2628.0s 4.0s 0.15%

debug-batch detailed

Regressed (0)
name old new delta delta_pct
Improved (1)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 479,307,765,880,485 305,511,194,990,602 -173,796,570,889,883 -36.26% ✅
Unchanged (delta < 1.0% or delta < 100.0ms) (94)
name old new delta delta_pct
AST.NumASTBytesAllocated 41,546,237,520 41,481,124,164 -65,113,356 -0.16%
AST.NumDecls 61,350 61,350 0 0.0%
AST.NumDependencies 147,609 147,615 6 0.0%
AST.NumImportedExternalDefinitions 917,115 917,115 0 0.0%
AST.NumInfixOperators 22,247 22,247 0 0.0%
AST.NumLinkLibraries 0 0 0 0.0%
AST.NumLoadedModules 174,735 174,735 0 0.0%
AST.NumLocalTypeDecls 112 112 0 0.0%
AST.NumObjCMethods 12,563 12,563 0 0.0%
AST.NumPostfixOperators 13 13 0 0.0%
AST.NumPrecedenceGroups 12,067 12,067 0 0.0%
AST.NumPrefixOperators 70 70 0 0.0%
AST.NumReferencedDynamicNames 101 101 0 0.0%
AST.NumReferencedMemberNames 2,790,334 2,790,334 0 0.0%
AST.NumReferencedTopLevelNames 196,563 196,563 0 0.0%
AST.NumSourceBuffers 273,817 273,817 0 0.0%
AST.NumSourceLines 2,035,430 2,035,430 0 0.0%
AST.NumSourceLinesPerSecond 777,219 775,628 -1,591 -0.2%
AST.NumTotalClangImportedEntities 3,361,290 3,358,756 -2,534 -0.08%
AST.NumUsedConformances 168,313 168,313 0 0.0%
Driver.ChildrenMaxRSS 66,317,901,824 66,440,245,248 122,343,424 0.18%
Driver.DriverDepCascadingDynamic 0 0 0 0.0%
Driver.DriverDepCascadingExternal 0 0 0 0.0%
Driver.DriverDepCascadingMember 0 0 0 0.0%
Driver.DriverDepCascadingNominal 0 0 0 0.0%
Driver.DriverDepCascadingTopLevel 0 0 0 0.0%
Driver.DriverDepDynamic 0 0 0 0.0%
Driver.DriverDepExternal 0 0 0 0.0%
Driver.DriverDepMember 0 0 0 0.0%
Driver.DriverDepNominal 0 0 0 0.0%
Driver.DriverDepTopLevel 0 0 0 0.0%
Driver.NumDriverJobsRun 12,834 12,834 0 0.0%
Driver.NumDriverJobsSkipped 0 0 0 0.0%
Driver.NumDriverPipePolls 313,242 311,992 -1,250 -0.4%
Driver.NumDriverPipeReads 352,115 351,184 -931 -0.26%
Driver.NumProcessFailures 0 0 0 0.0%
Frontend.MaxMallocUsage 344,948,699,368 344,481,859,296 -466,840,072 -0.14%
Frontend.NumProcessFailures 0 0 0 0.0%
IRModule.NumIRAliases 90,971 90,971 0 0.0%
IRModule.NumIRBasicBlocks 3,170,159 3,169,568 -591 -0.02%
IRModule.NumIRComdatSymbols 0 0 0 0.0%
IRModule.NumIRFunctions 1,529,993 1,529,627 -366 -0.02%
IRModule.NumIRGlobals 1,752,357 1,752,112 -245 -0.01%
IRModule.NumIRIFuncs 0 0 0 0.0%
IRModule.NumIRInsts 40,059,268 40,055,595 -3,673 -0.01%
IRModule.NumIRNamedMetaData 62,817 62,817 0 0.0%
IRModule.NumIRValueSymbols 2,935,090 2,934,480 -610 -0.02%
LLVM.NumLLVMBytesOutput 855,724,644 855,621,700 -102,944 -0.01%
Parse.NumFunctionsParsed 2,107,329 2,107,329 0 0.0%
Parse.NumIterableDeclContextParsed 837,746 837,746 0 0.0%
SILModule.NumSILGenDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILGenFunctions 1,232,769 1,232,769 0 0.0%
SILModule.NumSILGenGlobalVariables 23,603 23,603 0 0.0%
SILModule.NumSILGenVtables 10,134 10,134 0 0.0%
SILModule.NumSILGenWitnessTables 33,710 33,710 0 0.0%
SILModule.NumSILOptDefaultWitnessTables 0 0 0 0.0%
SILModule.NumSILOptFunctions 1,102,786 1,102,709 -77 -0.01%
SILModule.NumSILOptGlobalVariables 24,285 24,285 0 0.0%
SILModule.NumSILOptVtables 16,285 16,285 0 0.0%
SILModule.NumSILOptWitnessTables 66,038 66,048 10 0.02%
Sema.AccessLevelRequest 1,806,090 1,805,744 -346 -0.02%
Sema.DefaultAndMaxAccessLevelRequest 43,420 43,420 0 0.0%
Sema.EnumRawTypeRequest 12,383 12,383 0 0.0%
Sema.ExtendedNominalRequest 2,610,887 2,609,128 -1,759 -0.07%
Sema.InheritedDeclsReferencedRequest 80,618,003 80,568,589 -49,414 -0.06%
Sema.InheritedTypeRequest 436,634 436,564 -70 -0.02%
Sema.IsDynamicRequest 1,442,955 1,442,955 0 0.0%
Sema.IsObjCRequest 1,245,504 1,245,428 -76 -0.01%
Sema.NamedLazyMemberLoadFailureCount 17,450 17,433 -17 -0.1%
Sema.NamedLazyMemberLoadSuccessCount 11,937,533 11,936,317 -1,216 -0.01%
Sema.NominalTypeLookupDirectCount 23,308,079 23,300,391 -7,688 -0.03%
Sema.NumConformancesDeserialized 3,627,319 3,620,189 -7,130 -0.2%
Sema.NumConstraintScopes 10,904,711 10,903,953 -758 -0.01%
Sema.NumConstraintsConsideredForEdgeContraction 20,174,400 20,174,198 -202 -0.0%
Sema.NumDeclsDeserialized 29,276,093 29,218,217 -57,876 -0.2%
Sema.NumDeclsValidated 1,541,004 1,541,004 0 0.0%
Sema.NumFunctionsTypechecked 878,711 878,711 0 0.0%
Sema.NumGenericSignatureBuilders 836,845 836,213 -632 -0.08%
Sema.NumLazyGenericEnvironments 5,992,541 5,987,954 -4,587 -0.08%
Sema.NumLazyGenericEnvironmentsLoaded 162,912 162,896 -16 -0.01%
Sema.NumLazyIterableDeclContexts 4,745,106 4,742,303 -2,803 -0.06%
Sema.NumLeafScopes 7,688,393 7,687,768 -625 -0.01%
Sema.NumTypesDeserialized 10,771,143 10,763,142 -8,001 -0.07%
Sema.NumTypesValidated 1,030,219 1,030,219 0 0.0%
Sema.NumUnloadedLazyIterableDeclContexts 3,331,376 3,330,989 -387 -0.01%
Sema.OverriddenDeclsRequest 3,613,092 3,597,659 -15,433 -0.43%
Sema.RequirementRequest 54,264 54,264 0 0.0%
Sema.SelfBoundsFromWhereClauseRequest 47,058,986 47,003,899 -55,087 -0.12%
Sema.SetterAccessLevelRequest 98,822 98,822 0 0.0%
Sema.SuperclassDeclRequest 63,452,901 63,433,603 -19,298 -0.03%
Sema.SuperclassTypeRequest 30,156 30,156 0 0.0%
Sema.TypeDeclsFromWhereClauseRequest 25,755 25,755 0 0.0%
Sema.USRGenerationRequest 5,187,340 5,151,144 -36,196 -0.7%
Sema.UnderlyingTypeDeclsReferencedRequest 2,333,806 2,333,637 -169 -0.01%

Release

release brief

Regressed (1)
name old new delta delta_pct
Frontend.NumInstructionsExecuted 21,539,350,600,274 113,794,707,445,822 92,255,356,845,548 428.31% ⛔
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (2)
name old new delta delta_pct
LLVM.NumLLVMBytesOutput 787,860,498 787,466,118 -394,380 -0.05%
time.swift-driver.wall 4031.0s 4034.6s 3.5s 0.09%

release detailed

Regressed (0)
name old new delta delta_pct
Improved (0)
name old new delta delta_pct
Unchanged (delta < 1.0% or delta < 100.0ms) (23)
name old new delta delta_pct
AST.NumImportedExternalDefinitions 170,168 170,168 0 0.0%
AST.NumLoadedModules 10,893 10,893 0 0.0%
AST.NumTotalClangImportedEntities 580,399 580,399 0 0.0%
AST.NumUsedConformances 169,030 169,030 0 0.0%
IRModule.NumIRBasicBlocks 2,755,340 2,753,871 -1,469 -0.05%
IRModule.NumIRFunctions 1,269,707 1,269,802 95 0.01%
IRModule.NumIRGlobals 1,400,741 1,400,694 -47 -0.0%
IRModule.NumIRInsts 26,704,367 26,696,859 -7,508 -0.03%
IRModule.NumIRValueSymbols 2,499,689 2,499,737 48 0.0%
LLVM.NumLLVMBytesOutput 787,860,498 787,466,118 -394,380 -0.05%
SILModule.NumSILGenFunctions 536,766 536,766 0 0.0%
SILModule.NumSILOptFunctions 661,240 660,905 -335 -0.05%
Sema.NumConformancesDeserialized 1,507,305 1,496,474 -10,831 -0.72%
Sema.NumConstraintScopes 9,602,763 9,602,763 0 0.0%
Sema.NumDeclsDeserialized 3,917,533 3,901,081 -16,452 -0.42%
Sema.NumDeclsValidated 811,909 811,909 0 0.0%
Sema.NumFunctionsTypechecked 433,386 433,386 0 0.0%
Sema.NumGenericSignatureBuilders 141,432 141,432 0 0.0%
Sema.NumLazyGenericEnvironments 810,834 804,830 -6,004 -0.74%
Sema.NumLazyGenericEnvironmentsLoaded 15,034 15,034 0 0.0%
Sema.NumLazyIterableDeclContexts 516,716 515,451 -1,265 -0.24%
Sema.NumTypesDeserialized 2,119,012 2,109,098 -9,914 -0.47%
Sema.NumTypesValidated 389,670 389,670 0 0.0%

Member

@lorentey left a comment

Looks great!

I wonder if we need @_transparent here or if @inline(__always) would get us the same benefits.

@jrose-apple
Contributor Author

We probably want this even at -Onone (and @inline(__always) doesn't do that [yet], see SR-7277), but ultimately I went with @_transparent because of the possibility of static bounds-checking or escape analysis diagnostics in the future. (Which is also why I tagged Ravi as a reviewer.)
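
A minimal sketch of the two attributes being compared, assuming a hypothetical `Cursor` wrapper around `UnsafePointer<Int>` (it is not the standard library's code and is not taken from this PR's diff). Both methods compute the same result; the difference discussed above is when the compiler inlines them. `@_transparent` functions are inlined by the mandatory passes, which also run at -Onone, whereas `@inline(__always)` is handled by the optimizer, which per the comment above did not yet apply at -Onone.

```swift
// Hypothetical wrapper type, for illustration only.
struct Cursor {
    var base: UnsafePointer<Int>

    // Inlined as part of mandatory inlining, so callers see the body
    // even in unoptimized (-Onone) builds.
    @_transparent
    func advancedTransparent(by n: Int) -> Cursor {
        return Cursor(base: base + n)
    }

    // A request to the performance inliner; relies on optimization passes.
    @inline(__always)
    func advancedAlwaysInline(by n: Int) -> Cursor {
        return Cursor(base: base + n)
    }
}

let values = [10, 20, 30, 40]

// Read two elements through the wrapper while the array storage is pinned.
let picked: (Int, Int) = values.withUnsafeBufferPointer { buffer -> (Int, Int) in
    let start = Cursor(base: buffer.baseAddress!)
    let third = start.advancedTransparent(by: 2).base.pointee
    let fourth = start.advancedAlwaysInline(by: 3).base.pointee
    return (third, fourth)
}

print(picked)
```

Note that `@_transparent` is an underscored attribute, so its behavior is not a stable language guarantee; the sketch only shows where each attribute would be written.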

@jrose-apple
Contributor Author

Oh, this might cover SR-7274.

@jrose-apple
Contributor Author

cc @atrick and @rjmccall too because of that

@jrose-apple merged commit c66a445 into swiftlang:master on Dec 10, 2018
@jrose-apple deleted the transparisteel-vs-transparent-aluminum branch on December 10, 2018 21:37