Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Runtime] Avoid function pointer indirection in refcounting functions. #26516

Conversation

mikeash
Copy link
Contributor

@mikeash mikeash commented Aug 6, 2019

Functions like swift_retain call through a function pointer so that Instruments can interpose. This slows down the common case where there is no interposition. Instead, initialize the function pointers to NULL and call through directly to the real implementation when it's NULL. The compiler is smart enough to inline this call and the result is a single conditional branch rather than a function pointer call.

rdar://problem/18307425

Functions like swift_retain call through a function pointer so that Instruments can interpose. This slows down the common case where there is no interposition. Instead, initialize the function pointers to NULL and call through directly to the real implementation when it's NULL. The compiler is smart enough to inline this call and the result is a single conditional branch rather than a function pointer call.

rdar://problem/18307425
@mikeash
Copy link
Contributor Author

mikeash commented Aug 6, 2019

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

swift-ci commented Aug 6, 2019

Performance: -O

Improvement OLD NEW DELTA RATIO
LazilyFilteredArrayContains 23000 18400 -20.0% 1.25x
UTF8Decode_InitDecoding 124 103 -16.9% 1.20x
ArraySetElement 305 262 -14.1% 1.16x
RandomShuffleLCG2 384 336 -12.5% 1.14x
Set.isSuperset.Seq.Empty.Int 53 47 -11.3% 1.13x (?)
Set.isDisjoint.Box.Empty 98 87 -11.2% 1.13x
RemoveWhereMoveInts 18 16 -11.1% 1.12x
Set.subtracting.Empty.Box 9 8 -11.1% 1.12x
Array2D 4144 3696 -10.8% 1.12x (?)
MapReduceAnyCollection 215 193 -10.2% 1.11x
Set.isSubset.Int.Empty 51 46 -9.8% 1.11x (?)
DataSetCountMedium 410 370 -9.8% 1.11x (?)
ObjectiveCBridgeStringHash 42 38 -9.5% 1.11x (?)
MapReduce 237 215 -9.3% 1.10x (?)
DistinctClassFieldAccesses 194 176 -9.3% 1.10x
ArrayLiteral2 78 71 -9.0% 1.10x (?)
DictionarySubscriptDefaultMutation 191 174 -8.9% 1.10x
RemoveWhereSwapInts 34 31 -8.8% 1.10x (?)
Integrate 217 198 -8.8% 1.10x (?)
FlattenListLoop 2512 2303 -8.3% 1.09x (?)
DropWhileAnySequence 2130 1953 -8.3% 1.09x (?)
DropFirstAnySequenceLazy 2188 2015 -7.9% 1.09x (?)
SubstringFromLongStringGeneric 13 12 -7.7% 1.08x (?)
PrefixWhileAnySeqCRangeIter 186 172 -7.5% 1.08x (?)
DropLastAnySequence 334 309 -7.5% 1.08x (?)
Set.isDisjoint.Seq.Box.Empty 67 62 -7.5% 1.08x (?)
Set.subtracting.Empty.Int 28 26 -7.1% 1.08x (?)
PrefixWhileAnySeqCntRange 186 173 -7.0% 1.08x
PrefixAnySequence 1769 1649 -6.8% 1.07x (?)
PrefixWhileSequence 195 182 -6.7% 1.07x (?)

Code size: -O

Performance: -Osize

Improvement OLD NEW DELTA RATIO
FlattenListLoop 2968 2169 -26.9% 1.37x (?)
UTF8Decode_InitDecoding 122 102 -16.4% 1.20x
ArraySetElement 305 262 -14.1% 1.16x
RangeAssignment 223 195 -12.6% 1.14x (?)
RemoveWhereSwapInts 33 29 -12.1% 1.14x
Set.isSuperset.Seq.Empty.Int 52 46 -11.5% 1.13x (?)
Set.isStrictSubset.Int.Empty 54 48 -11.1% 1.12x (?)
Array2D 3904 3472 -11.1% 1.12x
Set.isDisjoint.Box.Empty 110 98 -10.9% 1.12x (?)
Set.isDisjoint.Seq.Int.Empty 56 50 -10.7% 1.12x (?)
Set.isStrictSubset.Seq.Int.Empty 128 115 -10.2% 1.11x (?)
ObjectiveCBridgeStringHash 42 38 -9.5% 1.11x
DataCreateEmptyArray 1650 1500 -9.1% 1.10x (?)
Integrate 220 200 -9.1% 1.10x
MapReduce 258 237 -8.1% 1.09x (?)
MapReduceAnyCollection 259 238 -8.1% 1.09x (?)
RandomShuffleLCG2 400 368 -8.0% 1.09x (?)
Set.isSubset.Seq.Int.Empty 125 115 -8.0% 1.09x (?)
RemoveWhereFilterInts 25 23 -8.0% 1.09x (?)
SubstringFromLongStringGeneric 13 12 -7.7% 1.08x (?)
Set.isDisjoint.Seq.Box.Empty 95 88 -7.4% 1.08x (?)
DataSetCountMedium 410 380 -7.3% 1.08x (?)
PrefixWhileAnySeqCRangeIter 211 196 -7.1% 1.08x (?)
PrefixWhileSequence 201 187 -7.0% 1.07x (?)
PrefixWhileAnySeqCntRange 210 196 -6.7% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
ArrayOfPOD 665 740 +11.3% 0.90x (?)
 
Improvement OLD NEW DELTA RATIO
UTF8Decode_InitDecoding 143 123 -14.0% 1.16x (?)
ObjectiveCBridgeStringHash 42 38 -9.5% 1.11x

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

size_t requiredAlignmentMask) {
if (SWIFT_UNLIKELY(_swift_allocObject))
return _swift_allocObject(metadata, requiredSize, requiredAlignmentMask);
else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The else is unnecessary, and similar throughout.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, updated.

@rjmccall
Copy link
Contributor

Very nice!

…k against those values to call directly.

Instruments relies on the old values being there so it can call the original implementation. This has very slightly worse codegen but the impact should be minimal.
@mikeash
Copy link
Contributor Author

mikeash commented Aug 16, 2019

Turns out that Instruments doesn't work when the function pointers start out as NULL. Trying a different approach now where we initialize them as before, but then we check for that value and make the call directly. The compiler still inlines the implementation in this case. One additional leaq instruction is generated to get the address to compare with, which shouldn't hurt too much.

@mikeash
Copy link
Contributor Author

mikeash commented Aug 16, 2019

@swift-ci please benchmark

@swift-ci
Copy link
Contributor

Performance: -O

Regression OLD NEW DELTA RATIO
DistinctClassFieldAccesses 173 202 +16.8% 0.86x
SubstringFromLongString 7 8 +14.3% 0.88x
UTF8Decode_InitDecoding 110 119 +8.2% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Set.isStrictSubset.Seq.Int.Empty 129 111 -14.0% 1.16x
Set.isDisjoint.Seq.Int.Empty 56 49 -12.5% 1.14x
ArrayLiteral2 82 72 -12.2% 1.14x (?)
Chars2 3350 2950 -11.9% 1.14x (?)
DataCreateEmptyArray 1700 1500 -11.8% 1.13x (?)
Set.isDisjoint.Box.Empty 99 88 -11.1% 1.12x (?)
Set.subtracting.Empty.Box 9 8 -11.1% 1.12x (?)
Set.isSubset.Seq.Int.Empty 126 113 -10.3% 1.12x
Set.isSuperset.Seq.Empty.Int 51 46 -9.8% 1.11x (?)
DataCreateSmallArray 2250 2050 -8.9% 1.10x (?)
NormalizedIterator_fastPrenormal 630 580 -7.9% 1.09x (?)
LazilyFilteredArrayContains 20200 18600 -7.9% 1.09x (?)
Set.isStrictSubset.Int.Empty 54 50 -7.4% 1.08x (?)
DictionarySubscriptDefaultMutation 190 176 -7.4% 1.08x
PrefixWhileAnySequence 1153 1071 -7.1% 1.08x (?)
Calculator 147 137 -6.8% 1.07x (?)
Breadcrumbs.IdxToUTF16Range.longMixed 288 269 -6.6% 1.07x (?)

Code size: -O

Performance: -Osize

Regression OLD NEW DELTA RATIO
LazilyFilteredArrayContains 38800 45000 +16.0% 0.86x (?)
HashTest 750 820 +9.3% 0.91x (?)
Hanoi 2160 2350 +8.8% 0.92x (?)
 
Improvement OLD NEW DELTA RATIO
Set.isSuperset.Seq.Empty.Int 54 46 -14.8% 1.17x (?)
Set.isDisjoint.Box.Empty 104 89 -14.4% 1.17x
Set.isDisjoint.Seq.Box.Empty 100 87 -13.0% 1.15x (?)
DataCreateEmptyArray 1700 1500 -11.8% 1.13x (?)
Set.isDisjoint.Int.Empty 61 54 -11.5% 1.13x (?)
Set.subtracting.Empty.Box 9 8 -11.1% 1.12x
Set.isSubset.Seq.Int.Empty 129 116 -10.1% 1.11x
Chars2 3550 3200 -9.9% 1.11x (?)
Set.isStrictSubset.Int.Empty 52 47 -9.6% 1.11x
Calculator 150 136 -9.3% 1.10x (?)
NormalizedIterator_fastPrenormal 660 600 -9.1% 1.10x (?)
Set.isStrictSubset.Seq.Int.Empty 130 119 -8.5% 1.09x (?)
Set.isDisjoint.Seq.Int.Empty 51 47 -7.8% 1.09x (?)
DataSetCountMedium 410 380 -7.3% 1.08x (?)
DictionaryBridgeToObjC_Bridge 15 14 -6.7% 1.07x (?)
Set.isStrictSuperset.Seq.Empty.Int 181 169 -6.6% 1.07x (?)

Code size: -Osize

Performance: -Onone

Regression OLD NEW DELTA RATIO
ArrayInClass 2185 2495 +14.2% 0.88x
StringWalk 7120 8120 +14.0% 0.88x
DataAppendDataSmallToSmall 2960 3360 +13.5% 0.88x (?)
StringFromLongWholeSubstring 9 10 +11.1% 0.90x
NopDeinit 113100 124700 +10.3% 0.91x (?)
 
Improvement OLD NEW DELTA RATIO
SubstringFromLongString 10 9 -10.0% 1.11x (?)
ArrayOfPOD 723 661 -8.6% 1.09x (?)
DataSetCountMedium 430 400 -7.0% 1.07x (?)
StringToDataSmall 750 700 -6.7% 1.07x (?)

Code size: -swiftlibs

How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac mini
  Model Identifier: Macmini8,1
  Processor Name: Intel Core i7
  Processor Speed: 3.2 GHz
  Number of Processors: 1
  Total Number of Cores: 6
  L2 Cache (per Core): 256 KB
  L3 Cache: 12 MB
  Memory: 64 GB

@mikeash
Copy link
Contributor Author

mikeash commented Aug 27, 2019

@swift-ci please test

@swift-ci
Copy link
Contributor

Build failed
Swift Test OS X Platform
Git Sha - 9fd03e2

@mikeash
Copy link
Contributor Author

mikeash commented Aug 27, 2019

@swift-ci please test os x platform

@mikeash mikeash merged commit 9e61d53 into swiftlang:master Aug 28, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants