Performance Benchmarking of ExistentialSpecializer #24206


Merged
merged 1 commit into from
Apr 26, 2019

Conversation

rajbarik
Contributor

Duplicate of https://github.com/apple/swift/pull/24065/files, now with the appropriate random number generator!

@rajbarik
Contributor Author

@dabrahams @atrick @slavapestov Here is the new one with the new random number generator.

@rajbarik rajbarik requested review from atrick and dabrahams April 22, 2019 23:51
@rajbarik
Contributor Author

@atrick please review this one when you get a chance. Since the previous one was reverted, I am reposting it with the updated random number API.

Contributor

@atrick left a comment

Thanks @rajbarik. But do you see the performance delta now when building and running this as part of the benchmark binary?

@rajbarik
Contributor Author

rajbarik commented Apr 24, 2019

@atrick, I do see the performance delta locally. Earlier, it was not getting invoked as part of main.swift which prevented it from being reported.

@rajbarik rajbarik removed the request for review from dabrahams April 24, 2019 18:19
@atrick
Contributor

atrick commented Apr 24, 2019

@swift-ci benchmark

@swift-ci
Contributor

Performance: -O

TEST                        OLD    NEW    DELTA    RATIO
Improvement
DataAppendDataLargeToLarge  45000  32400  -28.0%   1.39x (?)
StringBuilderLong           1190   1060   -10.9%   1.12x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  34570  35410  35107

Performance: -Osize

TEST                                 OLD    NEW    DELTA    RATIO
Improvement
DataAppendDataLargeToLarge           45400  33200  -26.9%   1.37x (?)
ObjectiveCBridgeStubFromNSDateRef    4230   3540   -16.3%   1.19x (?)
ObjectiveCBridgeStubNSDateRefAccess  359    333    -7.2%    1.08x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  34840  35450  35050

Performance: -Onone

TEST                               OLD    NEW    DELTA    RATIO
Regression
DataAppendDataLargeToLarge         33600  45400  +35.1%   0.74x (?)
ObjectiveCBridgeStubFromNSDate     5610   6530   +16.4%   0.86x (?)
Improvement
ObjectiveCBridgeStubFromNSDateRef  4900   3990   -18.6%   1.23x (?)
ObjectiveCBridgeStubFromNSString   802    739    -7.9%    1.09x (?)

TEST        MIN     MAX     MEAN
Added
BucketSort  107770  108570  108040

Benchmark Check Report
⚠️ BucketSort execution took at least 3407 μs.
Decrease the workload of BucketSort by a factor of 4 (10), to be less than 1000 μs.

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

@atrick
Contributor

atrick commented Apr 25, 2019

@rajbarik looks like BucketSort takes at least 4x too much time to execute. To reduce noise, the current "microbenchmarking" approach to running the benchmarks expects them to be between 20 µs (for the timer precision and/or overhead) and 1 ms (to be within the thread-switch quantum).

@rajbarik
Contributor Author

@atrick thanks. I will then decrease the number of iterations by a factor of 4.

}
return true
}
let NUMITEMS = 2500
Contributor Author

@atrick reduced this from 10000 to 2500. Should be fine now?

Contributor

@atrick Apr 25, 2019

Sorry, my fault! The more important run time will be after you enable ExistentialSpecialization. I did this locally, and it only takes 160 µs with 10,000 items. So that was actually a better choice.

The only warning I see now is this.

memory: 'BucketSort' has very wide range of memory used between independent, repeated measurements.
memory: 'BucketSort' mem_pages [i1, i2]: min=[19, 19] 𝚫=0 R=[0, 24]

If you think of a way to improve it great, but I don't think it needs to hold up the commit.

Contributor

That memory warning is probably too sensitive. This benchmark looks just fine as is from the memory usage perspective.

@atrick
Contributor

atrick commented Apr 25, 2019

@swift-ci benchmark

@atrick
Contributor

atrick commented Apr 25, 2019

@swift-ci smoke test

@swift-ci
Contributor

Performance: -O

TEST        MIN    MAX    MEAN
Added
BucketSort  35510  36500  35853

Performance: -Osize

TEST                                 OLD   NEW   DELTA   RATIO
Improvement
ObjectiveCBridgeStubFromNSDateRef    4740  3940  -16.9%  1.20x (?)
ObjectiveCBridgeStubNSDateRefAccess  400   371   -7.2%   1.08x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  36670  37670  37217

Performance: -Onone

TEST                               OLD   NEW   DELTA   RATIO
Regression
ObjectiveCBridgeStubFromNSDate     6400  7380  +15.3%  0.87x (?)
Improvement
ObjectiveCBridgeStubFromNSDateRef  5450  4450  -18.3%  1.22x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  60690  60980  60843

Benchmark Check Report
⚠️ BucketSort execution took at least 3517 μs.
Decrease the workload of BucketSort by a factor of 4 (10), to be less than 1000 μs.

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

runFunction: run_BucketSort,
tags: [.validation, .algorithm],
setUpFunction: { buildWorkload() },
legacyFactor: 10)
Contributor

@atrick Apr 25, 2019

@rajbarik apparently legacyFactor is not something you're supposed to define. You can just remove it.

@eeckstein @gottesmm Please comment the purpose of this setting in TestsUtils.swift.
Actually, this looks like something @palimondo added.

Contributor

Sorry for causing confusion, I’ll fix the docs tomorrow.

Contributor

@atrick left a comment

@rajbarik I think if you just remove legacyFactor and go back to 10k items this can be merged now.

Contributor

@palimondo left a comment

I haven’t run this locally and the smoke benchmark doesn’t gather multiple independent samples from newly added benchmarks, but what variation are you seeing when running locally? Benchmark_Driver -i 10 BucketSort

I just want to be sure the true randomness isn't making this too noisy. If it is, using a PRNG could help.

let items: [Int] = {
  var array: [Int]? = [Int]()
  for _ in 0..<NUMITEMS {
    array!.append(Int.random(in: 0..<MAXBUCKETSIZE))
Contributor

Could we use a pseudorandom generator here, so that we are always testing the same workload and eliminate an unnecessary source of noise from the performance measurement?

See SplitMix64 over here.
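For reference, SplitMix64 is a tiny, seedable 64-bit generator: one word of state advanced by a fixed increment, then mixed through two multiply-xorshift rounds (the constants below are the published SplitMix64 constants). A minimal sketch of the idea, written in Python for illustration since the benchmark suite itself is Swift:

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit wrapping arithmetic

class SplitMix64:
    """Minimal SplitMix64: deterministic, seedable, one 64-bit word of state."""

    def __init__(self, seed):
        self.state = seed & MASK64

    def next(self):
        # Advance the state by the golden-ratio increment.
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        # Mix the state through two multiply-xorshift rounds.
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return (z ^ (z >> 31)) & MASK64

# Same seed -> same sequence, so the benchmark workload is identical on every run.
rng = SplitMix64(seed=42)
sample = [rng.next() % 10000 for _ in range(5)]
```

Because the stream depends only on the seed, the workload array is reproducible across runs and machines, which removes one source of measurement noise.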

Contributor

Yes, benchmarks should always use the pseudo-random generator.
I added the LFSR utility in TestsUtils.swift long ago for this purpose.
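As background, an LFSR (linear-feedback shift register) produces a deterministic pseudo-random bit stream by shifting a register and feeding back the XOR of a few tap bits. The register width and taps used by the utility in TestsUtils.swift may differ; this generic maximal-period 16-bit Fibonacci LFSR, sketched in Python for illustration, shows the idea:

```python
def lfsr16_step(state):
    """One step of a maximal-period 16-bit Fibonacci LFSR.

    Taps at bits 16, 14, 13, 11 (polynomial x^16 + x^14 + x^13 + x^11 + 1),
    so the state cycles through all 2^16 - 1 nonzero values.
    """
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return ((state >> 1) | (bit << 15)) & 0xFFFF

# Deterministic stream from a fixed nonzero seed.
state = 0xACE1
values = []
for _ in range(5):
    state = lfsr16_step(state)
    values.append(state)
```

Like SplitMix64, a fixed seed gives an identical workload on every benchmark run; the trade-off is a much shorter period and weaker statistical quality, which is acceptable for workload generation.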


@rajbarik rajbarik merged commit 34fbbcc into swiftlang:master Apr 26, 2019
@rajbarik
Contributor Author

@atrick I accidentally merged this one with NUMITEMS set to 2500. I will create another PR with NUMITEMS set back to 10000.

@atrick
Contributor

atrick commented Apr 26, 2019

@rajbarik Thanks. If you could use the LFSR instead of Random, that would also encourage best practice.

@atrick
Contributor

atrick commented Apr 26, 2019

@rajbarik While you're at it, please also remove the legacyFactor.

@rajbarik
Contributor Author

@atrick will take care of both of these. Meanwhile, please run performance benchmarking on #19820.

@palimondo
Contributor

@atrick The LFSR in TestsUtils.swift doesn't conform to RandomNumberGenerator. See the SplitMix64 implementation linked above for a simple drop-in pseudo-random generator that conforms to the protocol.

@palimondo
Contributor

@rajbarik The SplitMix64 implementation has now landed in TestsUtils.swift and is available for use in all benchmarks.
