Performance Benchmarking of ExistentialSpecializer #24206
Conversation
@dabrahams @atrick @slavapestov Here is the new one with the new random number generator.
@atrick please review this one when you get a chance. Since the previous one was reverted, I am reposting it with the updated random number API.
Thanks @rajbarik. But, do you see the performance delta now when building/running this as part of the benchmark binary?
@atrick, I do see the performance delta locally. Earlier, it was not getting invoked as part of main.swift, which prevented it from being reported.
@swift-ci benchmark
Performance: -O
Performance: -Osize
Performance: -Onone
How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing them. Noise: Sometimes the performance results (not code size!) contain false alarms. (Benchmark result tables and Hardware Overview omitted.)
@rajbarik looks like BucketSort takes at least 4x too much time to execute. To reduce noise, the current "microbenchmarking" approach to running the benchmarks expects them to run between 20 µs (for the timer precision and/or overhead) and 1 ms (to be within the thread-switch quantum).
@atrick thanks. I will then decrease the number of iterations by a factor of 4.
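The 20 µs – 1 ms window mentioned above is easy to sanity-check locally before resizing a workload. Here is a minimal timing sketch; the `runWorkloadOnce` function is a hypothetical stand-in for one BucketSort run, not the benchmark's actual code:

```swift
import Dispatch

// Hypothetical stand-in for a single run of the workload; the real
// BucketSort body lives in the benchmark under review.
func runWorkloadOnce(numItems: Int) -> Int {
    var items = (0..<numItems).map { _ in Int.random(in: 0..<1000) }
    items.sort()
    return items.count
}

// Time one invocation and report it in microseconds, so it can be
// compared against the 20 µs - 1 ms window the harness expects.
let start = DispatchTime.now()
_ = runWorkloadOnce(numItems: 2500)
let elapsed = DispatchTime.now().uptimeNanoseconds - start.uptimeNanoseconds
let micros = Double(elapsed) / 1_000.0
print("single run: \(micros) µs")
```

If a single run overshoots 1 ms, shrinking the item count (as done in this thread) or the per-iteration work brings it back into range.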
}
return true
}
let NUMITEMS = 2500
@atrick I reduced this from 10000 to 2500. Should be fine now?
Sorry, my fault! The more important run time will be the one after you enable ExistentialSpecialization. I did this locally, and it only takes 160 µs with 10,000 items, so that was actually the better choice.
The only warning I see now is this:
memory: 'BucketSort' has very wide range of memory used between independent, repeated measurements.
memory: 'BucketSort' mem_pages [i1, i2]: min=[19, 19] 𝚫=0 R=[0, 24]
If you can think of a way to improve it, great, but I don't think it needs to hold up the commit.
That memory warning is probably too sensitive. This benchmark looks just fine as is from the memory usage perspective.
@swift-ci benchmark
@swift-ci smoke test
runFunction: run_BucketSort,
tags: [.validation, .algorithm],
setUpFunction: { buildWorkload() },
legacyFactor: 10)
@rajbarik apparently legacyFactor is not something you're supposed to define. You can just remove it.
@eeckstein @gottesmm Please comment on the purpose of this setting in TestsUtils.swift.
Actually, this looks like something @palimondo added.
Sorry for causing confusion, I’ll fix the docs tomorrow.
@rajbarik I think if you just remove legacyFactor and go back to 10k items, this can be merged now.
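Putting the two suggestions together, the registration quoted above would presumably end up looking roughly like this. This is a sketch against the Swift benchmark suite's BenchmarkInfo API, not the exact merged code:

```swift
public let BucketSort = BenchmarkInfo(
  name: "BucketSort",
  runFunction: run_BucketSort,
  tags: [.validation, .algorithm],
  setUpFunction: { buildWorkload() })  // legacyFactor removed
```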
}
return true
}
let NUMITEMS = 2500
I haven’t run this locally, and the smoke benchmark doesn’t gather multiple independent samples from newly added benchmarks. What variation are you seeing when running locally with Benchmark_Driver -i 10 BucketSort?
I just want to be sure the true randomness isn’t making this too noisy. If it is, the PRNG could help.
let items: [Int] = {
var array: [Int]? = [Int]()
for _ in 0..<NUMITEMS {
array!.append(Int.random(in: 0..<MAXBUCKETSIZE))
Could we use a pseudorandom generator here, so that we always test the same workload and eliminate an unnecessary source of noise from the performance measurement? See SplitMix64 over here.
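For reference, SplitMix64 is a small, well-known mixing generator (due to Sebastiano Vigna) that fits naturally behind Swift's RandomNumberGenerator protocol, so the existing Int.random(in:using:) call sites barely change. A minimal sketch; the seed and workload below are illustrative, not the benchmark's actual code:

```swift
// SplitMix64: the standard state-increment-and-mix algorithm, wrapped
// as a RandomNumberGenerator so a fixed seed yields the same
// "random" workload on every benchmark run.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { self.state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Deterministic workload: same seed, same array, every run.
var generator = SplitMix64(seed: 0x5EED)
let items = (0..<10).map { _ in Int.random(in: 0..<1000, using: &generator) }
print(items)
```

Because the sequence depends only on the seed, repeated measurements sort identical input, removing run-to-run workload variation as a noise source.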
Yes, benchmarks should always use the pseudo-random generator. I added the LFSR utility in TestsUtils.swift long ago for this purpose.
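An LFSR (linear feedback shift register) serves the same goal: cheap, deterministic pseudo-random output from a seed. The sketch below is a generic 16-bit Galois LFSR with a standard maximal-length polynomial; the width and taps are illustrative and not necessarily those of the utility in TestsUtils.swift:

```swift
// 16-bit Galois LFSR with taps at bits 16, 14, 13, 11
// (polynomial x^16 + x^14 + x^13 + x^11 + 1, mask 0xB400).
// With a nonzero seed it cycles through all 65535 nonzero states.
struct GaloisLFSR {
    private var state: UInt16
    init(seed: UInt16 = 0xACE1) {
        precondition(seed != 0, "an LFSR must not be seeded with zero")
        state = seed
    }
    mutating func next() -> UInt16 {
        let lsb = state & 1
        state >>= 1
        if lsb != 0 { state ^= 0xB400 }
        return state
    }
}

var lfsr = GaloisLFSR()
let sample = (0..<5).map { _ in lfsr.next() }
print(sample)
```

Like SplitMix64, the sequence is fully determined by the seed, so each benchmark run sorts the same input.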
That memory warning is probably too sensitive. This benchmark looks just fine as is from the memory usage perspective.
@atrick Accidentally, I merged this one with NUMITEMS set to 2500. I will create another PR with NUMITEMS set back to 10000.
@rajbarik Thanks. If you could use LFSR instead of Random, that would also encourage best practice.
@rajbarik While you're at it, please also remove the legacyFactor.
Duplicate of https://github.com/apple/swift/pull/24065/files with appropriate random generator!