[benchmark] Existential Redux #20666
@aschwaighofer, as the original author of these benchmarks, could you please help me provide a description of their purpose? @atrick asked me to always add one for new tests over in #20552. My impression is that the different variants exercise differently sized value buffers; the comment in SortLargeExistentials only hints at this.
Also, I believe I've discovered a bug with releasing an existential IUO. Extracting the setup into array initialization functions did eliminate the setup overhead for all tests that do not modify the array. However, tests that mutate the array still have measurable overhead, because they perform COW (I think?).

Setup Overhead: [table omitted]

After Setup Extraction: [table omitted]

Note: Setup overhead under 5% of benchmark runtime is not reported.

I've tried to work around this by transferring ownership of the pre-initialized array with the `grabArray` function:

```swift
func grabArray() -> [Existential] { // transfer array ownership to caller
    defer { array = nil }
    return array
}
```

This doesn't work either, and makes me think there's some strange instruction reordering going on:

```swift
func grabArray() -> [Existential] {
    guard array != nil else { fatalError("What?!?") }
    let a = array!
    array = nil
    return a
}
```

…because this does trigger the fatal error. Unfortunately, I wasn't able to reproduce this behavior in a minimal test case outside of SBS. Filed as SR-9298.
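For context, a minimal sketch of the pattern under discussion, with hypothetical names (`mutateIt`, the element count): as long as the global `array` keeps a second reference to the storage, the first in-place mutation inside the run function copies the whole array, which is the COW overhead the `grabArray` workaround tries to avoid by transferring ownership.

```swift
protocol Existential { mutating func mutateIt() -> Bool }

var array: [Existential]! // prepared by the benchmark's setUpFunction

public func run_ArrayMutating(_ N: Int) {
    // Reading the global leaves `array` holding a second reference to the
    // storage, so the first in-place mutation below copies it all (COW).
    var a: [Existential] = array
    // The intended fix: `var a = grabArray()` would make `a` the sole
    // owner of the storage, letting the loop mutate in place.
    for _ in 0 ..< N {
        for i in a.indices { _ = a[i].mutateIt() }
    }
}
```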
@eeckstein can you please run the benchmark here, too? 🤖 doesn't recognize me yet…
The purpose of those benchmarks was to evaluate implementation decisions in different scenarios when I moved the implementation of existentials (protocol values) to heap-based copy-on-write buffers. I don't think we want to run those as part of regular performance tests. The performance boost of ClassValueBuffer4 vs ClassValueBuffer3 is expected: in the outline case (ClassValueBuffer4), copying the existential only involves copying one reference to the heap-based copy-on-write buffer that holds the struct, vs copying the individual fields of the struct in the inline case of ClassValueBuffer3.
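To illustrate the inline/outline distinction described above, here is a hedged sketch (not the benchmark's exact declarations; the method name `doIt` is illustrative): a struct with up to three references fits the three-word inline value buffer of Swift's existential container, while a four-reference struct spills into a heap-allocated copy-on-write box.

```swift
final class Ref {}

protocol Existential { func doIt() -> Bool }

// Fits the 3-word inline buffer of the existential container;
// copying the existential copies all three references.
struct ClassValueBuffer3: Existential {
    var a = Ref(), b = Ref(), c = Ref()
    func doIt() -> Bool { return true }
}

// Too big for the inline buffer; stored in a heap-allocated COW box,
// so copying the existential retains a single buffer reference.
struct ClassValueBuffer4: Existential {
    var a = Ref(), b = Ref(), c = Ref(), d = Ref()
    func doIt() -> Bool { return true }
}

let x: Existential = ClassValueBuffer4()
let y = x // copies one reference to the shared COW buffer
_ = y.doIt()
```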
Why not? They still show how well the optimizer handles these cases. After this cleanup it takes 1.3 seconds to run all 108 of them in a manner equivalent to one `run_smoke_bench` run.
@swift-ci benchmark
@swift-ci please smoke test
Benchmark DistinctClassFieldAccesses was added in a strange place… See https://github.com/apple/swift/pull/18892/files#r212038958 Per discussion in swiftlang#18892 (comment) I’m moving it to the extremely similar `ArrayInClass`, while fixing its workload size (and loop pattern) to make it relatively comparable.
Reduced base workloads. This allows precise measurements without the noise of accumulated error from unnecessarily long runtimes.
Painstakingly reverse-engineered the boilerplate generator that produces ExistentialPerformance.swift (sans a few whitespace differences), to ease maintenance (for the upcoming refactoring).
Sanity refactoring:
* Generate `BenchmarkInfo` in the same order as the run loops.
* Generate `IntValueBuffer0`, too.
* Minor code formatting tweaks.
Renamed tests according to the naming convention proposed in PR swiftlang#20334. They now fit the 40-character limit and are structured into groups and variants. Also extracted tags into 2 constants, to simplify the `[BenchmarkInfo]` expression.
The Existential.Array group had setup overhead caused by initialization of the existential array. Since this seems to be quite substantial and is dependent on the existential type, it makes sense to add an Array.init benchmark group that will measure it explicitly. Array setup is extracted into 9 type-specific functions. The setup inside the `run_` functions now grabs that array, prepared in `setUpFunction`, excluding the initialization from the measurement. This helped with extracting setup overhead in most cases, but it appears that the mutating tests still have measurable overhead because they perform COW. I’ve tried to work around this by transferring ownership of the pre-initialized array with the `grabArray` function, but nilling the IUO was crashing at runtime. This should be fixed later.
Break up BenchmarkInfo into 2 lines to fit the 80-column line limit.
Benchmarks from the Array group (except the `init`) don’t use the `withType` parameter of the `run_` function anymore. The type-specific variation is taken care of in the `BenchmarkInfo`, since the `existentialArray` with the appropriate type is created by the `setUpFunction`. This means we can reuse the same non-generic `run_` function for the whole group and eliminate all the specialized `runFunction` variants. This eliminates 144 lines of boilerplate.
The technique from the preceding commit can be used to fully determine the tested type variant in the `setUpFunction` and to use non-generic `runFunction`s for all the benchmarks (see the sketch below). This eliminates 202 lines of boilerplate.
Bumping up the multipliers to get above 20 μs runtime.
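A minimal sketch of the setup-extraction pattern from the commits above, with illustrative names and constants (the real declarations live in ExistentialPerformance.swift.gyb; `.existential` is the category this PR introduces):

```swift
import TestsUtils // SBS support module providing BenchmarkInfo

protocol Existential {
    init()
    mutating func mutateIt() -> Bool
}

final class Ref {}

struct ClassValueBuffer4: Existential {
    var a = Ref(), b = Ref(), c = Ref(), d = Ref() // spills into a COW box
    init() {}
    mutating func mutateIt() -> Bool { a = Ref(); return true }
}

var existentialArray: [Existential]!

func makeArray<T: Existential>(_: T.Type) -> [Existential] {
    return (0 ..< 128).map { _ in T() as Existential } // count is illustrative
}

// The type variant is fixed entirely in the setUpFunction, so one
// non-generic run function serves every variant of the group.
public func run_ArrayMutating(_ N: Int) {
    var a: [Existential] = existentialArray
    for _ in 0 ..< N {
        for i in a.indices { _ = a[i].mutateIt() }
    }
}

public let ExistentialArrayMutating = BenchmarkInfo(
    name: "Existential.Array.Mutating.Ref4", // illustrative name
    runFunction: run_ArrayMutating,
    tags: [.api, .existential],
    setUpFunction: { existentialArray = makeArray(ClassValueBuffer4.self) },
    tearDownFunction: { existentialArray = nil })
```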
Force-pushed from 31b5b44 to b61a63d.
@swift-ci Please benchmark
Build comment file:

Performance and code size results for -O, -Osize, and -Onone: [tables omitted]

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing them. Noise: sometimes the performance results (not code size!) contain false alarms.

Hardware Overview: [omitted]
@swift-ci please smoke test
As I've explained above, the setup overhead cannot yet be completely eliminated due to SR-9298. The whole set of 109 benchmarks adds less than 1 second to the full measured iteration of the benchmark suite.

@eeckstein Please review 🙏 (when you get back...)
As @aschwaighofer (the author of the benchmarks) already said, these benchmarks are not really there to find optimization regressions/improvements. Arnold added them to evaluate different implementation scenarios. Even if the runtime of those benchmarks is minimal, I'm not in favor of adding benchmarks which are not really useful.
In that case they should be removed from SBS altogether. Edit: we should probably discuss the philosophy of extending performance coverage and build a consensus in the forums. I'll open that topic once I'm done with the janitor duty; there are still some loose ends to tie...

@eeckstein Are you objecting to re-enabling these benchmarks or to the refactoring+renaming itself?
Don’t run the ExistentialPerformance benchmarks as part of the pre-commit suite.
@swift-ci please benchmark

!!! Couldn't read commit file !!!

@swift-ci please smoke test
yes, that would be fine

@eeckstein Would you put that in the review or can I just merge once the checks are completed?
Performance and code size results for -O, -Osize, and -Onone: [tables omitted]

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%. If you see any unexpected regressions, you should consider fixing them. Noise: sometimes the performance results (not code size!) contain false alarms.

Hardware Overview: [omitted]
This PR reintroduces the Existential benchmark family with 108 performance tests. It tests the performance of a varying number of non-mutating and mutating method calls on instances of existential types of growing size, and on arrays thereof (12 groups × 9 variants).

It was previously marked as unstable and was not executed as part of the pre-commit suite. The root of the instability was inner loop multipliers that were too big (5_000_000 and 5_000), which made the workload very susceptible to the accumulation of measurement error caused by context switching. Lowering the inner loop multipliers so that each benchmark runs in under 1000 μs results in clearly defined runtimes when measured with the More Robust Benchmark_Driver.
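For illustration, a hedged sketch of the loop-sizing change (the protocol, names, and the new multiplier are stand-ins, not the PR's exact code):

```swift
protocol Existential { func doIt() -> Bool }

final class Ref {}
struct Ref4: Existential { // stand-in for a 4-reference variant
    var a = Ref(), b = Ref(), c = Ref(), d = Ref()
    func doIt() -> Bool { return true }
}

public func run_PassMethod2x(_ N: Int) {
    let e: Existential = Ref4()
    var alive = true
    // Before: `0 ..< N * 5_000_000` made a single sample run for
    // milliseconds, accumulating context-switch noise.
    // After: a small multiplier keeps one sample under ~1000 μs.
    for _ in 0 ..< N * 100 {
        alive = alive && e.doIt() && e.doIt() // two method calls per pass
    }
    precondition(alive)
}
```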
The benchmarks were renamed according to the naming convention proposed in #20334, as the original names were among the longest in the SBS. For example, `ExistentialTestPassExistentialTwoMethodCalls_ClassValueBuffer4` became `Existential.Pass.method.2x.Ref4`.

The smaller workload revealed significant setup overhead caused by initialization of the existential array in the `Existential.Array` group, which was isolated into type-specific `setUpFunction`s. A new `Existential.Array.init` benchmark group was added to measure this explicitly.

A new `BenchmarkCategory.existential` was added to tag these tests. Running these 108 tests in a manner equivalent to `run_smoke_bench` (`time ./Benchmark_O --tags=existential --sample-time=0.0025 --num-samples=3`) takes 1.3 seconds on my 2008 MBP: properly sized benchmarks improve measurement precision and have negligible impact on the overall time it takes to run the benchmark suite.

Procedural note: This refactoring was facilitated by first reverse-engineering a `.gyb` file that generates the original 810 LOC Swift file. This allowed for systematic changes to the template used to generate the original 99 tests. Isolating the array creation into a setup method enabled removal of the specialized `run_` functions, eliminating 144 LOC. This technique was then applied to the rest of the tests, eliminating another 202 LOC. The use of GYB has proven indispensable in performing this refactoring, vastly improving code maintainability.

The refactoring is thoroughly documented in the commit descriptions; it is therefore best to review this PR by individual commits (check the `.gyb` first, then the generated `.swift` for the results of the change).
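As a taste of the approach, here is a toy GYB fragment in the spirit of the template (the variant list and field names are made up for illustration; see the actual ExistentialPerformance.swift.gyb for the real thing):

```
%{ Sizes = [0, 1, 2, 3, 4] }%
% for Size in Sizes:
struct IntValueBuffer${Size}: Existential {
%   for i in range(Size):
  var f${i}: Int = 0
%   end
  func doIt() -> Bool { return true }
}
% end
```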