Add support for GYB in benchmarks #8641
Conversation
Can I get a review, please? CC @gottesmm @lplarson @airspeedswift

Inline review comment on the imports in generate_harness.py:

```python
from __future__ import print_function

import glob
import argparse
import jinja2
```
Somehow, using Jinja and Gyb together seems like overkill. Is this really wise?
I’m not sure what you mean here: jinja2 was used for templating the CMakeLists.txt and utils/main.swift before my changes. I’m trying to add gyb in order to reuse the skills I learned from other parts of the Swift stdlib to reduce boilerplate in the tests. I’m not touching the jinja2 part at all, apart from alphabetically ordering the import declarations.
@palimondo I think what DaveA is saying is that both are Python templating systems.
@palimondo @dabrahams My personal feeling is that we should probably standardize on one of them. That being said, if we move to swiftpm all the uses of jinja2 will probably go away.
GYB is clearly superior for Swift code templating due to its use in the stdlib; hence the motivation for this PR. But I don’t see a way to use it to generate the CMakeLists.txt: generate_harness.py walks directories, scans files, and feeds the results into a jinja2 template. I don’t see a way to externally parametrize gyb invocations like that; .swift.gyb files are, AFAIK, self-contained.
Are we now debating the merits of my PR or the general state of the prior technology used in this part of the project?
@palimondo Look in Testing.rst. There is a description there.
@palimondo wrote:

> I don’t see the way to externally parametrize gyb invocations in this way.

You don't need to parameterize them, as you could do all the directory walking right in the gyb file itself. But if you did need to parameterize them, you could use the "expand" function to expand other .gyb templates inline. The other source of parameterization is -D name=value arguments.
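A minimal sketch of the -D mechanism (the template name, variable name, and value below are purely illustrative, not files from this PR): the template reads a binding supplied on the gyb command line, e.g. `gyb -D count=2048 DropVariants.swift.gyb -o DropVariants.swift`, and substitutes it into the generated Swift.

```
// DropVariants.swift.gyb (hypothetical template)
// `count` is not defined in the template; it is bound externally via -D.
public let sequenceCount = ${count}

% for method in ['dropFirst', 'prefix']:
// The gyb loop above emits one declaration per method name.
public let ${method}Count = ${count} / 2
% end
```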
@dabrahams Can we agree that what you are suggesting is outside the scope of this PR? Let’s open a new SR in Jira for getting rid of jinja2 in benchmarks, cc @lplarson on that, and I’ll be happy to work on it later…
@palimondo yes, agreed!
Should I split the new performance tests (DropFirst, DropWhile, Prefix, PrefixWhile) into a separate PR?
@swift-ci benchmark
@palimondo I think one pull request is fine.
I would also like to adjust the constants used in these tests. So far I have been using those introduced in …
I'd prefer to change the other tests in a different pull request.
In the new pull request, please ask therealbnut for a review of changing the count to 2048 for the other benchmarks.
Build comment file: Build failed before running benchmark.
Build failed due to failure running … But I need help with the … It looks like when we are trying to get to …
Looks like we need to modify …
@swift-ci test
@swift-ci benchmark
@swift-ci test
Build failed
Build failed
@swift-ci benchmark
Build comment file: Optimized (O)
@lplarson, please re-run the test and benchmark; I have fixed the Python lint issues.
@swift-ci test
@swift-ci benchmark
@palimondo Got you covered, man!
@palimondo Can you add a large comment to the top of benchmark/scripts/generate_harness/test_generate_harness.sh explaining what it is doing (i.e., its purpose) and how it is invoked? I am imagining something that basically says that the script is meant to ensure that the checked-in template files in the benchmark suite always match what would be regenerated if one re-ran the relevant scripts. I would also mention that the reason to use a lit test here is that we want to ensure it runs on all smoke test runs, so that the code checked into the tree is always correct.
I would actually wait until the benchmark/test run finishes. Otherwise, it will reset since you added another commit. Assuming that everything works out from the full test run/benchmark run and we get that nice comment in, I will quickly run a smoke test merge afterwards and we can get this work in! = )
I’ll write that comment, but I also noticed I had accidentally named the directory under …
…ks -> benchmark].
@gottesmm I looked at the console log of the benchmark in progress, and I don’t see it running any of my new tests… am I missing something?
Build comment file: Optimized (O)
@palimondo I think it is b/c it is doing a before/after comparison. If you look at the first group of runs, it is there. Can you add the changes that I requested, and then I will do a smoke test and merge.
@gottesmm I think we’re ready to smoke and merge.
@swift-ci smoke test and merge
@swift-ci smoke test and merge
Thank you @gottesmm, @lplarson, @dabrahams!
@eeckstein Re: …
I’ve patterned my additions after #7420 (Benchmarks for dropLast and suffix) by @therealbnut. I’m not sure how hoisting them out of that loop will check the result for different Ns. I think that as long as the overhead between comparisons of the same method on different Sequences is the same, the test still works.
You could just add the results for all iterations and compare against the total sum after the loop. I think the chance that this will miss a "compensated" wrong computation is quite low.
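A minimal Swift sketch of this suggestion (the function name, constants, and the simple range workload are illustrative, and precondition stands in for the suite's CheckResults helper discussed below): results are accumulated inside the loop and verified once against the precomputed total afterwards.

```swift
// Hypothetical sketch, not code from this PR: accumulate per-iteration
// results and verify a single total after the loop, instead of checking
// inside the measured loop on every iteration.
public func run_DropFirstSketch(_ N: Int) {
  let sequenceCount = 4096   // illustrative constants
  let dropCount = 1024
  var total = 0
  for _ in 1...N {
    var result = 0
    for element in (0..<sequenceCount).dropFirst(dropCount) {
      result += element
    }
    total += result          // cheap accumulation in the measured loop
  }
  // Single check after the loop; a wrong result in one iteration would
  // have to be exactly compensated by another to slip through.
  let expected = (dropCount..<sequenceCount).reduce(0, +)
  precondition(total == expected * N,
               "IncorrectResults in run_DropFirstSketch: \(total)")
}
```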
@eeckstein I agree, as the error tends to zero as … However, it's a pattern that could easily cause mistakes. This would not work if the cost of …

Anyway, I think in an ideal world we would do something that completely removes the verification when calculating times (out of scope of this PR):

```swift
func CheckResults(_ test: @autoclosure () -> Bool, _ message: String) {
  guard VERIFY else { return }
  assert(test(), message)
}
```

Run like this:

```
run-benchmarks -N=1 -verify=true
run-benchmarks -N=10 -verify=false
```

The documentation states that …
A closure is bad because it involves memory allocation, which is much more expensive than the benchmark body itself for some benchmarks (that's what @dabrahams found).
I'm not so sure that's the only/correct way to interpret it… see the discussion in #9298.
Note that the benchmark body is meant to take a minimum amount of time to run, from the README: …
I believe the intent is that the time of the setup/teardown/verification should be negligible by comparison to the work being measured. So if the time for a closure is significant, then the body should probably be iterated enough times that it isn't.
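A minimal Swift sketch of that advice (the function name, inner iteration count, and trivial workload are purely illustrative): if one pass over the workload is too cheap, repeat it inside the body so that fixed per-call costs such as a closure allocation stay negligible relative to the measured work.

```swift
// Hypothetical example, not a benchmark from this PR: the inner loop
// repeats a tiny workload so that per-call overhead (closure allocation,
// setup, verification) is small compared to what is being measured.
public func run_TinyWorkloadSketch(_ N: Int) {
  var total = 0
  for _ in 1...N {
    for _ in 1...10_000 {   // inner multiplier; tune so one call takes long enough
      total += 1
    }
  }
  // Keep `total` observable so the optimizer cannot discard the loops outright.
  precondition(total == 10_000 * N)
}
```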
As discussed on swift-dev, the repetitive nature of some performance tests lends itself very well to templating. In order to reduce the chance of copy-paste errors when adding new test cases, this PR adds support for generating tests with the gyb (Generate Your Boilerplate) script that is employed in other parts of the project. This is done as part of generating the test harness in benchmark/scripts/generate_harness/generate_harness.py.

When adding new performance tests to the benchmark suite, contributors sometimes forget to regenerate the test harness using generate_harness.py and instead manually edit the benchmark/CMakeLists.txt and benchmark/utils/main.swift files. To ensure this mistake is caught by the CI bots, this PR also adds a validation-test that guards against this issue.

Resolves SR-4533 and SR-4534.
GYB is then used to refactor tests for the sequence methods DropLast.swift and Suffix.swift:
- [DropLast/Suffix][Any?]Sequence tests to use UnfoldSequence, in line with equivalent tests in MapReduce.swift
- … AnySeqCntRange and AnySeqCRangeIter: AnyCollection wrapping CountableRange

This also introduces new performance tests for the sequence methods DropFirst.swift, DropWhile.swift, Prefix.swift and PrefixWhile.swift, with all the same variants as described above.
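To illustrate the kind of boilerplate reduction described above, here is a minimal, hypothetical .swift.gyb sketch (the variant names and structure are illustrative, not the actual templates added in this PR): a single gyb loop emits one benchmark entry point per sequence variant.

```
%{
# Hypothetical variant list; the real templates in this PR differ.
variants = ['Sequence', 'AnySequence', 'AnySeqCntRange']
}%
% for Variant in variants:
// The gyb loop emits one entry point per variant name, so adding a new
// variant means extending one Python list instead of copy-pasting a
// whole Swift function.
public func run_DropFirst${Variant}(_ N: Int) {
  // measured body for the ${Variant} wrapper would go here
}
% end
```

Each generated function would still need to be registered in CMakeLists.txt and main.swift, which is exactly the step generate_harness.py automates.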