
Add support for GYB in benchmarks #8641


Merged (10 commits into swiftlang:master, Apr 14, 2017)

Conversation

palimondo
Contributor

@palimondo commented Apr 8, 2017

As discussed on swift-dev, the repetitive nature of some performance tests lends itself very well to templating. To reduce the chance of copy-paste errors when adding new test cases, this PR adds support for generating tests with the gyb (Generate Your Boilerplate) script that is already employed in other parts of the project. This is done as part of generating the test harness in benchmark/scripts/generate_harness/generate_harness.py.
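
For illustration, a gyb template along the following lines (the file and function names here are hypothetical, not the actual templates added by this PR) expands an embedded Python loop into one Swift test function per variant:

// Example.swift.gyb: a hypothetical sketch of a gyb-templated benchmark file.
%{
  # Python executed by gyb at generation time.
  variants = ['Sequence', 'AnySequence', 'AnyCollection']
}%
% for Variant in variants:
public func run_DropFirst${Variant}(_ N: Int) {
  // The benchmark body for the ${Variant} case goes here.
}
% end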

When adding new performance tests to the benchmark suite, contributors sometimes forget to regenerate the test harness using generate_harness.py and instead manually edit the benchmark/CMakeLists.txt and benchmark/utils/main.swift files. To ensure this mistake is caught by the CI bots, this PR also adds a validation-test that guards against it.

Resolves SR-4533 and SR-4534.

GYB is then used to refactor tests for sequence methods DropLast.swift and Suffix.swift:

  • changing [DropLast/Suffix][Any?]Sequence tests to use UnfoldSequence in line with equivalent tests in MapReduce.swift...
  • ...while preserving equivalents of the previous test version under AnySeqCntRange and AnySeqCRangeIter
  • adding a test for AnyCollection wrapping CountableRange
  • adding performance tests for lazy variants of all tested sequences

This PR also introduces new performance tests for the sequence methods in DropFirst.swift, DropWhile.swift, Prefix.swift and PrefixWhile.swift, with all the same variants as described above.

@palimondo changed the title from “[WIP] Add support for GYB in benchmarks” to “Add support for GYB in benchmarks” on Apr 10, 2017
@palimondo
Contributor Author

Can I get a review, please? CC @gottesmm @lplarson @airspeedswift


from __future__ import print_function

import glob
import argparse
import jinja2
Contributor

Somehow, using Jinja and Gyb together seems like overkill. Is this really wise?

Contributor Author

I’m not sure what you mean here: jinja2 was used for templating the CMakeLists.txt and utils/main.swift before my changes. I’m trying to add gyb in order to reuse the skills I learned from other parts of the Swift stdlib to reduce boilerplate in the tests. I’m not touching the jinja2 part at all, aside from alphabetically ordering the import declarations.

Contributor

@palimondo I think what DaveA is saying is that both are python templating systems.

Contributor

@palimondo @dabrahams My personal feeling is that we should probably standardize on one of them. That being said, if we move to swiftpm all the uses of jinja2 will probably go away.

Contributor Author

@palimondo commented Apr 11, 2017

GYB is clearly superior for Swift code templating, given its use in the stdlib; hence the motivation for this PR. But I don’t see a way to use it to generate the CMakeLists.txt: generate_harness.py walks directories, scans files and feeds the results into a jinja2 template. I don’t see a way to externally parametrize gyb invocations in this way; .swift.gyb files are, AFAIK, self-contained.

Are we now debating the merits of my PR, or the general state of the pre-existing technology used in this part of the project?

Contributor

@palimondo Look in Testing.rst. There is a description there.

Contributor

@palimondo wrote:

I don’t see a way to externally parametrize gyb invocations in this way.

You don't need to parameterize them as you could do all the directory walking right in the gyb file itself. But if you did need to parameterize them, you could use the "expand" function to expand other .gyb templates inline. The other source of parameterization is -D name=value arguments.
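
For example (a hypothetical sketch, not a template from this PR), a value supplied on the command line with -D can be picked up directly inside a template:

// Parametrized.swift.gyb: hypothetical; generated with something like
//   gyb -D ElementCount=4096 -o Parametrized.swift Parametrized.swift.gyb
%{
  # -D definitions arrive in the template namespace as strings.
  count = int(ElementCount)
}%
public let elementCount = ${count}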

Contributor Author

@dabrahams Can we agree that what you are suggesting is outside the scope of this PR? Let’s open a new SR in Jira for getting rid of jinja2 in the benchmarks (cc @lplarson on that), and I’ll be happy to work on it later…

Contributor

@palimondo yes, agreed!

@palimondo
Contributor Author

Should I split the new performance tests (DropFirst, DropWhile, Prefix, PrefixWhile) into a separate PR?
(i.e. keep only the harness modifications and the DropLast and Suffix refactoring here)

@lplarson
Contributor

@swift-ci benchmark

@lplarson
Contributor

@palimondo I think one pull request is fine.

@palimondo
Contributor Author

I would also like to adjust the constants used in these tests. So far I have been using those introduced in Suffix and DropLast: sequenceCount = 4098 and [dropCount|suffixCount] = 1024. Given that these tests are partly focused on exposing the performance deficits in AnySequence, I think it might be better to always split the sequence in half. That way DropFirst processes the first half of the sequence internally and then has to send the second half of the sequence through AnySequence. So I’d like to change the [drop|prefix|suffix]Counts to 2048. Are you OK with getting new baselines for DropLast and Suffix while we change that?

@lplarson
Contributor

I'd prefer to change the other tests in a different pull request.

@lplarson
Contributor

In the new pull request, please ask therealbnut for a review of changing the count to 2048 for the other benchmarks.

@swift-ci
Contributor

Build comment file:

Build failed before running benchmark.


@palimondo
Contributor Author

palimondo commented Apr 12, 2017

Build failed due to a failure running SortSortedStrings?!? -- I didn’t touch that.

But I need help with the lit test: I have run --validation-test and my new test for the freshly generated harness is failing on my machine (in addition to a failure in TestCharacterSet.test_AnyHashableContainingCharacterSettypeTypeRef - should I report this somewhere?).

It looks like when we try to get at the %swift_src_root variable, the %swift part gets substituted with the swift invocation. How does one access that variable?

@palimondo
Contributor Author

Looks like we need to modify lit.cfg and add a new substitution like this one?

config.substitutions.append( ('%swift_obj_root', config.swift_obj_root) )

@lplarson
Contributor

@swift-ci test

@lplarson
Contributor

@swift-ci benchmark

@lplarson
Contributor

@swift-ci test

@swift-ci
Contributor

Build failed
Jenkins build - Swift Test Linux Platform
Git Commit - 1c9cc38
Test requested by - @lplarson

@swift-ci
Contributor

Build failed
Jenkins build - Swift Test OS X Platform
Git Commit - 1c9cc38
Test requested by - @lplarson

@lplarson
Contributor

@swift-ci benchmark

@swift-ci
Contributor

Build comment file:

Optimized (O)


@palimondo
Contributor Author

palimondo commented Apr 13, 2017

@lplarson, please re-run the test and benchmark; I have fixed the python-lint issues.

@gottesmm
Contributor

@swift-ci test

@gottesmm
Contributor

@swift-ci benchmark

@gottesmm
Contributor

gottesmm commented Apr 13, 2017

@palimondo Got you covered, man!

@gottesmm
Contributor

@palimondo Can you add a large comment to the top of:

benchmark/scripts/generate_harness/test_generate_harness.sh

explaining what it is doing (i.e. its purpose) and how it is invoked. I am imagining something that basically says that the script is meant to ensure that the checked-in template files in the benchmark suite always match what would be regenerated if one re-ran the relevant scripts. I would also mention that the reason to use a lit test here is that we want to ensure it runs on all smoke test runs, so that the code checked into the tree is always correct.

@gottesmm
Contributor

I would actually wait until the benchmark/test run finishes. Otherwise, it will reset since you added another commit.

Assuming that everything works out from the full test run/benchmark run and we get that nice comment in, I will quickly run a smoke test merge afterwards and we can get this work in! = ).

@palimondo
Contributor Author

I’ll write that comment, but I also noticed I had accidentally named the directory under validation-test in the plural - benchmarks instead of benchmark (which matches the one under swift). So I’ll push that rename in the same commit, OK?

@palimondo
Contributor Author

palimondo commented Apr 13, 2017

@gottesmm I looked at the console log of the benchmark in progress, and I don’t see it running any of my new tests… am I missing something?

@swift-ci
Contributor

Build comment file:

Optimized (O)


@gottesmm
Contributor

@palimondo I think it is b/c it is doing a before/after comparison. If you look at the first group of runs, it is there. Can you add the changes that I requested, and then I will do a smoke test and merge.

@palimondo
Contributor Author

@gottesmm I think we’re ready to smoke and merge.

@gottesmm
Contributor

@swift-ci smoke test and merge


@swift-ci merged commit bf08d01 into swiftlang:master on Apr 14, 2017
@palimondo deleted the sequence-benchmarks branch on April 14, 2017 08:12
@palimondo
Contributor Author

Thank you @gottesmm, @lplarson, @dabrahams !

@palimondo
Contributor Author

palimondo commented May 4, 2017

@eeckstein Re:

Yeah, CheckResults should not be inside the for 1...N loop.

I've patterned my additions after #7420 (Benchmarks for dropLast and suffix) by @therealbnut.
They check the correctness of the operation, too. I found it quite useful when crafting variations on the original tests: some of my sequence versions were buggy (off by an iteration) on the first attempt.

I'm not sure how hoisting them out of that loop will check the result for different Ns. I think as long as the overhead is the same between comparisons of the same method on different Sequences, the test still works.

@eeckstein
Contributor

@palimondo

I'm not sure how hoisting them out of that loop will check the result for different Ns.

You could just add the results for all iterations and compare against the total sum after the loop. I think the chance that this will miss a "compensated" wrong computation is quite low.
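
A minimal sketch of that suggestion (the function and constant names below are illustrative stand-ins, not the suite's actual code) accumulates inside the loop and verifies once afterwards:

// Illustrative stand-in for the benchmark suite's CheckResults helper.
func CheckResults(_ resultsMatch: Bool, _ message: String) {
    precondition(resultsMatch, message)
}

public func run_DropFirstAnySequence(_ N: Int) {
    let sequenceCount = 4096
    let dropCount = 1024
    var total = 0
    for _ in 1...N {
        let s = AnySequence(0..<sequenceCount)
        // Accumulate the result of every iteration instead of verifying it here...
        total += s.dropFirst(dropCount).reduce(0, +)
    }
    // ...and compare the accumulated total once, after the timing loop.
    let expectedPerIteration = (dropCount..<sequenceCount).reduce(0, +)
    CheckResults(total == N * expectedPerIteration,
                 "unexpected result in DropFirstAnySequence")
}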

@therealbnut
Contributor

therealbnut commented May 5, 2017

@eeckstein I agree, as the error tends to zero as N → ∞ and the cost of CheckResults is negligible.

However, it's a pattern that could easily cause mistakes. It would not work if the cost of CheckResults(...) is non-linear in the inner loop size (n), so that its cost grows quickly as N increases. For example, checking a binary tree where CheckResults is O(n log n) has a large error if you use n*N rather than just n.

Anyway, I think in an ideal world we would do something that completely removes the verification when calculating times (out of scope of this PR):

// VERIFY stands for a global flag the driver would set; verification is skipped entirely when timing.
func CheckResults(_ test: @autoclosure () -> Bool, _ message: String) {
    guard VERIFY else { return }
    assert(test(), message)
}

Run like this:

run-benchmarks -N=1 -verify=true
run-benchmarks -N=10 -verify=false

The documentation states that N = 1 should take a minimum amount of time (to minimise the impact of errors like CheckResults); it would be nice to assert this in the first run as well.

@eeckstein
Contributor

A closure is bad because it involves a memory allocation, which for some benchmarks is much more expensive than the benchmark body itself (that's what @dabrahams found).

@palimondo
Contributor Author

I'm not so sure that's the only/correct way to interpret it… see discussion in #9298.

@therealbnut
Contributor

therealbnut commented May 5, 2017

Note that the benchmark body is meant to take a minimum amount of time to run, from the README:

The benchmark driver will measure the time taken for N = 1 and automatically calculate
the necessary number of iterations N to run each benchmark in approximately one second,
so the test should ideally run in a few milliseconds for N = 1. If the test contains
any setup code before the loop, ensure the time spent on setup is insignificant compared to
the time spent inside the loop (for N = 1) -- otherwise the automatic calculation of N might be
significantly off and any performance gains/regressions will be masked by the fixed setup time.

I believe the intent is that the time of the setup/teardown/verification should be negligible by comparison to the work being measured. So if the time for a closure is significant then the body should probably be iterated enough times that it isn't.
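
As a rough illustration of the calculation described in that README excerpt (a hypothetical sketch; measureOnce and run_Workload are made-up names, not the driver's actual API):

import Foundation

// Stand-in benchmark entry point: N outer iterations over a fixed workload.
func run_Workload(_ N: Int) {
    for _ in 1...N { _ = (1...100_000).reduce(0, +) }
}

// Wall-clock timing of a single call.
func measureOnce(_ body: () -> Void) -> TimeInterval {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

// Time one iteration, then pick N so the measured run takes roughly one second.
// Any fixed setup or verification cost is part of this sample, which is why it must be
// negligible compared to the loop body, or the derived N will be significantly off.
let sampleTime = max(measureOnce { run_Workload(1) }, 1e-6)
let n = max(1, Int(1.0 / sampleTime))
let totalTime = measureOnce { run_Workload(n) }
print("running \(n) iterations took \(totalTime) seconds")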
