
Add support for GYB in benchmarks #8641


Merged (10 commits into swiftlang:master, Apr 14, 2017)

Conversation

palimondo
Contributor

@palimondo commented Apr 8, 2017

As discussed on swift-dev, the repetitive nature of some performance tests lends itself very well to templating. To reduce the chance of copy-paste errors when adding new test cases, this PR adds support for generating tests with the gyb (Generate Your Boilerplate) script that is already employed in other parts of the project. This is done as part of generating the test harness in benchmark/scripts/generate_harness/generate_harness.py.
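
For illustration, a gyb template along the following lines (the file and function names here are hypothetical, not the actual templates added by this PR) expands an embedded Python loop into one Swift test function per variant:

// Example.swift.gyb: a hypothetical sketch of a gyb-templated benchmark file.
%{
  # Python executed by gyb at generation time.
  variants = ['Sequence', 'AnySequence', 'AnyCollection']
}%
% for Variant in variants:
public func run_DropFirst${Variant}(_ N: Int) {
  // The benchmark body for the ${Variant} case goes here.
}
% end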

When adding new performance tests to the benchmark suite, contributors sometimes forget to regenerate the test harness using generate_harness.py and instead manually edit the benchmark/CMakeLists.txt and benchmark/utils/main.swift files. To ensure this mistake is caught by the CI bots, this PR also adds a validation-test that guards against it.

Resolves SR-4533 and SR-4534.

GYB is then used to refactor tests for sequence methods DropLast.swift and Suffix.swift:

  • changing [DropLast/Suffix][Any?]Sequence tests to use UnfoldSequence in line with equivalent tests in MapReduce.swift...
  • ...while preserving equivalents of the previous test version under AnySeqCntRange and AnySeqCRangeIter
  • adding a test for AnyCollection wrapping CountableRange
  • adding performance tests for lazy variants of all tested sequences

This PR also introduces new performance tests for the sequence methods in DropFirst.swift, DropWhile.swift, Prefix.swift and PrefixWhile.swift, with all the same variants as described above.

@palimondo changed the title from “[WIP] Add support for GYB in benchmarks” to “Add support for GYB in benchmarks” on Apr 10, 2017
@palimondo
Contributor Author

Can I get a review, please? CC @gottesmm @lplarson @airspeedswift


from __future__ import print_function

import glob
import argparse
import jinja2
Contributor

Somehow, using Jinja and Gyb together seems like overkill. Is this really wise?

Contributor Author

I’m not sure what you mean here: jinja2 was used for templating the CMakeLists.txt and utils/main.swift before my changes. I’m trying to add gyb in order to reuse the skills I learned from other parts of the Swift stdlib to reduce boilerplate in the tests. I’m not touching the jinja2 part at all, aside from alphabetically ordering the import declarations.

Contributor

@palimondo I think what DaveA is saying is that both are python templating systems.

Contributor

@palimondo @dabrahams My personal feeling is that we should probably standardize on one of them. That being said, if we move to swiftpm all the uses of jinja2 will probably go away.

Contributor Author

@palimondo commented Apr 11, 2017

GYB is clearly superior for Swift code templating, given its use in the stdlib; hence the motivation for this PR. But I don’t see a way to use it to generate the CMakeLists.txt: generate_harness.py walks directories, scans files and feeds the results into a jinja2 template. I don’t see a way to externally parametrize gyb invocations in this way; .swift.gyb files are, AFAIK, self-contained.

Are we now debating the merits of my PR, or the general state of the pre-existing technology used in this part of the project?

Contributor

@palimondo Look in Testing.rst. There is a description there.

Contributor

@palimondo wrote:

I don’t see a way to externally parametrize gyb invocations in this way.

You don't need to parameterize them as you could do all the directory walking right in the gyb file itself. But if you did need to parameterize them, you could use the "expand" function to expand other .gyb templates inline. The other source of parameterization is -D name=value arguments.
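
For example (a hypothetical sketch, not a template from this PR), a value supplied on the command line with -D can be picked up directly inside a template:

// Parametrized.swift.gyb: hypothetical; generated with something like
//   gyb -D ElementCount=4096 -o Parametrized.swift Parametrized.swift.gyb
%{
  # -D definitions arrive in the template namespace as strings.
  count = int(ElementCount)
}%
public let elementCount = ${count}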

Contributor Author

@dabrahams Can we agree that what you are suggesting is outside the scope of this PR? Let’s open a new SR in Jira for getting rid of jinja2 in the benchmarks (cc @lplarson on that), and I’ll be happy to work on it later…

Contributor

@palimondo yes, agreed!

@palimondo
Contributor Author

Should I split the new performance tests (DropFirst, DropWhile, Prefix, PrefixWhile) into a separate PR?
(i.e. keep only the harness modifications and the DropLast and Suffix refactoring here)

@lplarson
Contributor

@swift-ci benchmark

@lplarson
Contributor

@palimondo I think one pull request is fine.

@palimondo
Contributor Author

I would also like to adjust the constants used in these tests. So far I have been using those introduced in Suffix and DropLast: sequenceCount = 4098 and [dropCount|suffixCount] = 1024. Given that these tests are partly focused on exposing the performance deficits in AnySequence, I think it might be better to always split the sequence in half. That way DropFirst processes the first half of the sequence internally and then has to send the second half of the sequence through AnySequence. So I’d like to change the [drop|prefix|suffix]Counts to 2048. Are you OK with getting new baselines for DropLast and Suffix while we change that?

@lplarson
Contributor

I'd prefer to change the other tests in a different pull request.

@lplarson
Contributor

In the new pull request, please ask therealbnut for a review of changing the count to 2048 for the other benchmarks.

@swift-ci
Contributor

Build comment file:

Build failed before running benchmark.


@palimondo
Contributor Author

palimondo commented Apr 12, 2017

Build failed due to a failure running SortSortedStrings?!? -- I didn’t touch that.

But I need help with the lit test: I have run --validation-test and my new test for the freshly generated harness is failing on my machine (in addition to a failure in TestCharacterSet.test_AnyHashableContainingCharacterSettypeTypeRef - should I report this somewhere?).

It looks like when we try to get at the %swift_src_root variable, the %swift part gets substituted with the swift invocation. How does one access that variable?

@palimondo
Contributor Author

Looks like we need to modify lit.cfg and add a new substitution like this one?

config.substitutions.append( ('%swift_obj_root', config.swift_obj_root) )

@lplarson
Contributor

@swift-ci test

@lplarson
Contributor

@swift-ci benchmark

@lplarson
Contributor

@swift-ci test

@swift-ci
Contributor

Build failed
Jenkins build - Swift Test Linux Platform
Git Commit - 1c9cc38
Test requested by - @lplarson

@swift-ci
Contributor

Build failed
Jenkins build - Swift Test OS X Platform
Git Commit - 1c9cc38
Test requested by - @lplarson

@lplarson
Contributor

@swift-ci benchmark

@swift-ci
Contributor

Build comment file:

Optimized (O)


@palimondo
Contributor Author

palimondo commented Apr 13, 2017

@lplarson, please re-run the test and benchmark; I have fixed the python-lint issues.

@gottesmm
Contributor

@swift-ci test

@gottesmm
Contributor

@swift-ci benchmark

@gottesmm
Contributor

gottesmm commented Apr 13, 2017

@palimondo Got you covered, man!

@gottesmm
Contributor

@palimondo Can you add a large comment to the top of:

benchmark/scripts/generate_harness/test_generate_harness.sh

explaining what it is doing (i.e. its purpose) and how it is invoked. I am imagining something that basically says that the script is meant to ensure that the checked-in template files in the benchmark suite always match what would be regenerated if one re-ran the relevant scripts. I would also mention that the reason to use a lit test here is that we want to ensure it runs on all smoke test runs, so that the code checked into the tree is always correct.

@gottesmm
Contributor

I would actually wait until the benchmark/test run finishes. Otherwise, it will reset since you added another commit.

Assuming that everything works out from the full test run/benchmark run and we get that nice comment in, I will quickly run a smoke test merge afterwards and we can get this work in! = ).

@palimondo
Contributor Author

I’ll write that comment, but I also noticed I had accidentally named the directory under validation-test in the plural - benchmarks instead of benchmark (which matches the one under swift). So I’ll push that rename in the same commit, OK?

@palimondo
Contributor Author

palimondo commented Apr 13, 2017

@gottesmm I looked at the console log of the benchmark in progress, and I don’t see it running any of my new tests… am I missing something?

@swift-ci
Contributor

Build comment file:

Optimized (O)


@gottesmm
Contributor

@palimondo I think it is b/c it is doing a before/after comparison. If you look at the first group of runs, it is there. Can you add the changes that I requested, and then I will do a smoke test and merge.

@palimondo
Contributor Author

@gottesmm I think we’re ready to smoke and merge.

@gottesmm
Contributor

@swift-ci smoke test and merge


@swift-ci merged commit bf08d01 into swiftlang:master on Apr 14, 2017
@palimondo deleted the sequence-benchmarks branch on April 14, 2017 08:12
@palimondo
Contributor Author

Thank you @gottesmm, @lplarson, @dabrahams !

@palimondo
Contributor Author

palimondo commented May 4, 2017

@eeckstein Re:

Yeah, CheckResults should not be inside the for 1...N loop.

I've patterned my additions after #7420 (Benchmarks for dropLast and suffix) by @therealbnut.
They check the correctness of the operation, too. I found it quite useful when crafting variations on the original tests: some of my sequence versions were buggy (off by an iteration) on the first attempt.

I'm not sure how hoisting them out of that loop will check the result for different Ns. I think as long as the overhead is the same between comparisons of the same method on different Sequences, the test still works.

@eeckstein
Contributor

@palimondo

I'm not sure how hoisting them out of that loop will check the result for different Ns.

You could just add the results for all iterations and compare against the total sum after the loop. I think the chance that this will miss a "compensated" wrong computation is quite low.
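
A minimal sketch of that suggestion (the function and constant names below are illustrative stand-ins, not the suite's actual code) accumulates inside the loop and verifies once afterwards:

// Illustrative stand-in for the benchmark suite's CheckResults helper.
func CheckResults(_ resultsMatch: Bool, _ message: String) {
    precondition(resultsMatch, message)
}

public func run_DropFirstAnySequence(_ N: Int) {
    let sequenceCount = 4096
    let dropCount = 1024
    var total = 0
    for _ in 1...N {
        let s = AnySequence(0..<sequenceCount)
        // Accumulate the result of every iteration instead of verifying it here...
        total += s.dropFirst(dropCount).reduce(0, +)
    }
    // ...and compare the accumulated total once, after the timing loop.
    let expectedPerIteration = (dropCount..<sequenceCount).reduce(0, +)
    CheckResults(total == N * expectedPerIteration,
                 "unexpected result in DropFirstAnySequence")
}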

@therealbnut
Contributor

therealbnut commented May 5, 2017

@eeckstein I agree, as the error tends to zero as N → ∞ and the cost of CheckResults is negligible.

However, it's a pattern that could easily cause mistakes. It would not work if the cost of CheckResults(...) is non-linear in the inner loop size (n), so that its cost grows quickly as N increases. For example, checking a binary tree where CheckResults is O(n log n) has a large error if you use n*N rather than just n.

Anyway, I think in an ideal world we would do something that completely removes the verification when calculating times (out of scope of this PR):

// VERIFY stands for a global flag the driver would set; verification is skipped entirely when timing.
func CheckResults(_ test: @autoclosure () -> Bool, _ message: String) {
    guard VERIFY else { return }
    assert(test(), message)
}

Run like this:

run-benchmarks -N=1 -verify=true
run-benchmarks -N=10 -verify=false

The documentation states that N = 1 should take a minimum amount of time (to minimise the impact of errors like CheckResults); it would be nice to assert this in the first run as well.

@eeckstein
Contributor

A closure is bad because it involves a memory allocation, which for some benchmarks is much more expensive than the benchmark body itself (that's what @dabrahams found).

@palimondo
Contributor Author

I'm not so sure that's the only/correct way to interpret it… see discussion in #9298.

@therealbnut
Contributor

therealbnut commented May 5, 2017

Note that the benchmark body is meant to take a minimum amount of time to run, from the README:

The benchmark driver will measure the time taken for N = 1 and automatically calculate
the necessary number of iterations N to run each benchmark in approximately one second,
so the test should ideally run in a few milliseconds for N = 1. If the test contains
any setup code before the loop, ensure the time spent on setup is insignificant compared to
the time spent inside the loop (for N = 1) -- otherwise the automatic calculation of N might be
significantly off and any performance gains/regressions will be masked by the fixed setup time.

I believe the intent is that the time of the setup/teardown/verification should be negligible by comparison to the work being measured. So if the time for a closure is significant then the body should probably be iterated enough times that it isn't.
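
As a rough illustration of the calculation described in that README excerpt (a hypothetical sketch; measureOnce and run_Workload are made-up names, not the driver's actual API):

import Foundation

// Stand-in benchmark entry point: N outer iterations over a fixed workload.
func run_Workload(_ N: Int) {
    for _ in 1...N { _ = (1...100_000).reduce(0, +) }
}

// Wall-clock timing of a single call.
func measureOnce(_ body: () -> Void) -> TimeInterval {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

// Time one iteration, then pick N so the measured run takes roughly one second.
// Any fixed setup or verification cost is part of this sample, which is why it must be
// negligible compared to the loop body, or the derived N will be significantly off.
let sampleTime = max(measureOnce { run_Workload(1) }, 1e-6)
let n = max(1, Int(1.0 / sampleTime))
let totalTime = measureOnce { run_Workload(n) }
print("running \(n) iterations took \(totalTime) seconds")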
