-
Notifications
You must be signed in to change notification settings - Fork 10.5k
[benchmark] Restore running benchmarks by ordinal numbers and related bugfixes #12415
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
b7eb476
7438ffd
4c4c6a2
8b03980
d40ddab
cf1d78b
2d00497
d82c996
7f89426
c288f75
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -86,22 +86,38 @@ Using the Benchmark Driver | |
* `--num-samples` | ||
* Control the number of samples to take for each test | ||
* `--list` | ||
* Print a list of available tests | ||
* Print a list of available tests matching specified criteria | ||
* `--tags` | ||
* Run tests that are labeled with specified [tags](https://github.com/apple/swift/blob/master/benchmark/utils/TestsUtils.swift#L19) | ||
(comma separated list); multiple tags are interpreted as logical AND, i.e. | ||
run only test that are labeled with all the supplied tags | ||
* `--skip-tags` | ||
* Don't run tests that are labeled with any of the specified tags (comma | ||
separated list); default value: `skip,unstable`; to get complete list of | ||
tests, specify empty `--skip-tags=` | ||
|
||
|
||
### Examples | ||
|
||
* `$ ./Benchmark_O --num-iters=1 --num-samples=1` | ||
* `$ ./Benchmark_Onone --list` | ||
* `$ ./Benchmark_Osize Ackermann` | ||
* `$ ./Benchmark_O --tags=Dictionary` | ||
* `$ ./Benchmark_O --skip-tags=unstable,skip,validation` | ||
|
||
### Note | ||
As a shortcut, you can also refer to benchmarks by their ordinal numbers. | ||
The regular `--list` option does not provide these, but you can run: | ||
* `$ ./Benchmark_O --list --run-all | tail -n +2 | nl` | ||
You can use ordinal numbers instead of test names like this: | ||
These are printed out together with benchmark names and tags using the | ||
`--list` parameter. For a complete list of all available performance tests run | ||
* `$ ./Benchmark_O --list --skip-tags=` | ||
|
||
You can use test numbers instead of test names like this: | ||
* `$ ./Benchmark_O 1 42` | ||
* `$ ./Benchmark_Driver run 1 42` | ||
|
||
Test numbers are not stable in the long run, adding and removing tests from the | ||
benchmark suite will reorder them, but they are stable for a given build. | ||
|
||
Using the Harness Generator | ||
--------------------------- | ||
|
||
|
@@ -186,3 +202,21 @@ public func run_YourTestName(N: Int) { | |
|
||
The current set of tags are defined by the `BenchmarkCategory` enum in | ||
`TestsUtils.swift` . | ||
|
||
Testing the Benchmark Drivers | ||
----------------------------- | ||
When working on tests, after the initial build | ||
```` | ||
swift-source$ ./swift/utils/build-script -R -B | ||
```` | ||
you can rebuild just the benchmarks: | ||
```` | ||
swift-source$ export SWIFT_BUILD_DIR=`pwd`/build/Ninja-ReleaseAssert/swift-macosx-x86_64 | ||
swift-source$ ninja -C ${SWIFT_BUILD_DIR} swift-benchmark-macosx-x86_64 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You don't need to export SWIFT_BUILD_DIR. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It illustrates what it should be, because it isn't obvious, as you start learning about Swift's build process. I had to learn it the hard way, so I've included it in documentation. You'll rewrite this whole file soon enough, so it ultimately doesn't matter. Take it as feedback for what to include in the rewrite. |
||
```` | ||
|
||
When modifying the testing infrastructure, you should verify that your changes | ||
pass all the tests: | ||
```` | ||
swift-source$ ./llvm/utils/lit/lit.py -sv ${SWIFT_BUILD_DIR}/test-macosx-x86_64/benchmark | ||
```` |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -118,28 +118,32 @@ def instrument_test(driver_path, test, num_samples): | |
return avg_test_output | ||
|
||
|
||
BENCHMARK_OUTPUT_RE = re.compile('([^,]+),') | ||
|
||
|
||
def get_tests(driver_path, args): | ||
"""Return a list of available performance tests""" | ||
driver = ([driver_path, '--list']) | ||
# Use tab delimiter for easier parsing to override the default comma. | ||
# (The third 'column' is always comma-separated list of tags in square | ||
# brackets -- currently unused here.) | ||
driver.append('--delim=\t') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this in this change? |
||
if args.benchmarks or args.filters: | ||
driver.append('--run-all') | ||
tests = [] | ||
for l in subprocess.check_output(driver).split("\n")[1:]: | ||
m = BENCHMARK_OUTPUT_RE.match(l) | ||
if m is None: | ||
continue | ||
tests.append(m.group(1)) | ||
driver.append('--skip-tags=') # list all tests, don't skip any tags | ||
index_name_pairs = [ | ||
line.split('\t')[:2] for line in | ||
subprocess.check_output(driver).split('\n')[1:-1] | ||
] | ||
indices, names = zip(*index_name_pairs) # unzip list of pairs into 2 lists | ||
if args.filters: | ||
regexes = [re.compile(pattern) for pattern in args.filters] | ||
return sorted(list(set([name for pattern in regexes | ||
for name in tests if pattern.match(name)]))) | ||
for name in names if pattern.match(name)]))) | ||
if not args.benchmarks: | ||
return tests | ||
tests.extend(map(str, range(1, len(tests) + 1))) # ordinal numbers | ||
return sorted(list(set(tests).intersection(set(args.benchmarks)))) | ||
return names | ||
benchmarks = set(args.benchmarks) | ||
index_to_name = dict(index_name_pairs) | ||
indexed_names = [index_to_name[i] | ||
for i in benchmarks.intersection(set(indices))] | ||
return sorted(list( | ||
benchmarks.intersection(set(names)).union(indexed_names))) | ||
|
||
|
||
def get_current_git_branch(git_repo_path): | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -69,7 +69,7 @@ struct Test { | |
|
||
/// The benchmark categories that this test belongs to. Used for filtering. | ||
var tags: [BenchmarkCategory] { | ||
return benchInfo.tags | ||
return benchInfo.tags.sorted() | ||
} | ||
|
||
/// An optional initialization function for a benchmark that is run before | ||
|
@@ -181,7 +181,7 @@ struct TestConfig { | |
|
||
// We support specifying multiple tags by splitting on comma, i.e.: | ||
// | ||
// --tags=array,set | ||
// --tags=Array,Dictionary | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of doing this, can you rename the categories to Array so that they are lower case. We should match swift style here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure I follow. The categories seem logical to me at the moment: Capitals for classes, lowercase for descriptive categories. Here I have only updated the comment to correctly reflect the current categories. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would rather you just in a separate commit, just change the names of Array, Dictionary rather than update the comments. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. To be clear, the reason why I am against using caps here is that this is against the swift style guide. All enum cases should be camelCased. Why should the benchmark suite do something different from /all/ other swift projects. We should be standard here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree to @palimondo here: The categories which refer to concrete types in the stdlib should be named exactly as those types, everything else with lowercase. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok. I am fine with dropping this. But just to be clear, I am not saying that the "user-visible" name shouldn't be capitalized. You would do it like this:
I am just saying that IMO all code in the swift compiler should follow the swift style guide closely because our swift code is often times read as an example code by external developers. Whenever we have diverged in the past, our users have read the code and gotten confused. We are inviting that confusion here. |
||
// | ||
// FIXME: If we used Error instead of .fail, then we could have a cleaner | ||
// impl here using map on x and tags.formUnion. | ||
|
@@ -200,7 +200,7 @@ struct TestConfig { | |
|
||
// We support specifying multiple tags by splitting on comma, i.e.: | ||
// | ||
// --skip-tags=array,set | ||
// --skip-tags=Array,Set,unstable,skip | ||
// | ||
// FIXME: If we used Error instead of .fail, then we could have a cleaner | ||
// impl here using map on x and tags.formUnion. | ||
|
@@ -227,39 +227,22 @@ struct TestConfig { | |
} | ||
|
||
mutating func findTestsToRun() { | ||
let benchmarkNameFilter = Set(filters) | ||
|
||
// t is needed so we don't capture an ivar of a mutable inout self. | ||
let t = tags | ||
let st = skipTags | ||
let filteredTests = Array(registeredBenchmarks.filter { benchInfo in | ||
if !t.isSubset(of: benchInfo.tags) { | ||
return false | ||
} | ||
|
||
if !st.isDisjoint(with: benchInfo.tags) { | ||
return false | ||
} | ||
|
||
// If the user did not specified a benchmark name filter and our tags are | ||
// a subset of the specified tags by the user, return true. We want to run | ||
// this test. | ||
if benchmarkNameFilter.isEmpty { | ||
return true | ||
registeredBenchmarks.sort() | ||
let indices = Dictionary(uniqueKeysWithValues: | ||
zip(registeredBenchmarks.map{$0.name}, 1...)) | ||
let benchmarkNamesOrIndices = Set(filters) | ||
// needed so we don't capture an ivar of a mutable inout self. | ||
let (_tags, _skipTags) = (tags, skipTags) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do other projects use underscore for this purpose? Why not just do:
Also, please capitalize comment sentences. |
||
|
||
tests = registeredBenchmarks.filter { benchmark in | ||
if benchmarkNamesOrIndices.isEmpty { | ||
return benchmark.tags.isSuperset(of: _tags) && | ||
benchmark.tags.isDisjoint(with: _skipTags) | ||
} else { | ||
return benchmarkNamesOrIndices.contains(benchmark.name) || | ||
benchmarkNamesOrIndices.contains(String(indices[benchmark.name]!)) | ||
} | ||
|
||
// Otherwise, we need to check if our benchInfo's name is in the benchmark | ||
// name filter list. If it isn't, then we shouldn't process it. | ||
return benchmarkNameFilter.contains(benchInfo.name) | ||
}).sorted() | ||
|
||
if (filteredTests.isEmpty) { | ||
return | ||
} | ||
|
||
tests = filteredTests.enumerated().map { | ||
Test(benchInfo: $0.element, index: $0.offset + 1) | ||
} | ||
}.map { Test(benchInfo: $0, index: indices[$0.name]!) } | ||
} | ||
} | ||
|
||
|
@@ -382,14 +365,13 @@ func runBench(_ test: Test, _ c: TestConfig) -> BenchResults? { | |
|
||
let sampler = SampleRunner() | ||
for s in 0..<c.numSamples { | ||
test.setUpFunction?() | ||
let time_per_sample: UInt64 = 1_000_000_000 * UInt64(c.iterationScale) | ||
|
||
var scale : UInt | ||
var elapsed_time : UInt64 = 0 | ||
if c.fixedNumIters == 0 { | ||
test.setUpFunction?() | ||
elapsed_time = sampler.run(test.name, fn: testFn, num_iters: 1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why are you changing this in this commit? This has nothing to do with the change you are trying to make. It shouldn't be in this PR. |
||
test.tearDownFunction?() | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Please do not move this out of the loop. It is important that we have setup/teardown before/after each invocation of the benchmarking function. Benchmark writers should be able to assume that they will have a pristine state when the benchmark starts running. Otherwise, you could have one benchmark iteration affecting another benchmark iteration. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I believe this refactoring is safe, because multiple iterations are handled by passing the scale parameter to the Even if you're thinking about a case which measures some kind of a mutable operation (say removing elements from collection), the That's why I think that measuring multiple samples with single There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree, the setUp/tearDown should be inside the loop There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @eeckstein The timing of our comments look simultaneous. Did you read my explanation? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll move |
||
|
||
if elapsed_time > 0 { | ||
scale = UInt(time_per_sample / elapsed_time) | ||
|
@@ -402,6 +384,9 @@ func runBench(_ test: Test, _ c: TestConfig) -> BenchResults? { | |
} else { | ||
// Compute the scaling factor if a fixed c.fixedNumIters is not specified. | ||
scale = c.fixedNumIters | ||
if scale == 1 { | ||
elapsed_time = sampler.run(test.name, fn: testFn, num_iters: 1) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why is this in this PR? This has nothing to do with the title. |
||
} | ||
} | ||
// Make integer overflow less likely on platforms where Int is 32 bits wide. | ||
// FIXME: Switch BenchmarkInfo to use Int64 for the iteration scale, or fix | ||
|
@@ -413,9 +398,7 @@ func runBench(_ test: Test, _ c: TestConfig) -> BenchResults? { | |
if c.verbose { | ||
print(" Measuring with scale \(scale).") | ||
} | ||
test.setUpFunction?() | ||
elapsed_time = sampler.run(test.name, fn: testFn, num_iters: scale) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here. Why are you changing this in this PR. It should be a separate PR with reasoning. |
||
test.tearDownFunction?() | ||
} else { | ||
scale = 1 | ||
} | ||
|
@@ -424,6 +407,7 @@ func runBench(_ test: Test, _ c: TestConfig) -> BenchResults? { | |
if c.verbose { | ||
print(" Sample \(s),\(samples[s])") | ||
} | ||
test.tearDownFunction?() | ||
} | ||
|
||
let (mean, sd) = internalMeanSD(samples) | ||
|
@@ -497,9 +481,9 @@ public func main() { | |
fatalError("\(msg)") | ||
case .listTests: | ||
config.findTestsToRun() | ||
print("Enabled Tests\(config.delim)Tags") | ||
print("#\(config.delim)Test\(config.delim)[Tags]") | ||
for t in config.tests { | ||
print("\(t.name)\(config.delim)\(t.tags)") | ||
print("\(t.index)\(config.delim)\(t.name)\(config.delim)\(t.tags)") | ||
} | ||
case .run: | ||
config.findTestsToRun() | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -70,6 +70,18 @@ public enum BenchmarkCategory : String { | |
case skip | ||
} | ||
|
||
extension BenchmarkCategory : CustomStringConvertible { | ||
public var description: String { | ||
return self.rawValue | ||
} | ||
} | ||
|
||
extension BenchmarkCategory : Comparable { | ||
public static func < (lhs: BenchmarkCategory, rhs: BenchmarkCategory) -> Bool { | ||
return lhs.rawValue < rhs.rawValue | ||
} | ||
} | ||
|
||
public struct BenchmarkPlatformSet : OptionSet { | ||
public let rawValue: Int | ||
|
||
|
@@ -111,7 +123,7 @@ public struct BenchmarkInfo { | |
/// A set of category tags that describe this benchmark. This is used by the | ||
/// harness to allow for easy slicing of the set of benchmarks along tag | ||
/// boundaries, e.x.: run all string benchmarks or ref count benchmarks, etc. | ||
public var tags: [BenchmarkCategory] | ||
public var tags: Set<BenchmarkCategory> | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 👍 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Instead of using a Set, please us an OptionSet. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure this can work, There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sure. |
||
|
||
/// The platforms that this benchmark supports. This is an OptionSet. | ||
private var unsupportedPlatforms: BenchmarkPlatformSet | ||
|
@@ -146,7 +158,7 @@ public struct BenchmarkInfo { | |
unsupportedPlatforms: BenchmarkPlatformSet = []) { | ||
self.name = name | ||
self._runFunction = runFunction | ||
self.tags = tags | ||
self.tags = Set(tags) | ||
self._setUpFunction = setUpFunction | ||
self._tearDownFunction = tearDownFunction | ||
self.unsupportedPlatforms = unsupportedPlatforms | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
// REQUIRES: OS=macosx | ||
// REQUIRES: asserts | ||
// REQUIRES: benchmark | ||
// REQUIRES: CMAKE_GENERATOR=Ninja | ||
|
||
// Integration tests between Benchmark_Driver and Benchmark_O | ||
// TODO: Keep the "run just once" check and move the rest into unit tests for | ||
// Benchmark_Driver, as they are redundant and unnecessarily slow. | ||
|
||
// RUN: %Benchmark_Driver run Ackermann | %FileCheck %s --check-prefix RUNNAMED | ||
// RUNNAMED: Ackermann | ||
|
||
// RUN: %Benchmark_Driver run 1 | %FileCheck %s --check-prefix RUNBYNUMBER | ||
// RUNBYNUMBER: Ackermann | ||
|
||
// RUN: %Benchmark_Driver run 1 Ackermann 1 \ | ||
// RUN: | %FileCheck %s --check-prefix RUNJUSTONCE | ||
// RUNJUSTONCE-LABEL: Ackermann | ||
// RUNJUSTONCE-NOT: Ackermann | ||
|
||
// RUN: %Benchmark_Driver run -f Acker | %FileCheck %s --check-prefix RUNFILTER | ||
// RUNFILTER: Ackermann |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
// REQUIRES: OS=macosx | ||
// REQUIRES: asserts | ||
// REQUIRES: benchmark | ||
// REQUIRES: CMAKE_GENERATOR=Ninja | ||
|
||
// RUN: %Benchmark_O --list | %FileCheck %s --check-prefix LISTTAGS | ||
// LISTTAGS: AngryPhonebook,[ | ||
// LISTTAGS-NOT: TestsUtils.BenchmarkCategory. | ||
// LISTTAGS-SAME: String, | ||
// LISTTAGS-SAME: ] | ||
|
||
// RUN: %Benchmark_O AngryPhonebook --num-iters=1 \ | ||
// RUN: | %FileCheck %s --check-prefix NUMITERS1 | ||
// NUMITERS1: AngryPhonebook,1 | ||
// NUMITERS1-NOT: 0,0,0,0,0 | ||
|
||
// Should run benchmark by name, even if its tags match the default skip-tags | ||
// (unstable,skip). Ackermann is marked unstable | ||
// RUN: %Benchmark_O Ackermann | %FileCheck %s --check-prefix NAMEDSKIP | ||
// NAMEDSKIP: Ackermann | ||
|
||
// RUN: %Benchmark_O --list --tags=Dictionary,Array \ | ||
// RUN: | %FileCheck %s --check-prefix ANDTAGS | ||
// ANDTAGS: TwoSum | ||
// ANDTAGS-NOT: Array2D | ||
// ANDTAGS-NOT: DictionarySwap | ||
|
||
// RUN: %Benchmark_O --list --tags=algorithm --skip-tags=validation \ | ||
// RUN: | %FileCheck %s --check-prefix TAGSANDSKIPTAGS | ||
// TAGSANDSKIPTAGS: Ackermann | ||
// TAGSANDSKIPTAGS: DictOfArraysToArrayOfDicts | ||
// TAGSANDSKIPTAGS: Fibonacci | ||
// TAGSANDSKIPTAGS: RomanNumbers | ||
|
||
// RUN: %Benchmark_O --list --tags=algorithm \ | ||
// RUN: --skip-tags=validation,Dictionary,String \ | ||
// RUN: | %FileCheck %s --check-prefix ORSKIPTAGS | ||
// ORSKIPTAGS: Ackermann | ||
// ORSKIPTAGS-NOT: DictOfArraysToArrayOfDicts | ||
// ORSKIPTAGS: Fibonacci | ||
// ORSKIPTAGS-NOT: RomanNumbers | ||
|
||
// RUN: %Benchmark_O --list | %FileCheck %s --check-prefix LISTPRECOMMIT | ||
// LISTPRECOMMIT: #,Test,[Tags] | ||
// LISTPRECOMMIT-NOT: Ackermann | ||
// LISTPRECOMMIT: {{[0-9]+}},AngryPhonebook | ||
|
||
// RUN: %Benchmark_O --list --skip-tags= | %FileCheck %s --check-prefix LISTALL | ||
// LISTALL: Ackermann | ||
// LISTALL: AngryPhonebook |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -88,6 +88,9 @@ config.available_features.add("CMAKE_GENERATOR=@CMAKE_GENERATOR@") | |
if "@SWIFT_ENABLE_SOURCEKIT_TESTS@" == "TRUE": | ||
config.available_features.add('sourcekit') | ||
|
||
if "@SWIFT_BUILD_PERF_TESTSUITE@" == "TRUE": | ||
config.available_features.add('benchmark') | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm. This broke my local testing, where I usually configure the benchmarks to build (so I have the option) but don't actually always build them. Since There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm sorry for that! But I'm not sure how to properly fix it. I've added this condition following explicit guidance from @eeckstein. Any pointers for analogous dynamic check? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Hm, well, you're already using There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm a bit tick today… so let me walk through this slowly, if I get your meaning: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, basically. I actually think you should drop the current check and instead put one in the regular |
||
|
||
if "@SWIFT_ENABLE_GUARANTEED_NORMAL_ARGUMENTS@" == "TRUE": | ||
config.available_features.add('plus_zero_runtime') | ||
else: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this in this PR? You should be restoring the ordinal number thing...