[benchmark] Added Benchmark Check Report #20807

palimondo · 2018-11-27T22:08:22Z

Integrates the Benchmark_Driver check command into run_smoke_bench, producing Markdown formatted report table, analyzing the quality of newly added benchmarks.

Example output:

✅	Benchmark Check Report
⛔️🔤	`String.insert.EmojiChar.NearEnd` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}
⚠️⏱	`String.insert.EmojiChar.NearEnd` execution took at least 1871 μs. _{Decrease the workload of String.insert.EmojiChar.NearEnd by a factor of 2 (10), to be less than 1000 μs.}
⛔️🔤	`String.insert.EmojiChar.EndIndex` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}
⚠️Ⓜ️	`String.insert.EmojiChar.EndIndex` has very wide range of memory used between independent, repeated measurements. _{String.insert.EmojiChar.EndIndex mem_pages [i1, i2]: min=[12, 22] 𝚫=10 R=[15, 18]}

If there are no issue with added benchmarks, the result looks like this:

✅	Benchmark Check Report

Promoting previously DEBUG message to INFO.

`logging.Handler` that creates nicely formatted report from `BecnhmarkDoctor`’s `check` in Markdown table for display on GitHub.

Produce Markdown formatted report, analyzing the quality of newly added benchmarks.

palimondo · 2018-11-27T22:27:26Z

This is the reason why I'd like to finish and merge the #20334 [benchmark] Naming Convention
cc @atrick @lorentey @airspeedswift @jrose-apple

eeckstein

Basically LGTM! (Please see my comment inline)

Also, can you add a temporary test-commit to this PR which introduces some new "bad" benchmarks, which violate several rules? So that we can test this change with "swift-ci benchmark".

benchmark/scripts/run_smoke_bench

Removed the option to `-check-added`, as it’s now run by default. Replaced with option to skip checking added benchmarks: `-skip-check-added`.

palimondo · 2018-11-28T21:00:15Z

@eeckstein I've added a -skip-check-added option. We can test with #20837, that renames InsertCharacter benchmarks to String.insert applying the proposed naming convention, because current naming rule will complain the new names don't conform to UpperCamelCase.

benchmark/scripts/run_smoke_bench

Simplify the code by removing indirection: After removal of -check-added option, only single function remains.

palimondo · 2018-11-28T21:48:35Z

@eeckstein Removed the indirection, test_opt_levels is now called directly.

palimondo · 2018-11-28T22:23:15Z

BTW, it looks like swift-ci doesn’t support combining PRs for benchmark the way it does(?) for test. We tried running Karoy’s new Breadcrumbs with this one, but I don’t think it worked.

eeckstein · 2018-11-28T22:33:36Z

Can you just temporarily add a test-commit to this PR, and then after the benchmark run, remove it again?

eeckstein · 2018-11-28T22:40:44Z

@swift-ci benchmark

eeckstein · 2018-11-28T22:50:18Z

@swift-ci benchmark

eeckstein · 2018-11-28T22:51:06Z

@swift-ci benchmark

palimondo · 2018-11-28T22:52:29Z

I've noticed the MarkdownReportHandler always prints the header, which is useful when there are no validation issues, but it did so even when there were no added benchmarks. a6f26e2 works around that by not adding the handler when added set is empty. @eeckstein Do you want to re-run benchmarks again with that change, too?

palimondo · 2018-11-28T23:02:08Z

I hope the CI bot's managed to clone before I've force-pushed 🙃
Assuming all goes well, I've removed the DNM commit and if you agree the cd47f32 is trivial, once all passes test this can land while I take my 💤s… (it's midnight here) 😴

swift-ci · 2018-11-28T23:34:58Z

Build comment file:

Performance: -O

TEST	MIN	MAX	MEAN	MAX_RSS
Added
String.insert.ASCIIChar.EndIndex	156	158	157	—
String.insert.ASCIIChar.NearEnd	199	204	201	—
String.insert.ASCIIChar.StartIndex	651	651	651	—
String.insert.EmojiChar.EndIndex	60	74	65	—
String.insert.EmojiChar.NearEnd	224	226	225	—
String.insert.EmojiChar.StartIndex	365	365	365	—
Removed
InsertCharacterEndIndex	155	159	156	—
InsertCharacterEndIndexNonASCII	58	69	62	—
InsertCharacterStartIndex	649	690	663	—
InsertCharacterStartIndexNonASCII	360	367	362	—
InsertCharacterTowardsEndIndex	200	204	201	—
InsertCharacterTowardsEndIndexNonASCII	219	223	221	—

Performance: -Osize

TEST	MIN	MAX	MEAN	MAX_RSS
Added
String.insert.ASCIIChar.EndIndex	216	221	218	—
String.insert.ASCIIChar.NearEnd	214	218	216	—
String.insert.ASCIIChar.StartIndex	655	655	655	—
String.insert.EmojiChar.EndIndex	57	72	62	—
String.insert.EmojiChar.NearEnd	242	247	244	—
String.insert.EmojiChar.StartIndex	364	364	364	—
Removed
InsertCharacterEndIndex	163	169	165	—
InsertCharacterEndIndexNonASCII	56	60	58	—
InsertCharacterStartIndex	652	652	652	—
InsertCharacterStartIndexNonASCII	363	363	363	—
InsertCharacterTowardsEndIndex	217	224	219	—
InsertCharacterTowardsEndIndexNonASCII	225	227	226	—

Performance: -Onone

TEST	MIN	MAX	MEAN	MAX_RSS
Added
String.insert.ASCIIChar.EndIndex	222	224	223	—
String.insert.ASCIIChar.NearEnd	242	245	243	—
String.insert.ASCIIChar.StartIndex	753	754	753	—
String.insert.EmojiChar.EndIndex	80	100	87	—
String.insert.EmojiChar.NearEnd	235	239	236	—
String.insert.EmojiChar.StartIndex	406	407	407	—
Removed
InsertCharacterEndIndex	222	224	223	—
InsertCharacterEndIndexNonASCII	80	82	81	—
InsertCharacterStartIndex	739	739	739	—
InsertCharacterStartIndexNonASCII	404	405	405	—
InsertCharacterTowardsEndIndex	238	240	239	—
InsertCharacterTowardsEndIndexNonASCII	234	238	236	—

✅	Benchmark Check Report
⛔️🔤	`String.insert.EmojiChar.NearEnd` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}
⚠️Ⓜ️	`String.insert.EmojiChar.NearEnd` has very wide range of memory used between independent, repeated measurements. _{String.insert.EmojiChar.NearEnd mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[0, 59]}
⛔️🔤	`String.insert.EmojiChar.EndIndex` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}
⚠️Ⓜ️	`String.insert.EmojiChar.EndIndex` has very wide range of memory used between independent, repeated measurements. _{String.insert.EmojiChar.EndIndex mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[37, 29]}
⛔️🔤	`String.insert.ASCIIChar.StartIndex` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}
⚠️Ⓜ️	`String.insert.ASCIIChar.StartIndex` has very wide range of memory used between independent, repeated measurements. _{String.insert.ASCIIChar.StartIndex mem_pages [i1, i2]: min=[11, 11] 𝚫=0 R=[26, 9]}
⛔️🔤	`String.insert.ASCIIChar.EndIndex` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}
⛔️🔤	`String.insert.ASCIIChar.NearEnd` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}
⚠️Ⓜ️	`String.insert.ASCIIChar.NearEnd` has very wide range of memory used between independent, repeated measurements. _{String.insert.ASCIIChar.NearEnd mem_pages [i1, i2]: min=[11, 11] 𝚫=0 R=[9, 26]}
⛔️🔤	`String.insert.EmojiChar.StartIndex` name doesn`t conform to UpperCamelCase convention. _{See http://bit.ly/UpperCamelCase}

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

eeckstein · 2018-11-28T23:36:45Z

@swift-ci smoke test

eeckstein · 2018-11-28T23:37:10Z

@swift-ci smoke test

eeckstein · 2018-11-28T23:37:23Z

@swift-ci benchmark

swift-ci · 2018-11-29T00:23:24Z

Build comment file:

No performance and code size changes

palimondo · 2018-11-29T06:06:16Z

@milseman Can you help me understand the unusual memory variance seen in the demonstration? Similar variance issues manifested themselves over in Breadcrumbs. Without knowing any details about their implementation, I'd guess there's some kind of memory leak in the new Strings? Granted, "constant memory use" rule is quite unproven recommendation at this time: I asked for advice in the forums before…

milseman · 2018-11-29T18:50:51Z

That's weird, let's make sure it's not a formal memory leak. Which one seems to be the biggest offender?

palimondo · 2018-11-29T19:15:54Z

Nothing stands out especially, i.e. all look very strange. Let me explain the metric some more: String.insert.EmojiChar.NearEnd mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[0, 59]

This means: i1 (measured with --num-iters=1), i2 means --num-iters=2. There were 10 independent runs, 5 i1, 5 i2. The lowest memory used was 11 pages (11 * 4096B = 44kB) and it was like that for all the runs, because the first R (range) is 0. But for i2, the second R means the highest memory consumption seen was 12 + 59 = 71 pages (284 kB).

Looking at all of the above ⚠️Ⓜ️ gives me very 🐟y vibes...

milseman · 2018-11-29T19:38:30Z

This might be because those benchmarks don't reset their state after every iteration:

@inline(__always)
func insertTowardsEndIndex(_ c: Character, in string: String, count: Int) {
  var workload = string
  var index = workload.endIndex
  for i in 0..<count {
    workload.insert(identity(c), at: index)
    if i % 1000 == 0 {
      index = workload.endIndex
    }
  }
  blackHole(workload)
}
@inline(never)
func run_InsertCharacterTowardsEndIndex(_ N: Int) {
  insertTowardsEndIndex("s", in: str, count: N * 3000)
}

It seems like run_InsertCharacterTowardsEndIndex should have an outer loop from 0..<N and then run insertTowardsEndIndex with a count of exactly 3000. That would give constant size and linear perf, instead of what I assume is currently linear size and quadratic perf.

palimondo · 2018-11-29T21:53:11Z

@milseman That's actually good to hear, we can fix that!

And looking at new Breadcrumbs tests, would you say the variance is normal, or is these some kind of issue, too? These are my local results, where the heuristic detected "very wide range of memory used between independent, repeated measurements":

'Breadcrumbs.CopyUTF16CodeUnits.ASCII' mem_pages [i1, i2]: min=[22, 23] 𝚫=1 R=[19, 18]
'Breadcrumbs.IdxToUTF16.longMixed' mem_pages [i1, i2]: min=[300, 307] 𝚫=7 R=[15, 24]
'Breadcrumbs.IdxToUTF16Range.longASCII' mem_pages [i1, i2]: min=[1050, 1049] 𝚫=1 R=[14, 19]
'Breadcrumbs.IdxToUTF16Range.longMixed' mem_pages [i1, i2]: min=[310, 310] 𝚫=0 R=[28, 29]
'Breadcrumbs.UTF16ToIdx.longASCII' mem_pages [i1, i2]: min=[1089, 1091] 𝚫=2 R=[31, 28]
'Breadcrumbs.UTF16ToIdx.longMixed' mem_pages [i1, i2]: min=[300, 307] 𝚫=7 R=[26, 20]
'Breadcrumbs.UTF16ToIdxRange.longASCII' mem_pages [i1, i2]: min=[1102, 1089] 𝚫=13 R=[25, 36]
'Breadcrumbs.UTF16ToIdxRange.longMixed' mem_pages [i1, i2]: min=[305, 305] 𝚫=0 R=[14, 21]

palimondo · 2018-11-29T22:03:10Z

Well, actually, what you were saying about run_InsertCharacterTowardsEndIndex, is the memory varied between runs because you presumed different N between the runs, right?

But the the way BenchmarkDoctor does this measurement, is to first calibrate number of samples to run for ~1s and then do the 5 i1 and 5 runs i2 runs with the constant number of samples (half in the i2 case). Each of the independent runs is a fresh invocation of Benchmark_O with the specified parameters. There should no longer be any variance from auto-scaling of the benchmarks present.

milseman · 2018-11-30T22:42:55Z

I was assuming "iteration" means the count that's passed into the run function, so run_InsertCharacterTowardsEndIndex's N parameter is its "iteration". run_InsertCharacterTowardsEndIndex inserts 3000 * N characters into a string, so it's memory usage would scale linearly with N.

Or, is that what you mean by "sample"? I thought multiple "samples" would be running the same benchmark on the same N multiple times.

palimondo · 2018-12-03T17:45:39Z

You are correct. The --num-iters parameter becomes the N that's passed to run_ functions. Which means these benchmarks need to be corrected to do constant workload regardless of N. I'll fix that.

That issue wasn't caught by the corresponding heuristic in BenchmarkDoctor, because it's masked by the relatively large intra i-series variance in memory used:

normal_range = 15  # pages
…
if abs(min_i1 - min_i2) > max(range_i1, range_i2, normal_range):

Let's have a look at this one:

String.insert.EmojiChar.EndIndex mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[37, 29]

The runs within each i-series, i.e. i1 and i2, are executed with the exact same N count (1, 2). The minimum number of pages used was 11 and 12 respectively, but the variance within each series is triple that! Any idea what might be causing that?

palimondo · 2018-12-03T18:29:13Z

We start with str, which is 25*200 = 5000 bytes in UTF-8, which is 2-3 memory pages. It get's appended to 1000 x the 15 bytes emoji, the final string is 20000 bytes or at least 5 pages. We need to hold both strings in memory at the same time, so that's at least 10 pages. So the minimum of 11 makes perfect sense to me. Now, there might be memory fragmentation that would cause the allocator to touch multiple pages as the modified string grows but is going up to 48 pages normal?

milseman · 2018-12-03T18:31:47Z

Is mem_pages a record of the number of pages used during overall execution, or peak memory usage? When the string grows in size, it can trigger an allocation and copy, so we'd expect to see many pages allocated and deallocated over the course of this benchmark.

palimondo · 2018-12-03T18:40:47Z

The mem_pages is delta of 2 peaks as reported by rusage. We record baseline before we start running and another one when we're done with the benchmark. I think that statistic piggy-backs on the virtual memory system's mechanism for marking dirty pages?

Thinking about this more… the ideal case assumes the allocator will reuse same memory pages for all samples, which might not be the case, as we are still running a lot of samples and the same page might get allocated to different process, right? I'm not familiar with how the system's allocator does that in detail.

palimondo · 2018-12-03T18:51:26Z

Hmm... that probably means that memory usage measurement needs to be always performed with a single sample only and therefore separated from the runtime measurements. What do you think?

Update: I ran this 10 x:
./Benchmark_O --num-iters=1 --num-samples=1 --memory InsertCharacterTowardsEndIndexNonASCII
… and got mem_pages: 16, 13, 12, 16, 16, 11, 11, 13, 11, 11.

So it's improved, but still not without variance… 🤷‍♂️. But R=5 is a lot better. So I guess I'll test that on the whole SBS and report back.

palimondo · 2018-12-03T19:11:37Z

Hmm… ./Benchmark_O --num-iters=1 --num-samples=1 --memory InsertCharacterTowardsEndIndexNonASCII --verbose | tail -n 6 | head -n 1 again...
10, 10, 11, 15, 11, 10, 21, 10, 10, 10, 19, 10, 20, 12
That's R=11.

palimondo added 3 commits November 27, 2018 22:49

[benchmark] Doctor: emit mem_page details info

9a04207

Promoting previously DEBUG message to INFO.

[benchmark] MarkdownReportHandler

92cf40d

`logging.Handler` that creates nicely formatted report from `BecnhmarkDoctor`’s `check` in Markdown table for display on GitHub.

[benchmark] Added Benchmark Check Report

cbd8821

Produce Markdown formatted report, analyzing the quality of newly added benchmarks.

palimondo requested a review from eeckstein November 27, 2018 22:08

palimondo requested a review from lorentey November 28, 2018 19:31

lorentey mentioned this pull request Nov 28, 2018

[benchmark] Add some benchmarks for String breadcrumbs #20769

Merged

eeckstein requested changes Nov 28, 2018

View reviewed changes

benchmark/scripts/run_smoke_bench Outdated Show resolved Hide resolved

[benchmark] run_smoke_bench -skip-check-added

98d6f22

Removed the option to `-check-added`, as it’s now run by default. Replaced with option to skip checking added benchmarks: `-skip-check-added`.

palimondo mentioned this pull request Nov 28, 2018

[benchmark] Rename InsertCharacter to String.insert #20837

Closed

eeckstein reviewed Nov 28, 2018

View reviewed changes

benchmark/scripts/run_smoke_bench Outdated Show resolved Hide resolved

[benchmark] Call test_opt_levels directly

94e86c0

Simplify the code by removing indirection: After removal of -check-added option, only single function remains.

[benchmark] Don’t emit report for no added tests

cd47f32

palimondo force-pushed the and-dreadfully-distinct branch from a6f26e2 to cd47f32 Compare November 28, 2018 22:58

palimondo removed the request for review from lorentey November 28, 2018 23:02

palimondo merged commit d6392cc into swiftlang:master Nov 29, 2018

[benchmark] Added Benchmark Check Report #20807

[benchmark] Added Benchmark Check Report #20807

Uh oh!

Conversation

palimondo commented Nov 27, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palimondo commented Nov 27, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eeckstein left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

palimondo commented Nov 28, 2018

Uh oh!

Uh oh!

palimondo commented Nov 28, 2018

Uh oh!

palimondo commented Nov 28, 2018

Uh oh!

eeckstein commented Nov 28, 2018

Uh oh!

eeckstein commented Nov 28, 2018

Uh oh!

eeckstein commented Nov 28, 2018

Uh oh!

eeckstein commented Nov 28, 2018

Uh oh!

palimondo commented Nov 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palimondo commented Nov 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

swift-ci commented Nov 28, 2018

Build comment file:

Performance: -O

Performance: -Osize

Performance: -Onone

Uh oh!

eeckstein commented Nov 28, 2018

Uh oh!

eeckstein commented Nov 28, 2018

Uh oh!

eeckstein commented Nov 28, 2018

Uh oh!

swift-ci commented Nov 29, 2018

Build comment file:

No performance and code size changes

Uh oh!

palimondo commented Nov 29, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

milseman commented Nov 29, 2018

Uh oh!

palimondo commented Nov 29, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

milseman commented Nov 29, 2018

Uh oh!

palimondo commented Nov 29, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palimondo commented Nov 29, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

milseman commented Nov 30, 2018

Uh oh!

palimondo commented Dec 3, 2018

Uh oh!

palimondo commented Dec 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

milseman commented Dec 3, 2018

Uh oh!

palimondo commented Dec 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

palimondo commented Dec 3, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

palimondo commented Nov 27, 2018 •

edited

Loading

palimondo commented Nov 27, 2018 •

edited

Loading

palimondo commented Nov 28, 2018 •

edited

Loading

palimondo commented Nov 28, 2018 •

edited

Loading

palimondo commented Nov 29, 2018 •

edited

Loading

palimondo commented Nov 29, 2018 •

edited

Loading

palimondo commented Nov 29, 2018 •

edited

Loading

palimondo commented Nov 29, 2018 •

edited

Loading

palimondo commented Dec 3, 2018 •

edited

Loading

palimondo commented Dec 3, 2018 •

edited

Loading

palimondo commented Dec 3, 2018 •

edited

Loading

palimondo commented Dec 3, 2018 •

edited

Loading