Skip to content

[benchmark] Added Benchmark Check Report #20807

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 29, 2018

Conversation

palimondo
Copy link
Contributor

@palimondo palimondo commented Nov 27, 2018

Integrates the Benchmark_Driver check command into run_smoke_bench, producing Markdown formatted report table, analyzing the quality of newly added benchmarks.

Example output:

Benchmark Check Report
⛔️🔤 String.insert.EmojiChar.NearEnd name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
⚠️ String.insert.EmojiChar.NearEnd execution took at least 1871 μs.
Decrease the workload of String.insert.EmojiChar.NearEnd by a factor of 2 (10), to be less than 1000 μs.
⛔️🔤 String.insert.EmojiChar.EndIndex name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
⚠️Ⓜ️ String.insert.EmojiChar.EndIndex has very wide range of memory used between independent, repeated measurements.
String.insert.EmojiChar.EndIndex mem_pages [i1, i2]: min=[12, 22] 𝚫=10 R=[15, 18]

If there are no issue with added benchmarks, the result looks like this:

Benchmark Check Report

Promoting previously DEBUG message to INFO.
`logging.Handler` that creates nicely formatted report from `BecnhmarkDoctor`’s `check` in Markdown table for display on GitHub.
Produce Markdown formatted report, analyzing the quality of newly added benchmarks.
@palimondo palimondo requested a review from eeckstein November 27, 2018 22:08
@palimondo
Copy link
Contributor Author

palimondo commented Nov 27, 2018

This is the reason why I'd like to finish and merge the #20334 [benchmark] Naming Convention
cc @atrick @lorentey @airspeedswift @jrose-apple

Copy link
Contributor

@eeckstein eeckstein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically LGTM! (Please see my comment inline)

Also, can you add a temporary test-commit to this PR which introduces some new "bad" benchmarks, which violate several rules? So that we can test this change with "swift-ci benchmark".

Removed the option to `-check-added`, as it’s now run by default. Replaced with option to skip checking added benchmarks: `-skip-check-added`.
@palimondo
Copy link
Contributor Author

@eeckstein I've added a -skip-check-added option. We can test with #20837, that renames InsertCharacter benchmarks to String.insert applying the proposed naming convention, because current naming rule will complain the new names don't conform to UpperCamelCase.

Simplify the code by removing indirection:
After removal of -check-added option, only single function remains.
@palimondo
Copy link
Contributor Author

@eeckstein Removed the indirection, test_opt_levels is now called directly.

@palimondo
Copy link
Contributor Author

BTW, it looks like swift-ci doesn’t support combining PRs for benchmark the way it does(?) for test. We tried running Karoy’s new Breadcrumbs with this one, but I don’t think it worked.

@eeckstein
Copy link
Contributor

Can you just temporarily add a test-commit to this PR, and then after the benchmark run, remove it again?

@eeckstein
Copy link
Contributor

@swift-ci benchmark

2 similar comments
@eeckstein
Copy link
Contributor

@swift-ci benchmark

@eeckstein
Copy link
Contributor

@swift-ci benchmark

@palimondo
Copy link
Contributor Author

palimondo commented Nov 28, 2018

I've noticed the MarkdownReportHandler always prints the header, which is useful when there are no validation issues, but it did so even when there were no added benchmarks. a6f26e2 works around that by not adding the handler when added set is empty. @eeckstein Do you want to re-run benchmarks again with that change, too?

@palimondo palimondo force-pushed the and-dreadfully-distinct branch from a6f26e2 to cd47f32 Compare November 28, 2018 22:58
@palimondo
Copy link
Contributor Author

palimondo commented Nov 28, 2018

I hope the CI bot's managed to clone before I've force-pushed 🙃
Assuming all goes well, I've removed the DNM commit and if you agree the cd47f32 is trivial, once all passes test this can land while I take my 💤s… (it's midnight here) 😴

@palimondo palimondo removed the request for review from lorentey November 28, 2018 23:02
@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST MIN MAX MEAN MAX_RSS
Added
String.insert.ASCIIChar.EndIndex 156 158 157
String.insert.ASCIIChar.NearEnd 199 204 201
String.insert.ASCIIChar.StartIndex 651 651 651
String.insert.EmojiChar.EndIndex 60 74 65
String.insert.EmojiChar.NearEnd 224 226 225
String.insert.EmojiChar.StartIndex 365 365 365
Removed
InsertCharacterEndIndex 155 159 156
InsertCharacterEndIndexNonASCII 58 69 62
InsertCharacterStartIndex 649 690 663
InsertCharacterStartIndexNonASCII 360 367 362
InsertCharacterTowardsEndIndex 200 204 201
InsertCharacterTowardsEndIndexNonASCII 219 223 221

Performance: -Osize

TEST MIN MAX MEAN MAX_RSS
Added
String.insert.ASCIIChar.EndIndex 216 221 218
String.insert.ASCIIChar.NearEnd 214 218 216
String.insert.ASCIIChar.StartIndex 655 655 655
String.insert.EmojiChar.EndIndex 57 72 62
String.insert.EmojiChar.NearEnd 242 247 244
String.insert.EmojiChar.StartIndex 364 364 364
Removed
InsertCharacterEndIndex 163 169 165
InsertCharacterEndIndexNonASCII 56 60 58
InsertCharacterStartIndex 652 652 652
InsertCharacterStartIndexNonASCII 363 363 363
InsertCharacterTowardsEndIndex 217 224 219
InsertCharacterTowardsEndIndexNonASCII 225 227 226

Performance: -Onone

TEST MIN MAX MEAN MAX_RSS
Added
String.insert.ASCIIChar.EndIndex 222 224 223
String.insert.ASCIIChar.NearEnd 242 245 243
String.insert.ASCIIChar.StartIndex 753 754 753
String.insert.EmojiChar.EndIndex 80 100 87
String.insert.EmojiChar.NearEnd 235 239 236
String.insert.EmojiChar.StartIndex 406 407 407
Removed
InsertCharacterEndIndex 222 224 223
InsertCharacterEndIndexNonASCII 80 82 81
InsertCharacterStartIndex 739 739 739
InsertCharacterStartIndexNonASCII 404 405 405
InsertCharacterTowardsEndIndex 238 240 239
InsertCharacterTowardsEndIndexNonASCII 234 238 236
Benchmark Check Report
⛔️🔤 String.insert.EmojiChar.NearEnd name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
⚠️Ⓜ️ String.insert.EmojiChar.NearEnd has very wide range of memory used between independent, repeated measurements.
String.insert.EmojiChar.NearEnd mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[0, 59]
⛔️🔤 String.insert.EmojiChar.EndIndex name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
⚠️Ⓜ️ String.insert.EmojiChar.EndIndex has very wide range of memory used between independent, repeated measurements.
String.insert.EmojiChar.EndIndex mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[37, 29]
⛔️🔤 String.insert.ASCIIChar.StartIndex name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
⚠️Ⓜ️ String.insert.ASCIIChar.StartIndex has very wide range of memory used between independent, repeated measurements.
String.insert.ASCIIChar.StartIndex mem_pages [i1, i2]: min=[11, 11] 𝚫=0 R=[26, 9]
⛔️🔤 String.insert.ASCIIChar.EndIndex name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
⛔️🔤 String.insert.ASCIIChar.NearEnd name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
⚠️Ⓜ️ String.insert.ASCIIChar.NearEnd has very wide range of memory used between independent, repeated measurements.
String.insert.ASCIIChar.NearEnd mem_pages [i1, i2]: min=[11, 11] 𝚫=0 R=[9, 26]
⛔️🔤 String.insert.EmojiChar.StartIndex name doesn`t conform to UpperCamelCase convention.
See http://bit.ly/UpperCamelCase
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@eeckstein
Copy link
Contributor

@swift-ci smoke test

1 similar comment
@eeckstein
Copy link
Contributor

@swift-ci smoke test

@eeckstein
Copy link
Contributor

@swift-ci benchmark

@swift-ci
Copy link
Contributor

Build comment file:

No performance and code size changes


@palimondo palimondo merged commit d6392cc into swiftlang:master Nov 29, 2018
@palimondo
Copy link
Contributor Author

palimondo commented Nov 29, 2018

@milseman Can you help me understand the unusual memory variance seen in the demonstration? Similar variance issues manifested themselves over in Breadcrumbs. Without knowing any details about their implementation, I'd guess there's some kind of memory leak in the new Strings? Granted, "constant memory use" rule is quite unproven recommendation at this time: I asked for advice in the forums before…

@milseman
Copy link
Member

That's weird, let's make sure it's not a formal memory leak. Which one seems to be the biggest offender?

@palimondo
Copy link
Contributor Author

palimondo commented Nov 29, 2018

Nothing stands out especially, i.e. all look very strange. Let me explain the metric some more: String.insert.EmojiChar.NearEnd mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[0, 59]

This means: i1 (measured with --num-iters=1), i2 means --num-iters=2. There were 10 independent runs, 5 i1, 5 i2. The lowest memory used was 11 pages (11 * 4096B = 44kB) and it was like that for all the runs, because the first R (range) is 0. But for i2, the second R means the highest memory consumption seen was 12 + 59 = 71 pages (284 kB).

Looking at all of the above ⚠️Ⓜ️ gives me very 🐟y vibes...

@milseman
Copy link
Member

This might be because those benchmarks don't reset their state after every iteration:

@inline(__always)
func insertTowardsEndIndex(_ c: Character, in string: String, count: Int) {
  var workload = string
  var index = workload.endIndex
  for i in 0..<count {
    workload.insert(identity(c), at: index)
    if i % 1000 == 0 {
      index = workload.endIndex
    }
  }
  blackHole(workload)
}
@inline(never)
func run_InsertCharacterTowardsEndIndex(_ N: Int) {
  insertTowardsEndIndex("s", in: str, count: N * 3000)
}

It seems like run_InsertCharacterTowardsEndIndex should have an outer loop from 0..<N and then run insertTowardsEndIndex with a count of exactly 3000. That would give constant size and linear perf, instead of what I assume is currently linear size and quadratic perf.

@palimondo
Copy link
Contributor Author

palimondo commented Nov 29, 2018

@milseman That's actually good to hear, we can fix that!

And looking at new Breadcrumbs tests, would you say the variance is normal, or is these some kind of issue, too? These are my local results, where the heuristic detected "very wide range of memory used between independent, repeated measurements":

'Breadcrumbs.CopyUTF16CodeUnits.ASCII' mem_pages [i1, i2]: min=[22, 23] 𝚫=1 R=[19, 18]
'Breadcrumbs.IdxToUTF16.longMixed' mem_pages [i1, i2]: min=[300, 307] 𝚫=7 R=[15, 24]
'Breadcrumbs.IdxToUTF16Range.longASCII' mem_pages [i1, i2]: min=[1050, 1049] 𝚫=1 R=[14, 19]
'Breadcrumbs.IdxToUTF16Range.longMixed' mem_pages [i1, i2]: min=[310, 310] 𝚫=0 R=[28, 29]
'Breadcrumbs.UTF16ToIdx.longASCII' mem_pages [i1, i2]: min=[1089, 1091] 𝚫=2 R=[31, 28]
'Breadcrumbs.UTF16ToIdx.longMixed' mem_pages [i1, i2]: min=[300, 307] 𝚫=7 R=[26, 20]
'Breadcrumbs.UTF16ToIdxRange.longASCII' mem_pages [i1, i2]: min=[1102, 1089] 𝚫=13 R=[25, 36]
'Breadcrumbs.UTF16ToIdxRange.longMixed' mem_pages [i1, i2]: min=[305, 305] 𝚫=0 R=[14, 21]

@palimondo
Copy link
Contributor Author

palimondo commented Nov 29, 2018

Well, actually, what you were saying about run_InsertCharacterTowardsEndIndex, is the memory varied between runs because you presumed different N between the runs, right?

But the the way BenchmarkDoctor does this measurement, is to first calibrate number of samples to run for ~1s and then do the 5 i1 and 5 runs i2 runs with the constant number of samples (half in the i2 case). Each of the independent runs is a fresh invocation of Benchmark_O with the specified parameters. There should no longer be any variance from auto-scaling of the benchmarks present.

@milseman
Copy link
Member

I was assuming "iteration" means the count that's passed into the run function, so run_InsertCharacterTowardsEndIndex's N parameter is its "iteration". run_InsertCharacterTowardsEndIndex inserts 3000 * N characters into a string, so it's memory usage would scale linearly with N.

Or, is that what you mean by "sample"? I thought multiple "samples" would be running the same benchmark on the same N multiple times.

@palimondo
Copy link
Contributor Author

You are correct. The --num-iters parameter becomes the N that's passed to run_ functions. Which means these benchmarks need to be corrected to do constant workload regardless of N. I'll fix that.

That issue wasn't caught by the corresponding heuristic in BenchmarkDoctor, because it's masked by the relatively large intra i-series variance in memory used:

normal_range = 15  # pagesif abs(min_i1 - min_i2) > max(range_i1, range_i2, normal_range):

Let's have a look at this one:

String.insert.EmojiChar.EndIndex mem_pages [i1, i2]: min=[11, 12] 𝚫=1 R=[37, 29]

The runs within each i-series, i.e. i1 and i2, are executed with the exact same N count (1, 2). The minimum number of pages used was 11 and 12 respectively, but the variance within each series is triple that! Any idea what might be causing that?

@palimondo
Copy link
Contributor Author

palimondo commented Dec 3, 2018

We start with str, which is 25*200 = 5000 bytes in UTF-8, which is 2-3 memory pages. It get's appended to 1000 x the 15 bytes emoji, the final string is 20000 bytes or at least 5 pages. We need to hold both strings in memory at the same time, so that's at least 10 pages. So the minimum of 11 makes perfect sense to me. Now, there might be memory fragmentation that would cause the allocator to touch multiple pages as the modified string grows but is going up to 48 pages normal?

@milseman
Copy link
Member

milseman commented Dec 3, 2018

Is mem_pages a record of the number of pages used during overall execution, or peak memory usage? When the string grows in size, it can trigger an allocation and copy, so we'd expect to see many pages allocated and deallocated over the course of this benchmark.

@palimondo
Copy link
Contributor Author

palimondo commented Dec 3, 2018

The mem_pages is delta of 2 peaks as reported by rusage. We record baseline before we start running and another one when we're done with the benchmark. I think that statistic piggy-backs on the virtual memory system's mechanism for marking dirty pages?

Thinking about this more… the ideal case assumes the allocator will reuse same memory pages for all samples, which might not be the case, as we are still running a lot of samples and the same page might get allocated to different process, right? I'm not familiar with how the system's allocator does that in detail.

@palimondo
Copy link
Contributor Author

palimondo commented Dec 3, 2018

Hmm... that probably means that memory usage measurement needs to be always performed with a single sample only and therefore separated from the runtime measurements. What do you think?

Update: I ran this 10 x:
./Benchmark_O --num-iters=1 --num-samples=1 --memory InsertCharacterTowardsEndIndexNonASCII
… and got mem_pages: 16, 13, 12, 16, 16, 11, 11, 13, 11, 11.

So it's improved, but still not without variance… 🤷‍♂️. But R=5 is a lot better. So I guess I'll test that on the whole SBS and report back.

@palimondo
Copy link
Contributor Author

palimondo commented Dec 3, 2018

Hmm… ./Benchmark_O --num-iters=1 --num-samples=1 --memory InsertCharacterTowardsEndIndexNonASCII --verbose | tail -n 6 | head -n 1 again...
10, 10, 11, 15, 11, 10, 21, 10, 10, 10, 19, 10, 20, 12
That's R=11.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants