Performance Benchmarking of ExistentialSpecializer #24206


Merged
merged 1 commit into from
Apr 26, 2019

Conversation

rajbarik
Contributor

Duplicate of https://github.com/apple/swift/pull/24065/files, now with the appropriate random number generator!

@rajbarik
Contributor Author

@dabrahams @atrick @slavapestov Here is the new one with the new random number generator.

@rajbarik rajbarik requested review from atrick and dabrahams April 22, 2019 23:51
@rajbarik
Contributor Author

@atrick please review this one when you get a chance. Since the previous one was reverted, I am reposting it with the updated random number API.

Contributor

@atrick left a comment

Thanks @rajbarik. But do you see the performance delta now when building and running this as part of the benchmark binary?

@rajbarik
Contributor Author

rajbarik commented Apr 24, 2019

@atrick, I do see the performance delta locally. Earlier, it was not getting invoked as part of main.swift which prevented it from being reported.

@rajbarik rajbarik removed the request for review from dabrahams April 24, 2019 18:19
@atrick
Contributor

atrick commented Apr 24, 2019

@swift-ci benchmark

@swift-ci
Contributor

Performance: -O

TEST                        OLD    NEW    DELTA    RATIO
Improvement
DataAppendDataLargeToLarge  45000  32400  -28.0%   1.39x (?)
StringBuilderLong           1190   1060   -10.9%   1.12x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  34570  35410  35107

Performance: -Osize

TEST                                 OLD    NEW    DELTA    RATIO
Improvement
DataAppendDataLargeToLarge           45400  33200  -26.9%   1.37x (?)
ObjectiveCBridgeStubFromNSDateRef    4230   3540   -16.3%   1.19x (?)
ObjectiveCBridgeStubNSDateRefAccess  359    333    -7.2%    1.08x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  34840  35450  35050

Performance: -Onone

TEST                               OLD    NEW    DELTA    RATIO
Regression
DataAppendDataLargeToLarge         33600  45400  +35.1%   0.74x (?)
ObjectiveCBridgeStubFromNSDate     5610   6530   +16.4%   0.86x (?)
Improvement
ObjectiveCBridgeStubFromNSDateRef  4900   3990   -18.6%   1.23x (?)
ObjectiveCBridgeStubFromNSString   802    739    -7.9%    1.09x (?)

TEST        MIN     MAX     MEAN
Added
BucketSort  107770  108570  108040

Benchmark Check Report
⚠️ BucketSort execution took at least 3407 μs.
Decrease the workload of BucketSort by a factor of 4 (10), to be less than 1000 μs.

How to read the data: The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 8-Core Intel Xeon E5
  Processor Speed: 3 GHz
  Number of Processors: 1
  Total Number of Cores: 8
  L2 Cache (per Core): 256 KB
  L3 Cache: 25 MB
  Memory: 64 GB

@atrick
Contributor

atrick commented Apr 25, 2019

@rajbarik looks like BucketSort takes at least 4x too much time to execute. To reduce noise, the current "microbenchmarking" approach to running the benchmarks expects them to be between 20 µs (for the timer precision and/or overhead) and 1 ms (to be within the thread-switch quantum).

@rajbarik
Contributor Author

@atrick thanks. I will then decrease the number of iterations by a factor of 4.

}
return true
}
let NUMITEMS = 2500
Contributor Author

@atrick reduced this from 10000 to 2500. Should be fine now?

Contributor

@atrick Apr 25, 2019

Sorry, my fault! The more important run time will be after you enable ExistentialSpecialization. I did this locally, and it only takes 160 µs with 10,000 items. So that was actually a better choice.

The only warning I see now is this.

memory: 'BucketSort' has very wide range of memory used between independent, repeated measurements.
memory: 'BucketSort' mem_pages [i1, i2]: min=[19, 19] 𝚫=0 R=[0, 24]

If you think of a way to improve it great, but I don't think it needs to hold up the commit.

Contributor

That memory warning is probably too sensitive. This benchmark looks just fine as is from the memory usage perspective.

@atrick
Contributor

atrick commented Apr 25, 2019

@swift-ci benchmark

@atrick
Contributor

atrick commented Apr 25, 2019

@swift-ci smoke test

@swift-ci
Contributor

Performance: -O

TEST        MIN    MAX    MEAN
Added
BucketSort  35510  36500  35853

Performance: -Osize

TEST                                 OLD   NEW   DELTA   RATIO
Improvement
ObjectiveCBridgeStubFromNSDateRef    4740  3940  -16.9%  1.20x (?)
ObjectiveCBridgeStubNSDateRefAccess  400   371   -7.2%   1.08x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  36670  37670  37217

Performance: -Onone

TEST                               OLD   NEW   DELTA   RATIO
Regression
ObjectiveCBridgeStubFromNSDate     6400  7380  +15.3%  0.87x (?)
Improvement
ObjectiveCBridgeStubFromNSDateRef  5450  4450  -18.3%  1.22x (?)

TEST        MIN    MAX    MEAN
Added
BucketSort  60690  60980  60843

Benchmark Check Report
⚠️ BucketSort execution took at least 3517 μs.
Decrease the workload of BucketSort by a factor of 4 (10), to be less than 1000 μs.

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

runFunction: run_BucketSort,
tags: [.validation, .algorithm],
setUpFunction: { buildWorkload() },
legacyFactor: 10)
Contributor

@atrick Apr 25, 2019

@rajbarik apparently legacyFactor is not something you're supposed to define. You can just remove it.

@eeckstein @gottesmm Please comment the purpose of this setting in TestsUtils.swift.
Actually, this looks like something @palimondo added.

Contributor

Sorry for causing confusion, I’ll fix the docs tomorrow.

Contributor

@atrick left a comment

@rajbarik I think if you just remove legacyFactor and go back to 10k items this can be merged now.

Contributor

@palimondo left a comment

I haven’t run this locally and the smoke benchmark doesn’t gather multiple independent samples from newly added benchmarks, but what variation are you seeing when running locally? Benchmark_Driver -i 10 BucketSort

I just want to be sure the true randomness isn't making this too noisy. If it is, using a PRNG could help.

let items: [Int] = {
  var array: [Int]? = [Int]()
  for _ in 0..<NUMITEMS {
    array!.append(Int.random(in: 0..<MAXBUCKETSIZE))
Contributor

Could we use a pseudorandom generator here, so that we are always testing the same workload and eliminate an unnecessary source of noise from the performance measurement?

See SplitMix64 over here.
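For reference, SplitMix64 is a tiny, seedable 64-bit generator: one word of state advanced by a fixed increment, then mixed through two multiply-xorshift rounds (the constants below are the published SplitMix64 constants). A minimal sketch of the idea, written in Python for illustration since the benchmark suite itself is Swift:

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit wrapping arithmetic

class SplitMix64:
    """Minimal SplitMix64: deterministic, seedable, one 64-bit word of state."""

    def __init__(self, seed):
        self.state = seed & MASK64

    def next(self):
        # Advance the state by the golden-ratio increment.
        self.state = (self.state + 0x9E3779B97F4A7C15) & MASK64
        # Mix the state through two multiply-xorshift rounds.
        z = self.state
        z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
        z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
        return (z ^ (z >> 31)) & MASK64

# Same seed -> same sequence, so the benchmark workload is identical on every run.
rng = SplitMix64(seed=42)
sample = [rng.next() % 10000 for _ in range(5)]
```

Because the stream depends only on the seed, the workload array is reproducible across runs and machines, which removes one source of measurement noise.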

Contributor

Yes, benchmarks should always use the pseudo-random generator.
I added the LFSR utility in TestsUtils.swift long ago for this purpose.
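As background, an LFSR (linear-feedback shift register) produces a deterministic pseudo-random bit stream by shifting a register and feeding back the XOR of a few tap bits. The register width and taps used by the utility in TestsUtils.swift may differ; this generic maximal-period 16-bit Fibonacci LFSR, sketched in Python for illustration, shows the idea:

```python
def lfsr16_step(state):
    """One step of a maximal-period 16-bit Fibonacci LFSR.

    Taps at bits 16, 14, 13, 11 (polynomial x^16 + x^14 + x^13 + x^11 + 1),
    so the state cycles through all 2^16 - 1 nonzero values.
    """
    bit = ((state >> 0) ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
    return ((state >> 1) | (bit << 15)) & 0xFFFF

# Deterministic stream from a fixed nonzero seed.
state = 0xACE1
values = []
for _ in range(5):
    state = lfsr16_step(state)
    values.append(state)
```

Like SplitMix64, a fixed seed gives an identical workload on every benchmark run; the trade-off is a much shorter period and weaker statistical quality, which is acceptable for workload generation.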


@rajbarik rajbarik merged commit 34fbbcc into swiftlang:master Apr 26, 2019
@rajbarik
Contributor Author

@atrick I accidentally merged this one with NUMITEMS set to 2500. I will create another PR with NUMITEMS set back to 10000.

@atrick
Contributor

atrick commented Apr 26, 2019

@rajbarik Thanks. If you could use the LFSR instead of Random, that would also encourage best practice.

@atrick
Contributor

atrick commented Apr 26, 2019

@rajbarik While you're at it, please also remove the legacyFactor.

@rajbarik
Contributor Author

@atrick will take care of both of these. Meanwhile, please run performance benchmarking on #19820.

@palimondo
Contributor

@atrick The LFSR in TestsUtils.swift doesn't conform to RandomNumberGenerator. See the SplitMix64 implementation linked above for a simple drop-in pseudo-random generator that conforms to the protocol.

@palimondo
Contributor

@rajbarik The SplitMix64 implementation has now landed in TestsUtils.swift and is available for use in all benchmarks.
