Skip to content

[benchmark] Add some benchmarks for String breadcrumbs #20769

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Nov 29, 2018

Conversation

lorentey
Copy link
Member

@lorentey lorentey commented Nov 26, 2018

This adds some benchmarks for String breadcrumbing, via low-level stdlib entry points exposed by String.

  • StringBreadcrumbs.UTF16ToIndex: Convert some random UTF-16 offsets into String indices.
  • StringBreadcrumbs.IndexToUTF16: Convert some random String indices into UTF-16 offsets.
  • StringBreadcrumbs.UTF16ToIndexRange: Split a string into a bunch of random slices and convert their UTF-16 offsets into String index ranges.
  • StringBreadcrumbs.IndexToUTF16Range: Do the same in reverse direction.
  • StringBreadcrumbs.CopyUTF16CodeUnits: Copy UTF-16 code units from random ranges.
  • StringBreadcrumbs.MutatedUTF16ToIndex: This is like UTF16ToIndex but appends to the string after every index conversion. In effect, this tests breadcrumb creation performance.
  • StringBreadcrumbs.MutatedIndexToUTF16: This is like IndexToUTF16 but appends to the string after every index conversion. In effect, this tests breadcrumb creation performance.

https://bugs.swift.org/browse/SR-9226

@lorentey lorentey requested a review from milseman November 26, 2018 20:35
@lorentey
Copy link
Member Author

The CopyUTF16CodeUnits benchmark here will crash until #20768 lands.

@palimondo
Copy link
Contributor

@lorentey #20667 has landed. You should be able to run these now with swift-ci, too.

@lorentey
Copy link
Member Author

#20768
@swift-ci smoke test

@lorentey
Copy link
Member Author

#20768
@swift-ci benchmark

@palimondo
Copy link
Contributor

palimondo commented Nov 26, 2018

I haven't yet finished integration of BenchmarkDoctor into run_smoke_bench… could you run this locally, please?

$ ./Benchmark_Driver check -f StringBreadcrumbs

(ignoring the naming complaints it will spew, as we have a hung jury back in #20334 😜)

@milseman
Copy link
Member

Did you profile this and make sure all the time is spent in the corresponding methods on String? I just want to make sure there isn't any ARC or missed devirtualization opportunities that would change what we're measuring.

@lorentey
Copy link
Member Author

@palimondo The doctor complained about the names and the "very wide range of memory used", but I scaled things so that it likes the timings.

@lorentey
Copy link
Member Author

lorentey commented Nov 26, 2018

@milseman I did some spot checks -- the heavy stack frames were inside the target entry points and looked breadcrumby enough. I wasn't looking for missed optimization opportunities though.

I'll have a more detailed look tomorrow, just to make sure these are all measuring what I think they're measuring.

@palimondo
Copy link
Contributor

palimondo commented Nov 26, 2018

The doctor complained about the names and the "very wide range of memory used"

🧐Interesting. Could you re-check those with --verbose flag, to see how bad it is?
Tip: You can work with just some select benchmarks by their test numbers, which can come in handy if you want to focus on sub-ranges, like this:

$ ./Benchmark_Driver check {204..217} {670..683}

@lorentey
Copy link
Member Author

@palimondo Here's a quick sample of the memory-related output:

memory: 'StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII mem_pages [i1, i2]:  min=[42, 46] 𝚫=4 R=[4, 17]
memory: 'StringBreadcrumbs.IndexToUTF16.longASCII' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.IndexToUTF16.longASCII mem_pages [i1, i2]:  min=[1108, 1119] 𝚫=11 R=[44, 31]
memory: 'StringBreadcrumbs.IndexToUTF16.longMixed' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.IndexToUTF16.longMixed mem_pages [i1, i2]:  min=[323, 326] 𝚫=3 R=[36, 36]
memory: 'StringBreadcrumbs.IndexToUTF16Range.longASCII' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.IndexToUTF16Range.longASCII mem_pages [i1, i2]:  min=[1116, 1125] 𝚫=9 R=[33, 29]

@swift-ci
Copy link
Contributor

Build comment file:


@milseman
Copy link
Member

@swift-ci please smoke test os x platform

@milseman
Copy link
Member

@swift-ci please benchmark

Copy link
Contributor

@palimondo palimondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very fresh approach to writing benchmarks! 👍

So far I have these comments:

@milseman
Copy link
Member

@palimondo where is the 40-character limit coming from and is there any way to get more? Is it Github's in-PR renderer that's constraining this?

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST MIN MAX MEAN MAX_RSS
Added
StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII 663 700 676
StringBreadcrumbs.CopyUTF16CodeUnits.shortMixed 672 679 675
StringBreadcrumbs.IndexToUTF16.longASCII 660 2211 1177
StringBreadcrumbs.IndexToUTF16.longMixed 791 1312 965
StringBreadcrumbs.IndexToUTF16Range.longASCII 306 1786 800
StringBreadcrumbs.IndexToUTF16Range.longMixed 387 390 388
StringBreadcrumbs.MutatedIndexToUTF16.shortASCII 794 839 817
StringBreadcrumbs.MutatedIndexToUTF16.shortMixed 300 423 361
StringBreadcrumbs.MutatedUTF16ToIndex.shortASCII 791 837 814
StringBreadcrumbs.MutatedUTF16ToIndex.shortMixed 306 425 365
StringBreadcrumbs.UTF16ToIndex.longASCII 380 380 380
StringBreadcrumbs.UTF16ToIndex.longMixed 427 429 428
StringBreadcrumbs.UTF16ToIndexRange.longASCII 99 99 99
StringBreadcrumbs.UTF16ToIndexRange.longMixed 122 122 122

Performance: -Osize

TEST MIN MAX MEAN MAX_RSS
Added
StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII 680 708 690
StringBreadcrumbs.CopyUTF16CodeUnits.shortMixed 688 704 698
StringBreadcrumbs.IndexToUTF16.longASCII 654 2141 1150
StringBreadcrumbs.IndexToUTF16.longMixed 784 1298 956
StringBreadcrumbs.IndexToUTF16Range.longASCII 308 1774 797
StringBreadcrumbs.IndexToUTF16Range.longMixed 391 394 392
StringBreadcrumbs.MutatedIndexToUTF16.shortASCII 791 838 815
StringBreadcrumbs.MutatedIndexToUTF16.shortMixed 300 417 359
StringBreadcrumbs.MutatedUTF16ToIndex.shortASCII 787 834 811
StringBreadcrumbs.MutatedUTF16ToIndex.shortMixed 304 424 364
StringBreadcrumbs.UTF16ToIndex.longASCII 372 373 372
StringBreadcrumbs.UTF16ToIndex.longMixed 423 440 429
StringBreadcrumbs.UTF16ToIndexRange.longASCII 95 97 96
StringBreadcrumbs.UTF16ToIndexRange.longMixed 119 120 119

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
ObjectiveCBridgeFromNSArrayAnyObjectForced 8998 10040 +11.6% 0.90x
Added
StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII 1380 1455 1417
StringBreadcrumbs.CopyUTF16CodeUnits.shortMixed 1340 1371 1360
StringBreadcrumbs.IndexToUTF16.longASCII 1294 2818 1803
StringBreadcrumbs.IndexToUTF16.longMixed 1420 1948 1597
StringBreadcrumbs.IndexToUTF16Range.longASCII 606 2103 1105
StringBreadcrumbs.IndexToUTF16Range.longMixed 690 697 693
StringBreadcrumbs.MutatedIndexToUTF16.shortASCII 802 847 824
StringBreadcrumbs.MutatedIndexToUTF16.shortMixed 308 426 367
StringBreadcrumbs.MutatedUTF16ToIndex.shortASCII 809 858 833
StringBreadcrumbs.MutatedUTF16ToIndex.shortMixed 322 444 383
StringBreadcrumbs.UTF16ToIndex.longASCII 1503 1513 1507
StringBreadcrumbs.UTF16ToIndex.longMixed 1552 1569 1561
StringBreadcrumbs.UTF16ToIndexRange.longASCII 341 342 341
StringBreadcrumbs.UTF16ToIndexRange.longMixed 367 368 368
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@palimondo
Copy link
Contributor

palimondo commented Nov 27, 2018

@milseman

where is the 40-character limit coming from and is there any way to get more? Is it Github's in-PR renderer that's constraining this?

Yes, primarily GitHub's fixed table width. As the above report shows, these name at 48 chars are filling the entire available space. But this is a best case scenario, as the -O runtimes are under 1000 μs . When you combine this with longer runtimes and all the benchmarks, it triggers horizontal scrolling.

Also formatting the result table in console (Benchmark_Driver's output as run through build_script). And usability of all the benchmark reports on mobile devices, see #20334 (comment).

@palimondo
Copy link
Contributor

palimondo commented Nov 27, 2018

The variance in IndexToUTF16 and IndexToUTF16Range groups worries me:

TEST MIN MAX MEAN MAX_RSS
StringBreadcrumbs.IndexToUTF16.longASCII 660 2211 1177
StringBreadcrumbs.IndexToUTF16.longMixed 791 1312 965
StringBreadcrumbs.IndexToUTF16Range.longASCII 306 1786 800
StringBreadcrumbs.IndexToUTF16Range.longMixed 387 390 388

Only the last one looks fine. This might be background process interference, with run_smoke_bench. It still uses averaging… instead of going with --num-iters=1. But it's present in both -O and -Osize builds, so it looks quite 🐟y.

@lorentey Could you please check locally the variance isn't this crazy, when measured properly?

$ ./Benchmark_Driver run -i 3 -f StringBreadcrumbs.IndexToUTF16

🤨 These are the same tests that reported very wide range of memory used. There's some issue with the benchmarks or the tested functionality.

@lorentey lorentey force-pushed the breadcrumb-benchmarks branch from 68e886b to ec150df Compare November 28, 2018 17:53
@lorentey
Copy link
Member Author

IdxToUTF16 looks fine to me locally:

$ .../Benchmark_Driver run -i 3 -f Breadcrumbs.IdxToUTF16
#,TEST,SAMPLES,MIN(μs),Q1(μs),MEDIAN(μs),Q3(μs),MAX(μs),MAX_RSS(B)
45,Breadcrumbs.IdxToUTF16.longASCII,55,421,433,434,435,452,4575232
46,Breadcrumbs.IdxToUTF16.longMixed,58,555,572,574,577,600,1310720
47,Breadcrumbs.IdxToUTF16Range.longASCII,57,202,209,209,210,214,4575232
48,Breadcrumbs.IdxToUTF16Range.longMixed,58,290,299,299,301,311,1327104

Total performance tests executed: 4

@lorentey
Copy link
Member Author

Rebased, applied review notes, cleaned up some things.

@swift-ci please smoke test

@lorentey
Copy link
Member Author

@swift-ci benchmark

@lorentey
Copy link
Member Author

Ah, I forgot to push a comment typo fix. Let's wait out the benchmark run first.

@lorentey
Copy link
Member Author

@milseman I profiled all tests looking for anomalies; they seem to work as intended. I did not notice anything left unspecialized, but it seems noteworthy that the IdxToUTF16 and IdxToUTF16Range tests spent about half of their time in BidirectionalCollection's index navigation, and only a third in getBreadcrumb(forIndex:). Some internal heuristic may not work right for these tests.

Copy link
Contributor

@palimondo palimondo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Few cosmetic nits in the comments. Thank's for hearing me out, Karoy!

@lorentey
Copy link
Member Author

Please test with the following PR:

#20807
@swift-ci please benchmark

@lorentey
Copy link
Member Author

@swift-ci please smoke check

@palimondo
Copy link
Contributor

@lorentey BTW, brace for BenchmarkDoctor complaining about names not being in UpperCamelCase, since the new naming convention isn't in yet… could you maybe help me finish and merge that in the meantime?

@swift-ci
Copy link
Contributor

Build comment file:

Performance: -O

TEST OLD NEW DELTA RATIO
Improvement
CharacterLiteralsLarge 108 97 -10.2% 1.11x
CharacterLiteralsSmall 348 325 -6.6% 1.07x
Added
Breadcrumbs.CopyUTF16CodeUnits.ASCII 47 48 48
Breadcrumbs.CopyUTF16CodeUnits.Mixed 62 63 62
Breadcrumbs.IdxToUTF16.longASCII 586 2077 1083
Breadcrumbs.IdxToUTF16.longMixed 714 724 720
Breadcrumbs.IdxToUTF16Range.longASCII 289 1756 778
Breadcrumbs.IdxToUTF16Range.longMixed 371 373 372
Breadcrumbs.MutatedIdxToUTF16.ASCII 758 759 759
Breadcrumbs.MutatedIdxToUTF16.Mixed 257 258 258
Breadcrumbs.MutatedUTF16ToIdx.ASCII 755 756 755
Breadcrumbs.MutatedUTF16ToIdx.Mixed 260 262 261
Breadcrumbs.UTF16ToIdx.longASCII 310 311 311
Breadcrumbs.UTF16ToIdx.longMixed 362 363 363
Breadcrumbs.UTF16ToIdxRange.longASCII 84 86 85
Breadcrumbs.UTF16ToIdxRange.longMixed 106 107 107

Performance: -Osize

TEST OLD NEW DELTA RATIO
Regression
DataCountSmall 34 37 +8.8% 0.92x
DataCountMedium 37 40 +8.1% 0.93x (?)
Added
Breadcrumbs.CopyUTF16CodeUnits.ASCII 47 48 48
Breadcrumbs.CopyUTF16CodeUnits.Mixed 52 55 53
Breadcrumbs.IdxToUTF16.longASCII 584 2118 1095
Breadcrumbs.IdxToUTF16.longMixed 720 726 722
Breadcrumbs.IdxToUTF16Range.longASCII 291 1801 795
Breadcrumbs.IdxToUTF16Range.longMixed 374 376 375
Breadcrumbs.MutatedIdxToUTF16.ASCII 760 761 761
Breadcrumbs.MutatedIdxToUTF16.Mixed 257 272 262
Breadcrumbs.MutatedUTF16ToIdx.ASCII 756 757 756
Breadcrumbs.MutatedUTF16ToIdx.Mixed 259 262 260
Breadcrumbs.UTF16ToIdx.longASCII 310 310 310
Breadcrumbs.UTF16ToIdx.longMixed 360 360 360
Breadcrumbs.UTF16ToIdxRange.longASCII 84 85 84
Breadcrumbs.UTF16ToIdxRange.longMixed 106 108 107

Performance: -Onone

TEST OLD NEW DELTA RATIO
Regression
ExclusivityIndependent 74 88 +18.9% 0.84x
Added
Breadcrumbs.CopyUTF16CodeUnits.ASCII 115 121 117
Breadcrumbs.CopyUTF16CodeUnits.Mixed 118 119 118
Breadcrumbs.IdxToUTF16.longASCII 1187 2742 1706
Breadcrumbs.IdxToUTF16.longMixed 1324 1818 1490
Breadcrumbs.IdxToUTF16Range.longASCII 426 1907 920
Breadcrumbs.IdxToUTF16Range.longMixed 509 512 510
Breadcrumbs.MutatedIdxToUTF16.ASCII 768 769 769
Breadcrumbs.MutatedIdxToUTF16.Mixed 266 267 267
Breadcrumbs.MutatedUTF16ToIdx.ASCII 766 767 766
Breadcrumbs.MutatedUTF16ToIdx.Mixed 270 275 272
Breadcrumbs.UTF16ToIdx.longASCII 905 908 906
Breadcrumbs.UTF16ToIdx.longMixed 944 1025 971
Breadcrumbs.UTF16ToIdxRange.longASCII 213 213 213
Breadcrumbs.UTF16ToIdxRange.longMixed 236 237 237
How to read the data The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview
  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB
--------------

@palimondo
Copy link
Contributor

palimondo commented Nov 28, 2018

@lorentey It looks like it didn't run the benchmarks with the Benchmark Check Report from other PR. I guess this only works for test? // cc @eeckstein

@palimondo
Copy link
Contributor

Benchmark MIN MAX MEAN
Breadcrumbs.IdxToUTF16.longASCII 586 2077 1083
Breadcrumbs.IdxToUTF16.longMixed 714 724 720
Breadcrumbs.IdxToUTF16Range.longASCII 289 1756 778
Breadcrumbs.IdxToUTF16Range.longMixed 371 373 372

🧐longMixed look great, but longASCII is weirding me out… You said it looked fine locally, so I don't know how to explain the variance...

@milseman
Copy link
Member

@swift-ci please smoke test

@milseman milseman merged commit 74eaea2 into swiftlang:master Nov 29, 2018
@palimondo
Copy link
Contributor

palimondo commented Dec 3, 2018

@lorentey, you didn't answer my question from older conversation:

longASCII and shortASCII are quite different workloads though.

How so? I only see the same string repeated multiple times. That seems to me just like a different way to scale the workload size, instead of setting the inner-loop multiplier. Am I mistaken? Is the difference about some kind of small string optimization behind the scenes? If so, this should probably be also documented in the comments. Edit: asciiBase definitely isn't a small string.

I don't understand what made you persist with the different names, instead of going with ASCII and Mixed only.

@lorentey
Copy link
Member Author

lorentey commented Dec 3, 2018

It's just that repeating a measurement 100 times is not the same as making a single measurement with a 100-times longer input.

  • Breadcrumbs are separately allocated metadata whose size is proportional to the size of the string they were generated from. Performance may depend on size by accident.
  • The random number generator will produce a different set of indices if you change the size of the input string.
  • etc.

@lorentey lorentey deleted the breadcrumb-benchmarks branch December 3, 2018 21:22
PopFlamingo pushed a commit to PopFlamingo/swift that referenced this pull request Dec 6, 2018
[benchmark] Add some benchmarks for String breadcrumbs
@palimondo
Copy link
Contributor

🧐longMixed look great, but longASCII is weirding me out… You said it looked fine locally, so I don't know how to explain the variance...

Upon further thought, the variance can be explained this way: run_smoke_bench initially gathers only 3 samples per benchmark and that's all we get for the added benchmarks, therefore any benchmarks that have any warmup will manifest itself as this variance.
@eeckstein: We should gather more independent samples also for the added benchmarks.


@lorentey @milseman There's something strange going on with these benchmarks and/or the breadcrumbs implementation.

After Karoy ran the Benchmark_Driver check locally he sized workloads and iterations of the benchmarks to sit nicely in the hundreds of μs range, see StringBreadcrumbs benchmark report above. Then there were two force-pushes that are lost to history, which probably re-adjusted the workloads of CopyUTF16CodeUnits, because in the last benchmark report above, their runtime dropped by a factor of 10. (We should IMO revert this by increasing the count value back to 5_000.)

TEST MIN MAX MEAN
Breadcrumbs.CopyUTF16CodeUnits.ASCII 47 48 48
Breadcrumbs.CopyUTF16CodeUnits.Mixed 52 55 53

Strangely, as soon as 3rd of December some of these start to sporadically appear in the -O Performance changes reported by @swift-ci start with very low runtimes:

TEST OLD NEW DELTA RATIO
Breadcrumbs.CopyUTF16CodeUnits.ASCII 20 18 -10.0% 1.11x
Breadcrumbs.CopyUTF16CodeUnits.Mixed 68 55 -19.1% 1.24x

There were no benchmark reports from @swift-ci in any PR's here on GitHub that would include the improvement as intentional change… I have noticed that something is strange here during my janitor duty, as even on my measly machine the MutatedIdxToUTF16.ASCII and MutatedUTF16ToIdx.ASCII dropped to 15 μs and were picked up by the BenchmarkDoctor heuristic that flags runtimes below 20 μs (because than even 1 μs change is above the 5% threshold increasing the chances it shows up in report).

I've looked at it in Instruments:

ASCII workload's stack trace:

   2 Benchmark_O 289.0  MutatedIdxToUTF16.run(iterations:)
   1 libswiftCore.dylib 35.0  StringProtocol._toUTF16Offset(_:)
   0 libswiftCore.dylib 12.0  specialized String.UTF16View._nativeGetOffset(for:)

Mixed workload's stack trace:

  10 Benchmark_O 16815.0  MutatedIdxToUTF16.run(iterations:)
   9 libswiftCore.dylib 15682.0  StringProtocol._toUTF16Offset(_:)
   8 libswiftCore.dylib 15660.0  specialized String.UTF16View._nativeGetOffset(for:)
   7 libswiftCore.dylib 15342.0  specialized _StringGuts.populateBreadcrumbs(_:)
   6 libswiftCore.dylib 15012.0  specialized _StringBreadcrumbs.init(_:)

It looks like ASCII workload is not using any breadcrumbs. Is that normal or a bug?!

@milseman
Copy link
Member

ASCII <-> UTF-16 can be calculated in constant time. It shouldn't use breadcrumbs. We can either have a benchmark demonstrating the triviality of this workload, or maybe a SIL-level test as I think the ASCII check is in inlinable code.

@palimondo
Copy link
Contributor

I don't understand why the performance demonstrated in the ASCII benchmarks changed between when @lorentey wrote them as seen in the last benchmark report above and later, given that StringBreadcrumbs.swift wasn't modified during that time. And as I said before, there are no publicly visible benchmark reports that would show this improvement was intentional effect of some PR.

Can you please check in your internal performance tracking when these changed? Let's say the MutatedIdxToUTF16.ASCII and MutatedUTF16ToIdx.ASCII in particular... cc @eeckstein

@milseman
Copy link
Member

#20848

@palimondo
Copy link
Contributor

Oh! I have apparently searched through pulls without the proper filters... Thank you!

@palimondo
Copy link
Contributor

We can either have a benchmark demonstrating the triviality of this workload, or maybe a SIL-level test as I think the ASCII check is in inlinable code.

I guess this is fine as is — I'll keep an eye out to see if the sub 20 μs runtime will make these benchmarks show up as false changes. I'm sorry to have bothered you with this. And thanks again for digging up the relevant PR, @milseman!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants