[benchmark] Add some benchmarks for String breadcrumbs #20769

lorentey · 2018-11-26T20:35:06Z

This adds some benchmarks for String breadcrumbing, via low-level stdlib entry points exposed by String.

StringBreadcrumbs.UTF16ToIndex: Convert some random UTF-16 offsets into String indices.
StringBreadcrumbs.IndexToUTF16: Convert some random String indices into UTF-16 offsets.
StringBreadcrumbs.UTF16ToIndexRange: Split a string into a bunch of random slices and convert their UTF-16 offsets into String index ranges.
StringBreadcrumbs.IndexToUTF16Range: Do the same in reverse direction.
StringBreadcrumbs.CopyUTF16CodeUnits: Copy UTF-16 code units from random ranges.
StringBreadcrumbs.MutatedUTF16ToIndex: This is like UTF16ToIndex but appends to the string after every index conversion. In effect, this tests breadcrumb creation performance.
StringBreadcrumbs.MutatedIndexToUTF16: This is like IndexToUTF16 but appends to the string after every index conversion. In effect, this tests breadcrumb creation performance.

https://bugs.swift.org/browse/SR-9226

lorentey · 2018-11-26T20:36:02Z

The CopyUTF16CodeUnits benchmark here will crash until #20768 lands.

palimondo · 2018-11-26T21:05:43Z

@lorentey #20667 has landed. You should be able to run these now with swift-ci, too.

lorentey · 2018-11-26T21:07:39Z

#20768
@swift-ci smoke test

lorentey · 2018-11-26T21:08:03Z

#20768
@swift-ci benchmark

benchmark/single-source/StringBreadcrumbs.swift

palimondo · 2018-11-26T21:24:22Z

I haven't yet finished integration of BenchmarkDoctor into run_smoke_bench… could you run this locally, please?

$ ./Benchmark_Driver check -f StringBreadcrumbs

(ignoring the naming complaints it will spew, as we have a hung jury back in #20334 😜)

milseman · 2018-11-26T21:25:12Z

Did you profile this and make sure all the time is spent in the corresponding methods on String? I just want to make sure there isn't any ARC or missed devirtualization opportunities that would change what we're measuring.

lorentey · 2018-11-26T21:34:18Z

@palimondo The doctor complained about the names and the "very wide range of memory used", but I scaled things so that it likes the timings.

lorentey · 2018-11-26T21:44:27Z

@milseman I did some spot checks -- the heavy stack frames were inside the target entry points and looked breadcrumby enough. I wasn't looking for missed optimization opportunities though.

I'll have a more detailed look tomorrow, just to make sure these are all measuring what I think they're measuring.

palimondo · 2018-11-26T21:46:34Z

The doctor complained about the names and the "very wide range of memory used"

🧐Interesting. Could you re-check those with --verbose flag, to see how bad it is?
Tip: You can work with just some select benchmarks by their test numbers, which can come in handy if you want to focus on sub-ranges, like this:

$ ./Benchmark_Driver check {204..217} {670..683}

lorentey · 2018-11-26T21:51:14Z

@palimondo Here's a quick sample of the memory-related output:

memory: 'StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII mem_pages [i1, i2]:  min=[42, 46] 𝚫=4 R=[4, 17]
memory: 'StringBreadcrumbs.IndexToUTF16.longASCII' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.IndexToUTF16.longASCII mem_pages [i1, i2]:  min=[1108, 1119] 𝚫=11 R=[44, 31]
memory: 'StringBreadcrumbs.IndexToUTF16.longMixed' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.IndexToUTF16.longMixed mem_pages [i1, i2]:  min=[323, 326] 𝚫=3 R=[36, 36]
memory: 'StringBreadcrumbs.IndexToUTF16Range.longASCII' has very wide range of memory used between independent, repeated measurements.
memory: StringBreadcrumbs.IndexToUTF16Range.longASCII mem_pages [i1, i2]:  min=[1116, 1125] 𝚫=9 R=[33, 29]

benchmark/single-source/StringBreadcrumbs.swift

swift-ci · 2018-11-26T22:41:00Z

Build comment file:

benchmark/single-source/StringBreadcrumbs.swift

milseman · 2018-11-26T23:10:02Z

@swift-ci please smoke test os x platform

milseman · 2018-11-26T23:10:12Z

@swift-ci please benchmark

palimondo

This is very fresh approach to writing benchmarks! 👍

So far I have these comments:

Documentation block at the beginning that explains what are the Breadcrumbs being tested here, would be very helpful.
The test names would ideally fit under 40 characters, renames suggested here.
The debug print should be removed.

benchmark/single-source/StringBreadcrumbs.swift

milseman · 2018-11-27T00:14:27Z

@palimondo where is the 40-character limit coming from and is there any way to get more? Is it Github's in-PR renderer that's constraining this?

swift-ci · 2018-11-27T00:15:24Z

Build comment file:

Performance: -O

TEST	MIN	MAX	MEAN	MAX_RSS
Added
StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII	663	700	676	—
StringBreadcrumbs.CopyUTF16CodeUnits.shortMixed	672	679	675	—
StringBreadcrumbs.IndexToUTF16.longASCII	660	2211	1177	—
StringBreadcrumbs.IndexToUTF16.longMixed	791	1312	965	—
StringBreadcrumbs.IndexToUTF16Range.longASCII	306	1786	800	—
StringBreadcrumbs.IndexToUTF16Range.longMixed	387	390	388	—
StringBreadcrumbs.MutatedIndexToUTF16.shortASCII	794	839	817	—
StringBreadcrumbs.MutatedIndexToUTF16.shortMixed	300	423	361	—
StringBreadcrumbs.MutatedUTF16ToIndex.shortASCII	791	837	814	—
StringBreadcrumbs.MutatedUTF16ToIndex.shortMixed	306	425	365	—
StringBreadcrumbs.UTF16ToIndex.longASCII	380	380	380	—
StringBreadcrumbs.UTF16ToIndex.longMixed	427	429	428	—
StringBreadcrumbs.UTF16ToIndexRange.longASCII	99	99	99	—
StringBreadcrumbs.UTF16ToIndexRange.longMixed	122	122	122	—

Performance: -Osize

TEST	MIN	MAX	MEAN	MAX_RSS
Added
StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII	680	708	690	—
StringBreadcrumbs.CopyUTF16CodeUnits.shortMixed	688	704	698	—
StringBreadcrumbs.IndexToUTF16.longASCII	654	2141	1150	—
StringBreadcrumbs.IndexToUTF16.longMixed	784	1298	956	—
StringBreadcrumbs.IndexToUTF16Range.longASCII	308	1774	797	—
StringBreadcrumbs.IndexToUTF16Range.longMixed	391	394	392	—
StringBreadcrumbs.MutatedIndexToUTF16.shortASCII	791	838	815	—
StringBreadcrumbs.MutatedIndexToUTF16.shortMixed	300	417	359	—
StringBreadcrumbs.MutatedUTF16ToIndex.shortASCII	787	834	811	—
StringBreadcrumbs.MutatedUTF16ToIndex.shortMixed	304	424	364	—
StringBreadcrumbs.UTF16ToIndex.longASCII	372	373	372	—
StringBreadcrumbs.UTF16ToIndex.longMixed	423	440	429	—
StringBreadcrumbs.UTF16ToIndexRange.longASCII	95	97	96	—
StringBreadcrumbs.UTF16ToIndexRange.longMixed	119	120	119	—

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
ObjectiveCBridgeFromNSArrayAnyObjectForced	8998	10040	+11.6%	0.90x
Added
StringBreadcrumbs.CopyUTF16CodeUnits.shortASCII	1380	1455	1417	—
StringBreadcrumbs.CopyUTF16CodeUnits.shortMixed	1340	1371	1360	—
StringBreadcrumbs.IndexToUTF16.longASCII	1294	2818	1803	—
StringBreadcrumbs.IndexToUTF16.longMixed	1420	1948	1597	—
StringBreadcrumbs.IndexToUTF16Range.longASCII	606	2103	1105	—
StringBreadcrumbs.IndexToUTF16Range.longMixed	690	697	693	—
StringBreadcrumbs.MutatedIndexToUTF16.shortASCII	802	847	824	—
StringBreadcrumbs.MutatedIndexToUTF16.shortMixed	308	426	367	—
StringBreadcrumbs.MutatedUTF16ToIndex.shortASCII	809	858	833	—
StringBreadcrumbs.MutatedUTF16ToIndex.shortMixed	322	444	383	—
StringBreadcrumbs.UTF16ToIndex.longASCII	1503	1513	1507	—
StringBreadcrumbs.UTF16ToIndex.longMixed	1552	1569	1561	—
StringBreadcrumbs.UTF16ToIndexRange.longASCII	341	342	341	—
StringBreadcrumbs.UTF16ToIndexRange.longMixed	367	368	368	—

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

palimondo · 2018-11-27T00:33:46Z

@milseman

where is the 40-character limit coming from and is there any way to get more? Is it Github's in-PR renderer that's constraining this?

Yes, primarily GitHub's fixed table width. As the above report shows, these name at 48 chars are filling the entire available space. But this is a best case scenario, as the -O runtimes are under 1000 μs . When you combine this with longer runtimes and all the benchmarks, it triggers horizontal scrolling.

Also formatting the result table in console (Benchmark_Driver's output as run through build_script). And usability of all the benchmark reports on mobile devices, see #20334 (comment).

palimondo · 2018-11-27T00:53:37Z

The variance in IndexToUTF16 and IndexToUTF16Range groups worries me:

TEST	MIN	MAX	MEAN	MAX_RSS
StringBreadcrumbs.IndexToUTF16.longASCII	660	2211	1177	—
StringBreadcrumbs.IndexToUTF16.longMixed	791	1312	965	—
StringBreadcrumbs.IndexToUTF16Range.longASCII	306	1786	800	—
StringBreadcrumbs.IndexToUTF16Range.longMixed	387	390	388	—

Only the last one looks fine. This might be background process interference, with run_smoke_bench. It still uses averaging… instead of going with --num-iters=1. But it's present in both -O and -Osize builds, so it looks quite 🐟y.

@lorentey Could you please check locally the variance isn't this crazy, when measured properly?

$ ./Benchmark_Driver run -i 3 -f StringBreadcrumbs.IndexToUTF16

🤨 These are the same tests that reported very wide range of memory used. There's some issue with the benchmarks or the tested functionality.

lorentey · 2018-11-28T19:16:28Z

IdxToUTF16 looks fine to me locally:

$ .../Benchmark_Driver run -i 3 -f Breadcrumbs.IdxToUTF16
#,TEST,SAMPLES,MIN(μs),Q1(μs),MEDIAN(μs),Q3(μs),MAX(μs),MAX_RSS(B)
45,Breadcrumbs.IdxToUTF16.longASCII,55,421,433,434,435,452,4575232
46,Breadcrumbs.IdxToUTF16.longMixed,58,555,572,574,577,600,1310720
47,Breadcrumbs.IdxToUTF16Range.longASCII,57,202,209,209,210,214,4575232
48,Breadcrumbs.IdxToUTF16Range.longMixed,58,290,299,299,301,311,1327104

Total performance tests executed: 4

lorentey · 2018-11-28T19:17:36Z

Rebased, applied review notes, cleaned up some things.

@swift-ci please smoke test

lorentey · 2018-11-28T19:19:04Z

@swift-ci benchmark

lorentey · 2018-11-28T19:21:31Z

Ah, I forgot to push a comment typo fix. Let's wait out the benchmark run first.

lorentey · 2018-11-28T19:31:31Z

@milseman I profiled all tests looking for anomalies; they seem to work as intended. I did not notice anything left unspecialized, but it seems noteworthy that the IdxToUTF16 and IdxToUTF16Range tests spent about half of their time in BidirectionalCollection's index navigation, and only a third in getBreadcrumb(forIndex:). Some internal heuristic may not work right for these tests.

benchmark/single-source/Breadcrumbs.swift

palimondo

LGTM. Few cosmetic nits in the comments. Thank's for hearing me out, Karoy!

lorentey · 2018-11-28T19:53:34Z

Please test with the following PR:

#20807
@swift-ci please benchmark

lorentey · 2018-11-28T19:54:53Z

@swift-ci please smoke check

benchmark/single-source/Breadcrumbs.swift

palimondo · 2018-11-28T20:07:47Z

@lorentey BTW, brace for BenchmarkDoctor complaining about names not being in UpperCamelCase, since the new naming convention isn't in yet… could you maybe help me finish and merge that in the meantime?

swift-ci · 2018-11-28T20:38:11Z

Build comment file:

Performance: -O

TEST	OLD	NEW	DELTA	RATIO
Improvement
CharacterLiteralsLarge	108	97	-10.2%	1.11x
CharacterLiteralsSmall	348	325	-6.6%	1.07x
Added
Breadcrumbs.CopyUTF16CodeUnits.ASCII	47	48	48	—
Breadcrumbs.CopyUTF16CodeUnits.Mixed	62	63	62	—
Breadcrumbs.IdxToUTF16.longASCII	586	2077	1083	—
Breadcrumbs.IdxToUTF16.longMixed	714	724	720	—
Breadcrumbs.IdxToUTF16Range.longASCII	289	1756	778	—
Breadcrumbs.IdxToUTF16Range.longMixed	371	373	372	—
Breadcrumbs.MutatedIdxToUTF16.ASCII	758	759	759	—
Breadcrumbs.MutatedIdxToUTF16.Mixed	257	258	258	—
Breadcrumbs.MutatedUTF16ToIdx.ASCII	755	756	755	—
Breadcrumbs.MutatedUTF16ToIdx.Mixed	260	262	261	—
Breadcrumbs.UTF16ToIdx.longASCII	310	311	311	—
Breadcrumbs.UTF16ToIdx.longMixed	362	363	363	—
Breadcrumbs.UTF16ToIdxRange.longASCII	84	86	85	—
Breadcrumbs.UTF16ToIdxRange.longMixed	106	107	107	—

Performance: -Osize

TEST	OLD	NEW	DELTA	RATIO
Regression
DataCountSmall	34	37	+8.8%	0.92x
DataCountMedium	37	40	+8.1%	0.93x (?)
Added
Breadcrumbs.CopyUTF16CodeUnits.ASCII	47	48	48	—
Breadcrumbs.CopyUTF16CodeUnits.Mixed	52	55	53	—
Breadcrumbs.IdxToUTF16.longASCII	584	2118	1095	—
Breadcrumbs.IdxToUTF16.longMixed	720	726	722	—
Breadcrumbs.IdxToUTF16Range.longASCII	291	1801	795	—
Breadcrumbs.IdxToUTF16Range.longMixed	374	376	375	—
Breadcrumbs.MutatedIdxToUTF16.ASCII	760	761	761	—
Breadcrumbs.MutatedIdxToUTF16.Mixed	257	272	262	—
Breadcrumbs.MutatedUTF16ToIdx.ASCII	756	757	756	—
Breadcrumbs.MutatedUTF16ToIdx.Mixed	259	262	260	—
Breadcrumbs.UTF16ToIdx.longASCII	310	310	310	—
Breadcrumbs.UTF16ToIdx.longMixed	360	360	360	—
Breadcrumbs.UTF16ToIdxRange.longASCII	84	85	84	—
Breadcrumbs.UTF16ToIdxRange.longMixed	106	108	107	—

Performance: -Onone

TEST	OLD	NEW	DELTA	RATIO
Regression
ExclusivityIndependent	74	88	+18.9%	0.84x
Added
Breadcrumbs.CopyUTF16CodeUnits.ASCII	115	121	117	—
Breadcrumbs.CopyUTF16CodeUnits.Mixed	118	119	118	—
Breadcrumbs.IdxToUTF16.longASCII	1187	2742	1706	—
Breadcrumbs.IdxToUTF16.longMixed	1324	1818	1490	—
Breadcrumbs.IdxToUTF16Range.longASCII	426	1907	920	—
Breadcrumbs.IdxToUTF16Range.longMixed	509	512	510	—
Breadcrumbs.MutatedIdxToUTF16.ASCII	768	769	769	—
Breadcrumbs.MutatedIdxToUTF16.Mixed	266	267	267	—
Breadcrumbs.MutatedUTF16ToIdx.ASCII	766	767	766	—
Breadcrumbs.MutatedUTF16ToIdx.Mixed	270	275	272	—
Breadcrumbs.UTF16ToIdx.longASCII	905	908	906	—
Breadcrumbs.UTF16ToIdx.longMixed	944	1025	971	—
Breadcrumbs.UTF16ToIdxRange.longASCII	213	213	213	—
Breadcrumbs.UTF16ToIdxRange.longMixed	236	237	237	—

How to read the data

The tables contain differences in performance which are larger than 8% and differences in code size which are larger than 1%.

If you see any unexpected regressions, you should consider fixing the
regressions before you merge the PR.

Noise: Sometimes the performance results (not code size!) contain false
alarms. Unexpected regressions which are marked with '(?)' are probably noise.
If you see regressions which you cannot explain you can try to run the
benchmarks again. If regressions still show up, please consult with the
performance team (@eeckstein).

Hardware Overview

  Model Name: Mac Pro
  Model Identifier: MacPro6,1
  Processor Name: 12-Core Intel Xeon E5
  Processor Speed: 2.7 GHz
  Number of Processors: 1
  Total Number of Cores: 12
  L2 Cache (per Core): 256 KB
  L3 Cache: 30 MB
  Memory: 64 GB

--------------

palimondo · 2018-11-28T21:04:29Z

@lorentey It looks like it didn't run the benchmarks with the Benchmark Check Report from other PR. I guess this only works for test? // cc @eeckstein

palimondo · 2018-11-28T21:17:45Z

Benchmark	MIN	MAX	MEAN
Breadcrumbs.IdxToUTF16.longASCII	586	2077	1083
Breadcrumbs.IdxToUTF16.longMixed	714	724	720
Breadcrumbs.IdxToUTF16Range.longASCII	289	1756	778
Breadcrumbs.IdxToUTF16Range.longMixed	371	373	372

🧐longMixed look great, but longASCII is weirding me out… You said it looked fine locally, so I don't know how to explain the variance...

milseman · 2018-11-28T23:17:58Z

@swift-ci please smoke test

palimondo · 2018-12-03T16:12:51Z

@lorentey, you didn't answer my question from older conversation:

longASCII and shortASCII are quite different workloads though.

How so? I only see the same string repeated multiple times. That seems to me just like a different way to scale the workload size, instead of setting the inner-loop multiplier. Am I mistaken? ~~Is the difference about some kind of small string optimization behind the scenes? If so, this should probably be also documented in the comments.~~ Edit: asciiBase definitely isn't a small string.

I don't understand what made you persist with the different names, instead of going with ASCII and Mixed only.

lorentey · 2018-12-03T21:22:38Z

It's just that repeating a measurement 100 times is not the same as making a single measurement with a 100-times longer input.

Breadcrumbs are separately allocated metadata whose size is proportional to the size of the string they were generated from. Performance may depend on size by accident.
The random number generator will produce a different set of indices if you change the size of the input string.
etc.

[benchmark] Add some benchmarks for String breadcrumbs

palimondo · 2019-02-14T15:35:50Z

🧐longMixed look great, but longASCII is weirding me out… You said it looked fine locally, so I don't know how to explain the variance...

Upon further thought, the variance can be explained this way: run_smoke_bench initially gathers only 3 samples per benchmark and that's all we get for the added benchmarks, therefore any benchmarks that have any warmup will manifest itself as this variance.
@eeckstein: We should gather more independent samples also for the added benchmarks.

@lorentey @milseman There's something strange going on with these benchmarks and/or the breadcrumbs implementation.

After Karoy ran the Benchmark_Driver check locally he sized workloads and iterations of the benchmarks to sit nicely in the hundreds of μs range, see StringBreadcrumbs benchmark report above. Then there were two force-pushes that are lost to history, which probably re-adjusted the workloads of CopyUTF16CodeUnits, because in the last benchmark report above, their runtime dropped by a factor of 10. (We should IMO revert this by increasing the count value back to 5_000.)

TEST	MIN	MAX	MEAN
Breadcrumbs.CopyUTF16CodeUnits.ASCII	47	48	48
Breadcrumbs.CopyUTF16CodeUnits.Mixed	52	55	53

Strangely, as soon as 3rd of December some of these start to sporadically appear in the -O Performance changes reported by @swift-ci start with very low runtimes:

TEST	OLD	NEW	DELTA	RATIO
Breadcrumbs.CopyUTF16CodeUnits.ASCII	20	18	-10.0%	1.11x
Breadcrumbs.CopyUTF16CodeUnits.Mixed	68	55	-19.1%	1.24x

There were no benchmark reports from @swift-ci in any PR's here on GitHub that would include the improvement as intentional change… I have noticed that something is strange here during my janitor duty, as even on my measly machine the MutatedIdxToUTF16.ASCII and MutatedUTF16ToIdx.ASCII dropped to 15 μs and were picked up by the BenchmarkDoctor heuristic that flags runtimes below 20 μs (because than even 1 μs change is above the 5% threshold increasing the chances it shows up in report).

I've looked at it in Instruments:

ASCII workload's stack trace:

   2 Benchmark_O 289.0  MutatedIdxToUTF16.run(iterations:)
   1 libswiftCore.dylib 35.0  StringProtocol._toUTF16Offset(_:)
   0 libswiftCore.dylib 12.0  specialized String.UTF16View._nativeGetOffset(for:)

Mixed workload's stack trace:

  10 Benchmark_O 16815.0  MutatedIdxToUTF16.run(iterations:)
   9 libswiftCore.dylib 15682.0  StringProtocol._toUTF16Offset(_:)
   8 libswiftCore.dylib 15660.0  specialized String.UTF16View._nativeGetOffset(for:)
   7 libswiftCore.dylib 15342.0  specialized _StringGuts.populateBreadcrumbs(_:)
   6 libswiftCore.dylib 15012.0  specialized _StringBreadcrumbs.init(_:)

It looks like ASCII workload is not using any breadcrumbs. Is that normal or a bug?!

milseman · 2019-02-14T22:27:01Z

ASCII <-> UTF-16 can be calculated in constant time. It shouldn't use breadcrumbs. We can either have a benchmark demonstrating the triviality of this workload, or maybe a SIL-level test as I think the ASCII check is in inlinable code.

palimondo · 2019-02-15T08:44:17Z

I don't understand why the performance demonstrated in the ASCII benchmarks changed between when @lorentey wrote them as seen in the last benchmark report above and later, given that StringBreadcrumbs.swift wasn't modified during that time. And as I said before, there are no publicly visible benchmark reports that would show this improvement was intentional effect of some PR.

Can you please check in your internal performance tracking when these changed? Let's say the MutatedIdxToUTF16.ASCII and MutatedUTF16ToIdx.ASCII in particular... cc @eeckstein

milseman · 2019-02-15T17:39:30Z

#20848

palimondo · 2019-02-15T17:48:22Z

Oh! I have apparently searched through pulls without the proper filters... Thank you!

palimondo · 2019-02-15T22:25:02Z

We can either have a benchmark demonstrating the triviality of this workload, or maybe a SIL-level test as I think the ASCII check is in inlinable code.

I guess this is fine as is — I'll keep an eye out to see if the sub 20 μs runtime will make these benchmarks show up as false changes. I'm sorry to have bothered you with this. And thanks again for digging up the relevant PR, @milseman!

lorentey requested a review from milseman November 26, 2018 20:35

palimondo reviewed Nov 26, 2018

View reviewed changes