Description
Description
I have created a small benchmark with swift-collections-benchmark to measure the performance of retrieving the half-open ranges of all lines in a string. Each line in the string contained about 15 ASCII characters on average (String encoded as utf8). Some lines were empty (only a single newline character). The string itself had about 2 million lines in total, however the benchmark was only retrieving up to 1 million lines in total. I was measuring the performance difference of the lineRange method when used on a String
and Substring
input.
When using lineRange
on a String
input it takes a little over 200ms to retrieve all ranges for 1 million lines of text.
When using lineRange
on a Substring
however it takes over 1 second to retrieve all ranges for well under 64k lines of text.
Expected behavior
I expected the performance characteristics of lineRange
to be similar to other StringProtocol
methods such as split
(before
is String
- after
is Substring
. Please forgive the confusing colour swap)
OR enumerateLines
(same labels as before)
Steps to reproduce
I can not share the input file used in the tests as it was proprietary 8-bit assembly code but such an input file should be trivial to create. The method to find line ranges in the string used within the benchmark looked like this:
extension StringProtocol {
func lineRangeBenchmark() -> [Range<Self.Index>] {
var output: [Range<Self.Index>] = []
var startIndex = self.startIndex
repeat {
let range = self.lineRange(for: startIndex..<startIndex)
output.append(range)
startIndex = range.upperBound
} while startIndex < self.endIndex
return output
}
}
The benchmark itself was set up like this:
var benchmark = Benchmark(title: "LineRanges")
benchmark.registerInputGenerator(for: Substring.self) { count in
let ranges = input.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline)
let index = ranges.index(ranges.startIndex, offsetBy: count + 1)
return input[input.startIndex..<ranges[index].endIndex] // input is 2 million lines. count denotes the number of lines requested
}
benchmark.registerInputGenerator(for: String.self) { count in
let ranges = input.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline)
let index = ranges.index(ranges.startIndex, offsetBy: count + 1)
return String(input[input.startIndex..<ranges[index].endIndex]) // input is 2 million lines. count denotes the number of lines requested
}
benchmark.addSimple(title: "Substring LineRange", input: Substring.self) { i in
blackHole(i.lineRangeBenchmark())
}
benchmark.addSimple(title: "String LineRange", input: String.self) { i in
blackHole(i.lineRangeBenchmark())
}
benchmark.main()
I ran the benchmark via swift run -c release LineRangesBenchmark run --cycles 10 results
and created the graph with swift run -c release LineRangesBenchmark render results --amortized false --linear-time --linear-size chart.png
Verbose output seems to indicate that -c release
sets -O
and whole-module-optimization
.
Environment
My test machine was a 2023 MacMini with M2 Pro Processor running MacOS Ventura 13.2.
% swift --version
swift-driver version: 1.62.15 Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)
Target: arm64-apple-macosx13.0