Skip to content

Odd performance characteristics when using lineRange on Substring #954

Open
@BitShiftNow

Description

@BitShiftNow

Description
I have created a small benchmark with swift-collections-benchmark to measure the performance of retrieving the half-open ranges of all lines in a string. Each line in the string contained about 15 ASCII characters on average (String encoded as utf8). Some lines were empty (only a single newline character). The string itself had about 2 million lines in total, however the benchmark was only retrieving up to 1 million lines in total. I was measuring the performance difference of the lineRange method when used on a String and Substring input.

When using lineRange on a String input it takes a little over 200ms to retrieve all ranges for 1 million lines of text.
When using lineRange on a Substring however it takes over 1 second to retrieve all ranges for well under 64k lines of text.

isolated

Expected behavior
I expected the performance characteristics of lineRange to be similar to other StringProtocol methods such as split (before is String - after is Substring. Please forgive the confusing colour swap)

05 Split

OR enumerateLines (same labels as before)

06 EnumerateLines

Steps to reproduce
I can not share the input file used in the tests as it was proprietary 8-bit assembly code but such an input file should be trivial to create. The method to find line ranges in the string used within the benchmark looked like this:

extension StringProtocol {
    func lineRangeBenchmark() -> [Range<Self.Index>] {
        var output: [Range<Self.Index>] = []

        var startIndex = self.startIndex

        repeat {
            let range = self.lineRange(for: startIndex..<startIndex)
            output.append(range)
            startIndex = range.upperBound
        } while startIndex < self.endIndex
        
        return output
    }
}

The benchmark itself was set up like this:

var benchmark = Benchmark(title: "LineRanges")

benchmark.registerInputGenerator(for: Substring.self) { count in
    let ranges = input.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline)
    let index = ranges.index(ranges.startIndex, offsetBy: count + 1)
    return input[input.startIndex..<ranges[index].endIndex] // input is 2 million lines. count denotes the number of lines requested
}
        
benchmark.registerInputGenerator(for: String.self) { count in
    let ranges = input.split(omittingEmptySubsequences: false, whereSeparator: \.isNewline)
    let index = ranges.index(ranges.startIndex, offsetBy: count + 1)
    return String(input[input.startIndex..<ranges[index].endIndex]) // input is 2 million lines. count denotes the number of lines requested
}

benchmark.addSimple(title: "Substring LineRange", input: Substring.self) { i in
    blackHole(i.lineRangeBenchmark())
}
        
benchmark.addSimple(title: "String LineRange", input: String.self) { i in
    blackHole(i.lineRangeBenchmark())
}

benchmark.main()

I ran the benchmark via swift run -c release LineRangesBenchmark run --cycles 10 results
and created the graph with swift run -c release LineRangesBenchmark render results --amortized false --linear-time --linear-size chart.png

Verbose output seems to indicate that -c release sets -O and whole-module-optimization.

Environment
My test machine was a 2023 MacMini with M2 Pro Processor running MacOS Ventura 13.2.

% swift --version
swift-driver version: 1.62.15 Apple Swift version 5.7.2 (swiftlang-5.7.2.135.5 clang-1400.0.29.51)
Target: arm64-apple-macosx13.0

Metadata

Metadata

Assignees

Labels

bugSomething isn't working

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions