Overhaul quantification optimizations #716

Merged
merged 16 commits into from
Feb 7, 2024
better names and comments
milseman committed Dec 15, 2023
commit 3c5dea053795184e9e1b07833c7c68f083a989b3
6 changes: 3 additions & 3 deletions Sources/_StringProcessing/Engine/InstPayload.swift
@@ -381,7 +381,7 @@ struct QuantifyPayload: RawRepresentable {
case asciiBitset = 0
case asciiChar = 1
case any = 2
- case builtin = 4
+ case builtinCC = 4
}

// TODO: figure out how to better organize this...
@@ -493,7 +493,7 @@ struct QuantifyPayload: RawRepresentable {
+ (model.isInverted ? 1 << 9 : 0)
+ (model.isStrictASCII ? 1 << 10 : 0)
self.rawValue = packedModel
-     + QuantifyPayload.packInfoValues(kind, minTrips, maxExtraTrips, .builtin, isScalarSemantics: isScalarSemantics)
+     + QuantifyPayload.packInfoValues(kind, minTrips, maxExtraTrips, .builtinCC, isScalarSemantics: isScalarSemantics)
}

var type: PayloadType {
@@ -539,7 +539,7 @@ struct QuantifyPayload: RawRepresentable {
(self.rawValue & 1) == 1
}

- var builtin: _CharacterClassModel.Representation {
+ var builtinCC: _CharacterClassModel.Representation {
_CharacterClassModel.Representation(rawValue: self.rawValue & 0xFF)!
}
var builtinIsInverted: Bool {
Expand Down
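The hunks above pack a character-class model into the payload's raw value: the representation sits in the low 8 bits (read back with `rawValue & 0xFF`), and the inverted and strict-ASCII flags are added at bits 9 and 10. A minimal sketch of that layout follows. `Representation`, its cases, `PackedCC`, and the two flag accessors are hypothetical stand-ins written for illustration; they are not the types or accessors in `InstPayload.swift`, whose remaining fields (kind, trips, payload type) are omitted here.

```swift
// Hypothetical stand-in for _CharacterClassModel.Representation.
// Case names and raw values are illustrative only.
enum Representation: UInt64 {
  case digit = 0
  case whitespace = 1
  case word = 2
}

// Hypothetical miniature of the packed model: the low 8 bits hold the
// representation, bit 9 the inverted flag, bit 10 the strict-ASCII flag.
struct PackedCC {
  var rawValue: UInt64

  init(_ rep: Representation, isInverted: Bool, isStrictASCII: Bool) {
    // Same shape as the diff: sum of disjoint bit fields.
    rawValue = rep.rawValue
      + (isInverted ? 1 << 9 : 0)
      + (isStrictASCII ? 1 << 10 : 0)
  }

  // Mirrors `builtinCC` in the diff: mask off the low 8 bits.
  var representation: Representation {
    Representation(rawValue: rawValue & 0xFF)!
  }

  // Assumed accessors; the diff does not show how the flags are read back.
  var isInverted: Bool { (rawValue >> 9) & 1 == 1 }
  var isStrictASCII: Bool { (rawValue >> 10) & 1 == 1 }
}
```

Because the fields occupy disjoint bits, adding them is equivalent to OR-ing them, which is why the force-unwrap in the accessor is safe as long as only valid representations are ever packed.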
31 changes: 22 additions & 9 deletions Sources/_StringProcessing/Engine/MEQuantify.swift
@@ -92,10 +92,10 @@ extension Processor {
isScalarSemantics: isScalarSemantics)
}

- case .builtin:
+ case .builtinCC:
if isZeroOrMore {
matchResult = input.matchZeroOrMoreBuiltinCC(
- payload.builtin,
+ payload.builtinCC,
at: currentPosition,
limitedBy: end,
produceSavePointRange: produceSavePointRange,
@@ -104,7 +104,7 @@ extension Processor {
isScalarSemantics: isScalarSemantics)
} else if isOneOrMore {
matchResult = input.matchOneOrMoreBuiltinCC(
- payload.builtin,
+ payload.builtinCC,
at: currentPosition,
limitedBy: end,
produceSavePointRange: produceSavePointRange,
@@ -113,7 +113,7 @@ extension Processor {
isScalarSemantics: isScalarSemantics)
} else {
matchResult = input.matchQuantifiedBuiltinCC(
- payload.builtin,
+ payload.builtinCC,
at: currentPosition,
limitedBy: end,
minMatches: minMatches,
@@ -158,6 +158,11 @@ extension String {
) -> Index?
) -> (next: Index, savePointRange: Range<Index>?)? {
var currentPosition = currentPosition

+ // The range of backtracking positions to try. For zero-or-more, starts
+ // before any match happens. Always ends before the final match, since
+ // the final match is what is tried without backtracking. An empty range
+ // is valid and means a single backtracking position at rangeStart.
var rangeStart = currentPosition
var rangeEnd = currentPosition

@@ -171,6 +176,12 @@ extension String {
}
numMatches &+= 1
if numMatches == minMatches {
+ // For this loop iteration, rangeEnd will actually trail rangeStart by
+ // a single match position. Next iteration, they will be equal
+ // (empty range denoting a single backtracking point). Note that we
+ // only ever return a range if we have exceeded `minMatches`; if we
+ // exactly match `minMatches` there are no backtracking positions to
+ // remember.
rangeStart = next
}
rangeEnd = currentPosition
@@ -183,20 +194,22 @@ extension String {
}

guard produceSavePointRange && numMatches > minMatches else {
- // Consumed no input, no point saved
+ // No backtracking positions to try
return (currentPosition, nil)
}
assert(rangeStart <= rangeEnd)

- // NOTE: We can't assert that rangeEnd trails currentPosition by one
- // position, because newline-sequence in scalar semantic mode still
+ // NOTE: We can't assert that rangeEnd trails currentPosition by exactly
+ // one position, because newline-sequence in scalar semantic mode still
// matches two scalars

return (currentPosition, rangeStart..<rangeEnd)
}

- /// NOTE: [Zero|One]OrMore overloads are to specialize the inlined run loop,
- /// which has a substantive perf impact (especially for zero-or-more)
+ // NOTE: [Zero|One]OrMore overloads are to specialize the inlined run loop,
+ // which has a perf impact. At the time of writing this, 10% for
+ // zero-or-more and 5% for one-or-more improvement, which could very well
+ // be much higher if/when the inner match functions are made faster.

fileprivate func matchZeroOrMoreASCIIBitset(
_ asciiBitset: ASCIIBitset,
Expand Down
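The new comments in `MEQuantify.swift` describe how `rangeStart` and `rangeEnd` track the backtracking positions while the quantifier loop runs. The sketch below reproduces that bookkeeping in a standalone function so the invariants can be checked in isolation. It is an approximation under assumptions: `matchQuantifiedSketch` and its `matching` predicate are hypothetical names, and the per-element match is simplified to a single-`Character` test instead of the engine's ASCII bitset, scalar, or builtin character-class matchers.

```swift
// Hypothetical, simplified version of the quantifier run loop.
// Returns nil if fewer than `minMatches` elements matched.
func matchQuantifiedSketch(
  _ input: String,
  at start: String.Index,
  minMatches: Int,
  maxMatches: Int,
  produceSavePointRange: Bool,
  matching predicate: (Character) -> Bool
) -> (next: String.Index, savePointRange: Range<String.Index>?)? {
  var currentPosition = start

  // Backtracking positions run from rangeStart through rangeEnd.
  // For zero-or-more, rangeStart stays at the starting position.
  var rangeStart = currentPosition
  var rangeEnd = currentPosition
  var numMatches = 0

  while numMatches < maxMatches {
    guard currentPosition < input.endIndex,
          predicate(input[currentPosition]) else { break }
    let next = input.index(after: currentPosition)
    numMatches += 1
    if numMatches == minMatches {
      // First position that satisfies the minimum; for this iteration
      // rangeEnd trails rangeStart by one match.
      rangeStart = next
    }
    // rangeEnd always stops short of the final match position.
    rangeEnd = currentPosition
    currentPosition = next
  }

  guard numMatches >= minMatches else { return nil }

  guard produceSavePointRange && numMatches > minMatches else {
    // No backtracking positions to try.
    return (currentPosition, nil)
  }
  assert(rangeStart <= rangeEnd)
  return (currentPosition, rangeStart..<rangeEnd)
}
```

For the input `"aaab"` with a one-or-more quantifier over `a`, this sketch stops at offset 3 and returns the range from offset 1 up to offset 2: the engine can backtrack to offsets 1 and 2, while offset 3 is the position tried first without backtracking. When exactly `minMatches` elements match, the range is never constructed, so the temporarily inverted `rangeStart`/`rangeEnd` pair is harmless.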