[Integration] main (4d04019) -> swift/main #442

Merged: 46 commits, May 27, 2022
Changes from 1 commit

Commits:
3f54941
Implement .as for Regex
Azoy May 3, 2022
7e1ab7d
Unify Match and AnyRegexOutput
Azoy May 3, 2022
bc51e91
Ban numeric escapes in custom character classes
hamishknight May 10, 2022
a4a4a9a
Ban confusable multi-scalar ASCII characters
hamishknight May 10, 2022
db58c1b
Reserve `<{...}>` for interpolation syntax
hamishknight May 10, 2022
a53a40b
Remove the namedCaptureOffset and StructuredCapture
Azoy May 10, 2022
87ea119
Disable resilience on _RegexParser (#397)
rxwei May 11, 2022
baf9f22
Introduce `One`
rxwei May 12, 2022
d9d02c1
Ban `]` as literal first character of custom character class
hamishknight May 12, 2022
d3ea692
Merge pull request #404 from hamishknight/ban-empty-cc
hamishknight May 12, 2022
21f7910
Subsume referencedCaptureOffsets
Azoy May 12, 2022
c7b70a4
Add optional tests
Azoy May 12, 2022
b8178c2
Merge pull request #403 from rxwei/1
rxwei May 12, 2022
9d86c21
Wrap character classes around One
Azoy May 12, 2022
24c139a
fix intersection, subtraction, symmetricDiference
Azoy May 12, 2022
489c63c
Merge pull request #410 from Azoy/more-patternconverter-updates
Azoy May 13, 2022
9cf3cfc
Merge pull request #393 from hamishknight/stricter-syntax
hamishknight May 13, 2022
adf5688
Don't get stuck on empty matches (#415)
natecook1000 May 15, 2022
4f1e0ee
Underscore internal algorithms methods (#414)
natecook1000 May 15, 2022
4f8f67a
Remove the last SPI use of _RegexParser symbols (#416)
natecook1000 May 15, 2022
a4d7be0
Keep track of initial options in compiled program (#412)
natecook1000 May 16, 2022
c000596
More unicode properties (#385)
natecook1000 May 16, 2022
812c394
Keep substring bounds when searching in Regex.wholeMatch
natecook1000 May 17, 2022
ba33c0d
Merge pull request #421 from natecook1000/fix_wholematch_substring
natecook1000 May 17, 2022
7969272
Merge pull request #376 from Azoy/types-types-and-more-types
Azoy May 18, 2022
74f3b99
Add test fixtures for renderAsBuilderDSL (#423)
natecook1000 May 19, 2022
88dc9dd
Fix algorithms overload resolution issues (#402)
natecook1000 May 19, 2022
06dbc16
Introduce Source.lookahead
hamishknight May 24, 2022
8242df6
Remove `throws` from a couple of lexing methods
hamishknight May 24, 2022
e80322b
Add ASTBuilder helper for char class set operations
hamishknight May 24, 2022
1e57c5a
Simplify character class parsing a little
hamishknight May 24, 2022
95dc487
Dump the inverted bit of a custom character class
hamishknight May 24, 2022
9d84967
Allow empty comments
hamishknight May 24, 2022
24b64cd
Lex whitespace in range quantifiers
hamishknight May 24, 2022
8388d0f
Parse end-of-line comments in custom character classes
hamishknight May 24, 2022
5b0524a
Allow trivia between character class range operands
hamishknight May 24, 2022
bd9bf23
Merge pull request #431 from hamishknight/trivia-pursuit
hamishknight May 25, 2022
720ddd2
Implement named backreferences
hamishknight May 25, 2022
4b7d534
Remove `namedCaptureOffsets` from MECaptureList
hamishknight May 25, 2022
471e073
Merge pull request #433 from hamishknight/named-refs
hamishknight May 25, 2022
5495a75
Make `RegexCompilationError` internal
rxwei May 26, 2022
a936e9e
Merge pull request #438 from rxwei/internal-regex-compilation-error
rxwei May 26, 2022
f1b8581
Formalize Unicode block properties
hamishknight May 26, 2022
05f73db
Parse Java character properties
hamishknight May 26, 2022
4d04019
Merge pull request #440 from hamishknight/chunk-loader
hamishknight May 27, 2022
6d1d146
Merge branch 'main' into main-merge
hamishknight May 27, 2022
Introduce Source.lookahead
Use this to replace the various places we're doing
`var src = self`.
hamishknight committed May 24, 2022
commit 06dbc16a333d88fc50febfd98ad9c65c8f57be3d
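`lookahead` works because `Source` has value semantics. `var src = self` makes an independent copy, so anything the body consumes from that copy leaves the caller's position untouched. The sketch below reproduces the pattern on a hypothetical `Cursor` struct. It is an illustrative stand-in, not the real `Source`, whose stored properties are not shown here. Its `peek`/`tryEat` helpers are simplified assumptions, and `Character.isASCII && isNumber` stands in for `RadixKind.decimal.characterFilter`.

```swift
/// Hypothetical stand-in for `Source`: a value-type cursor over a string.
struct Cursor {
  let input: [Character]
  var position = 0

  init(_ text: String) { input = Array(text) }

  /// Returns the next character without consuming it.
  func peek() -> Character? {
    position < input.count ? input[position] : nil
  }

  /// Consumes `c` if it is the next character.
  mutating func tryEat(_ c: Character) -> Bool {
    guard peek() == c else { return false }
    position += 1
    return true
  }

  /// Consumes the next character if it is one of `options`, returning it.
  mutating func tryEat(anyOf options: Character...) -> Character? {
    guard let c = peek(), options.contains(c) else { return nil }
    position += 1
    return c
  }

  /// Same shape as `Source.lookahead`: the body mutates a copy,
  /// so `self` is never advanced.
  func lookahead<T>(_ body: (inout Cursor) throws -> T) rethrows -> T {
    var copy = self
    return try body(&copy)
  }

  /// Mirrors `canLexNumberedReference`: an optional sign followed by a
  /// decimal digit (assumed ASCII-only here).
  func canLexNumberedReference() -> Bool {
    lookahead { src in
      _ = src.tryEat(anyOf: "+", "-")
      guard let next = src.peek() else { return false }
      return next.isASCII && next.isNumber
    }
  }
}

let cursor = Cursor("-12>")
print(cursor.canLexNumberedReference())  // true
print(cursor.position)                   // 0: the lookahead consumed nothing
```

Because the probe runs on a copy, predicates like this can stay non-`mutating` while still using the consuming `tryEat` helpers.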
86 changes: 49 additions & 37 deletions Sources/_RegexParser/Regex/Parse/LexicalAnalysis.swift
@@ -149,6 +149,14 @@ extension Source {
     return result
   }
 
+  /// Perform a lookahead using a temporary source. Within the body of the
+  /// lookahead, any modifications to the source will not be reflected outside
+  /// the body.
+  func lookahead<T>(_ body: (inout Source) throws -> T) rethrows -> T {
+    var src = self
+    return try body(&src)
+  }
+
   /// Attempt to eat the given character, returning its source location if
   /// successful, `nil` otherwise.
   mutating func tryEatWithLoc(_ c: Character) -> SourceLocation? {
@@ -1240,8 +1248,9 @@ extension Source {
 
   private func canLexPOSIXCharacterProperty() -> Bool {
     do {
-      var src = self
-      return try src.lexPOSIXCharacterProperty() != nil
+      return try lookahead { src in
+        try src.lexPOSIXCharacterProperty() != nil
+      }
     } catch {
       // We want to tend on the side of lexing a POSIX character property, so
       // even if it is invalid in some way (e.g invalid property names), still
@@ -1394,10 +1403,11 @@ extension Source {
 
   /// Checks whether a numbered reference can be lexed.
   private func canLexNumberedReference() -> Bool {
-    var src = self
-    _ = src.tryEat(anyOf: "+", "-")
-    guard let next = src.peek() else { return false }
-    return RadixKind.decimal.characterFilter(next)
+    lookahead { src in
+      _ = src.tryEat(anyOf: "+", "-")
+      guard let next = src.peek() else { return false }
+      return RadixKind.decimal.characterFilter(next)
+    }
   }
 
   /// Eat a named reference up to a given closing delimiter.
@@ -1587,53 +1597,55 @@ extension Source {
 
   /// Whether we can lex a group-like reference after the specifier '(?'.
   private func canLexGroupLikeReference() -> Bool {
-    var src = self
-    if src.tryEat("P") {
-      return src.tryEat(anyOf: "=", ">") != nil
-    }
-    if src.tryEat(anyOf: "&", "R") != nil {
-      return true
+    lookahead { src in
+      if src.tryEat("P") {
+        return src.tryEat(anyOf: "=", ">") != nil
+      }
+      if src.tryEat(anyOf: "&", "R") != nil {
+        return true
+      }
+      return src.canLexNumberedReference()
     }
-    return src.canLexNumberedReference()
   }
 
   private func canLexMatchingOptionsAsAtom(context: ParsingContext) -> Bool {
-    var src = self
-
-    // See if we can lex a matching option sequence that terminates in ')'. Such
-    // a sequence is an atom. If an error is thrown, there are invalid elements
-    // of the matching option sequence. In such a case, we can lex as a group
-    // and diagnose the invalid group kind.
-    guard (try? src.lexMatchingOptionSequence(context: context)) != nil else {
-      return false
+    lookahead { src in
+      // See if we can lex a matching option sequence that terminates in ')'.
+      // Such a sequence is an atom. If an error is thrown, there are invalid
+      // elements of the matching option sequence. In such a case, we can lex as
+      // a group and diagnose the invalid group kind.
+      guard (try? src.lexMatchingOptionSequence(context: context)) != nil else {
+        return false
+      }
+      return src.tryEat(")")
     }
-    return src.tryEat(")")
   }
 
   /// Whether a group specifier should be lexed as an atom instead of a group.
   private func shouldLexGroupLikeAtom(context: ParsingContext) -> Bool {
-    var src = self
-    guard src.tryEat("(") else { return false }
+    lookahead { src in
+      guard src.tryEat("(") else { return false }
 
-    if src.tryEat("?") {
-      // The start of a reference '(?P=', '(?R', ...
-      if src.canLexGroupLikeReference() { return true }
+      if src.tryEat("?") {
+        // The start of a reference '(?P=', '(?R', ...
+        if src.canLexGroupLikeReference() { return true }
 
-      // The start of a PCRE callout.
-      if src.tryEat("C") { return true }
+        // The start of a PCRE callout.
+        if src.tryEat("C") { return true }
 
-      // The start of an Oniguruma 'of-contents' callout.
-      if src.tryEat("{") { return true }
+        // The start of an Oniguruma 'of-contents' callout.
+        if src.tryEat("{") { return true }
 
-      // A matching option atom (?x), (?i), ...
-      if src.canLexMatchingOptionsAsAtom(context: context) { return true }
+        // A matching option atom (?x), (?i), ...
+        if src.canLexMatchingOptionsAsAtom(context: context) { return true }
 
+        return false
+      }
+      // The start of a backreference directive or Oniguruma named callout.
+      if src.tryEat("*") { return true }
+
       return false
     }
-    // The start of a backreference directive or Oniguruma named callout.
-    if src.tryEat("*") { return true }
-
-    return false
   }
 
   /// Consume an escaped atom, starting from after the backslash
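Because `lookahead` is declared `rethrows`, a call only needs `try` when the closure itself can throw. `canLexNumberedReference` calls it plainly, whereas `canLexPOSIXCharacterProperty` writes `try lookahead` inside `do`/`catch`. The sketch below shows both cases using the hypothetical names `Lexer`, `ScanError`, and `lexDigits`. For simplicity its `catch` returns `false`. By contrast, the comment in `canLexPOSIXCharacterProperty` says that function prefers to lex a property even when it is invalid.

```swift
/// Hypothetical error thrown by a lexing step.
struct ScanError: Error {}

/// Hypothetical value-type lexer with the same `lookahead` shape as `Source`.
struct Lexer {
  var rest: Substring

  func lookahead<T>(_ body: (inout Lexer) throws -> T) rethrows -> T {
    var copy = self
    return try body(&copy)
  }

  /// Consumes a run of ASCII digits, throwing if there are none.
  mutating func lexDigits() throws -> Int {
    let digits = rest.prefix(while: { $0.isASCII && $0.isNumber })
    guard let value = Int(digits) else { throw ScanError() }
    rest = rest.dropFirst(digits.count)
    return value
  }

  /// Non-throwing body: `rethrows` means no `try` is needed here.
  func remainingCount() -> Int {
    lookahead { $0.rest.count }
  }

  /// Throwing body: `try` is required, and the error surfaces at this call.
  func canLexDigits() -> Bool {
    do {
      return try lookahead { src in try src.lexDigits() >= 0 }
    } catch {
      return false  // simplified policy for this sketch
    }
  }
}

let lexer = Lexer(rest: "42abc")
print(lexer.canLexDigits())               // true
print(Lexer(rest: "abc").canLexDigits())  // false
print(lexer.remainingCount())             // 5
print(lexer.rest)                         // 42abc: still unconsumed
```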