I want to gather up many areas of near-future work that we've been clarifying through the proposal reviews.
Loose categorization:
Language and integration
- Ability to use a String-backed, CaseIterable enum as a regex component
- Define error types for compilation and type mismatches
- Callouts from literals
- A Regex-backed enum that will construct a ChoiceOf over all cases in order
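As a rough illustration of the first and last items, the adapter below lets a String-backed, CaseIterable enum take part in a builder regex today. It is only a sketch: `CaseComponent`, `Level`, and `logLine` are hypothetical names, and the adapter relies on the existing `CustomConsumingRegexComponent` protocol rather than on any of the language integration proposed here. Cases are tried in declaration order, mirroring the "all cases in order" behavior.

```swift
import RegexBuilder

// Hypothetical adapter (not a standard library type): matches any case of a
// String-backed, CaseIterable enum, trying the cases in declaration order.
struct CaseComponent<E>: CustomConsumingRegexComponent
where E: CaseIterable & RawRepresentable, E.RawValue == String {
  typealias RegexOutput = E

  func consuming(
    _ input: String,
    startingAt index: String.Index,
    in bounds: Range<String.Index>
  ) throws -> (upperBound: String.Index, output: E)? {
    let remainder = input[index..<bounds.upperBound]
    // The first case whose raw value is a prefix of the remaining input wins.
    for candidate in E.allCases where remainder.starts(with: candidate.rawValue) {
      let end = input.index(index, offsetBy: candidate.rawValue.count)
      return (upperBound: end, output: candidate)
    }
    return nil
  }
}

// Example enum, purely for illustration.
enum Level: String, CaseIterable {
  case warning, error, info
}

let logLine = Regex {
  "["
  Capture(CaseComponent<Level>())
  "] "
}

// The capture is the enum value itself, not a Substring.
let level = "[error] disk full".firstMatch(of: logLine)?.output.1
```

Because the first matching case wins, case order matters whenever one raw value is a prefix of another.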
API
- Ability to map over a regex, perhaps per-capture, to supply post-processing transforms at regex declaration time
- A modifier on a regex to convert it to matches-anywhere semantics
  - E.g. regex.matchingAnywhere => Regex { /.*?/ ; regex ; /.*/ }.
  - But we'd preserve the matched range, i.e. reset start/end position
- Character alignment queries
  - API for whether start/end is Character-aligned for whole match and each capture
- API to query options (e.g. is this case insensitive?)
- API for (?n), could be nice to strip out captures you don't care about, especially for type erased regexes.
  - compilation error if there are back-references or if it changes the semantics of the program
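To make the matches-anywhere item concrete, here is the expansion written out by hand with APIs that already exist. This is a sketch under assumptions: `digits`, `input`, and `anywhere` are illustrative names, and the `Capture` merely stands in for the proposed "reset start/end position" step, since the hand-written whole match spans the entire input.

```swift
import RegexBuilder

let digits = Regex { OneOrMore(.digit) }
let input = "order 1234 shipped"

// Whole-match semantics reject the input because of the surrounding text.
let strict = input.wholeMatch(of: digits)

// The expansion from the list above, written out by hand.
// The Capture recovers the range that a real modifier would preserve.
let anywhere = Regex {
  ZeroOrMore(.any, .reluctant)
  Capture { digits }
  ZeroOrMore(.any)
}
let inner = input.wholeMatch(of: anywhere)?.output.1

// firstMatch(of:) already reports the range of just the inner match.
let found = input.firstMatch(of: digits)?.range
```

The extra capture changes the output type from `Substring` to `(Substring, Substring)`, which is one reason a dedicated modifier would be preferable to the manual wrapping.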
Algorithms
- Add a replace(_:withTemplate:) method that recognizes $1 or \1 placeholders
- A separator-preserving split variant
- Suffix / from-the-end operations (trim etc)
- Customize search
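A template-based replace can be approximated on top of the existing closure-based `replacing(_:with:)`. The sketch below is not the proposed API: `expandTemplate` and `replacingMatches(of:withTemplate:)` are hypothetical helpers, only single-digit placeholders are handled, and a placeholder that names a missing group is copied through unchanged.

```swift
// Hypothetical helper: expands $n and \n placeholders (single digit only,
// an assumption of this sketch) using the captures of one match.
func expandTemplate(_ template: String, with captures: AnyRegexOutput) -> String {
  var result = ""
  var i = template.startIndex
  while i < template.endIndex {
    let ch = template[i]
    let next = template.index(after: i)
    if ch == "$" || ch == "\\",
       next < template.endIndex,
       let n = template[next].wholeNumberValue,
       n < captures.count {
      // Group 0 is the whole match; unmatched groups expand to nothing.
      result.append(contentsOf: captures[n].substring ?? "")
      i = template.index(after: next)
    } else {
      result.append(ch)
      i = next
    }
  }
  return result
}

extension String {
  // Hypothetical non-mutating stand-in for the proposed replace(_:withTemplate:).
  func replacingMatches(
    of regex: Regex<AnyRegexOutput>,
    withTemplate template: String
  ) -> String {
    replacing(regex) { match in
      expandTemplate(template, with: match.output)
    }
  }
}

let pair = try! Regex(#"(\w+)@(\w+)"#)
let swapped = "user@example".replacingMatches(of: pair, withTemplate: "$2 at \\1")
```

A real implementation would also need to settle escaping rules (for example how to write a literal `$1`) and multi-digit or named references, which this sketch leaves out.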
String and Unicode
- Add unsupported Unicode properties to Unicode.Properties and support in regexes
- Add Unicode.AllScalars as a public type (semi-tangential)
- Add var Substring.range: Range<String.Index> to simplify getting the range of a capture group
- Inits for making an NFC string from UTF-8
- String.lines() and String.words()
- Add option for canonical equivalence in scalar-semantic mode
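The `Substring.range` item is small enough to sketch directly. The extension below is an assumption about how such a property could look, not the eventual standard library declaration; `text` and `valueRange` are illustrative names.

```swift
import RegexBuilder

// Sketch of the proposed convenience: a substring's bounds as a range
// in its base string.
extension Substring {
  var range: Range<String.Index> { startIndex..<endIndex }
}

let text = "key=value"
let assignment = Regex {
  "="
  Capture { OneOrMore(.word) }
}

// Without the property, callers would write capture.startIndex..<capture.endIndex.
let valueRange = text.firstMatch(of: assignment)?.output.1.range
```

This works because a capture's `Substring` shares indices with the string it was matched in, so the range can be used to slice the original input.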
Dynamic Regex API
- Add a capture-description API to all regexes
  - some RAC of capture, which has a type and optionality
- Missing match conversions
  - Regex<T>.Match.init?(_:ARO)
  - Regex<T>.Match.init?(_:Regex<ARO>.Match)
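For context, this is roughly what is possible today when moving between dynamic and typed outputs, assuming ARO above refers to `AnyRegexOutput`. The sketch uses the existing `extractValues(as:)` and `Regex.init?(_:as:)` conversions; the missing pieces are the `Match`-level initializers listed above. `Pair`, `erased`, and the other names are illustrative.

```swift
typealias Pair = (Substring, Substring, Substring)

// A regex built from a string at run time has a type-erased output.
let erased = try! Regex(#"(\d+)-(\d+)"#)
let dynamicMatch = "10-20".wholeMatch(of: erased)!

// The erased output is a collection with one element per capture (group 0 first).
let pieces = dynamicMatch.output.map { $0.substring }

// Existing conversion of the output values to a typed tuple.
let typedValues = dynamicMatch.output.extractValues(as: Pair.self)

// Existing conversion of the regex itself; it fails if the capture shape differs.
let typedRegex = Regex<Pair>(erased)
let mismatched = Regex<(Substring, Substring)>(erased)
```

Note that `extractValues(as:)` yields only the tuple of values, so information carried by the match itself (such as its range) has to be taken from the original dynamic match.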
Builders
- A high-level helper for separated/quoted repetitions, e.g. Repeat(separator: \.whitespace) { ... }
- A helper for repeated matching lookahead and negative lookahead, e.g.
  - Repeat(while:)
  - Repeat(whileNot:)
  - Until(negLookaheadCondition) { ... }
- A func compile() throws to explicitly trigger compilation and get errors, such as quantifying the unquantifiable
  - This is useful when composing regexes together to check the final result instead of trapping at run time.
- Default Reference capture type to Substring.self
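The separated-repetition helper can be approximated with today's builder by spelling out "item, then zero or more (separator, item)". The function below is a hypothetical stand-in for something like `Repeat(separator:)`; it is restricted to capture-free components to keep the types simple, which a real helper would presumably not be.

```swift
import RegexBuilder

// Hypothetical helper: one item, followed by any number of (separator, item) pairs.
// Limited to capture-free regexes in this sketch.
func separated(
  _ item: Regex<Substring>,
  by separator: Regex<Substring>
) -> Regex<Substring> {
  Regex {
    item
    ZeroOrMore {
      separator
      item
    }
  }
}

let words = separated(
  Regex { OneOrMore(.word) },
  by: Regex { OneOrMore(.whitespace) }
)

let accepted = "alpha beta  gamma".wholeMatch(of: words) != nil
// A trailing separator is not followed by an item, so the whole match fails.
let rejected = "alpha beta ".wholeMatch(of: words) == nil
```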
Engine
- Engine limiters, low-level backtracking control and timeouts
- Provide a way to access all values of a repeated capture (e.g. subscribe)
- Conditionals (?(x)...) (requires updated parsing)
- Quoted string inside custom character classes (e.g. [a-z\q{ch}])
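The repeated-capture item is easiest to see with a small example. In the sketch below, a capture inside a quantifier retains only the value from its last iteration; collecting every value currently means running a second, per-item search. The names are illustrative and the workaround is just one possible approach.

```swift
import RegexBuilder

// A capture inside a repetition keeps only its final value.
let repeated = Regex {
  OneOrMore {
    Capture { One(.digit) }
  }
}
let lastOnly = "123".wholeMatch(of: repeated)?.output.1

// Workaround today: match the repeated part on its own and collect each match.
let single = Regex { Capture { One(.digit) } }
let allValues = "123".matches(of: single).map { String($0.output.1) }
```

The workaround loses the surrounding context of the original pattern, which is why engine-level access to all values would be useful.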
Parser
- Support for duplicate group names through (?J) (requires figuring out typed captures)
- Support for branch reset alternations (?|) (parsing is implemented, but requires figuring out typed captures)
- Parsing of conditionals (?(x)...) in accordance with what is in the syntax proposal (we currently parse the condition differently)
  - Including interpolation conditions (?(?{...}))
  - Conditional conditions don't capture on their own, only for child nodes, e.g. (?((x))x). .NET also forbids named capture conditions, we should ban that.
  - Stop parsing named reference conditions for (?(x)...)
  - Don't allow (?(DEFINE)) to have a false branch
- Support for regex property values \p{key=/regex/}
- Support for transform matching, e.g. \p{toNFKC_Casefold=@toNFKC@}
- Support for alternative character property separators?
  - UTS#18 suggests key≠value, key!=value
  - Perl allows key:value
- Support a** syntax as explicitly eager quantification
  - I.e. it's not affected by API to change default quantification kind, (probably) not affected by (?U)
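The builder DSL already has an analogue of the "explicitly eager" idea in the last item, which may help when reasoning about what `a**` should mean. In this sketch, a quantifier without an explicit behavior follows the default set by `repetitionBehavior(_:)`, while one marked `.eager` does not. The names are illustrative, and the literal `a**` syntax itself is not shown because it is not yet parsed.

```swift
import RegexBuilder

// No explicit behavior: the quantifier follows the regex-wide default.
let implicitKind = Regex { OneOrMore(.digit) }
  .repetitionBehavior(.reluctant)

// Explicit .eager: unaffected by the change of default.
let explicitKind = Regex { OneOrMore(.digit, .eager) }
  .repetitionBehavior(.reluctant)

let shortest = "1234".firstMatch(of: implicitKind)?.output
let longest = "1234".firstMatch(of: explicitKind)?.output
```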