New API #111

Closed
nitely opened this issue Jan 15, 2022 · 3 comments

@nitely
Owner

nitely commented Jan 15, 2022

API spec:

func re2(s: string): Regex2
func re2(s: static string): static[Regex2]
func group(m: RegexMatch2; i: int): Slice[int]
func group(m: RegexMatch2; s: string): Slice[int]
func groupCount(m: RegexMatch2): int
func groupNames(m: RegexMatch2): seq[string]
func match(s: string; pattern: Regex2): bool
func match(s: string; pattern: Regex2; m: var RegexMatch2; start = 0): bool
[func,iterator] findAll(s: string; pattern: Regex2; start = 0): seq[RegexMatch2]
func find(s: string; pattern: Regex2; m: var RegexMatch2; start = 0): bool
[func,iterator] capture(s: string; pattern: Regex2): seq[string]
func contains(s: string; pattern: Regex2): bool
[func,iterator] split(s: string; sep: Regex2): seq[string]
[func,iterator] splitIncl(s: string; sep: Regex2): seq[string]
func startsWith(s: string; pattern: Regex2; start = 0): bool
func endsWith(s: string; pattern: Regex2): bool
func replace(s: string; pattern: Regex2; by: string; limit = 0): string
func replace(s: string; pattern: Regex2; by: proc (m: RegexMatch2; s: string): string; limit = 0): string 
func isInitialized(re: Regex2): bool
func escapeRe(s: string): string
macro match(text: string; regex: RegexLit; body: untyped): untyped

The "Captures all group repetitions (not just the last one)" feature is removed; only the last repetition is captured. This is a breaking change, and it will break some of the APIs. The rest of the APIs are deprecated or removed.
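A minimal sketch of what "only the last repetition is captured" looks like with the spec above. It assumes a nim-regex version that exports `re2` and `RegexMatch2`, and that group indices are zero-based; the sample text is made up for illustration.

```nim
import regex  # assumption: a nim-regex release that provides the re2 / RegexMatch2 API

let text = "abc"
var m = RegexMatch2()

# (\w)+ repeats the group three times over "abc"
doAssert match(text, re2"(\w)+", m)

# group(m, i) returns a single Slice[int]: the bounds of the last repetition
doAssert m.group(0) == 2 .. 2
doAssert text[m.group(0)] == "c"
doAssert m.groupCount == 1
```

Since `group` returns a slice rather than a string, the captured text is recovered by indexing the original string.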

@nitely
Owner Author

nitely commented Aug 4, 2023

Changes to support both the old APIs and new APIs for a while:

  • Regex -> Regex2
  • RegexMatch -> RegexMatch2
  • re -> re2

@nitely nitely added the design label Aug 4, 2023
@nitely
Owner Author

nitely commented Aug 13, 2023

#122 is merged

@nitely nitely closed this as completed Aug 13, 2023
@nitely
Owner Author

nitely commented Aug 13, 2023

I don't think I've given the rationale for removing the "Captures all group repetitions (not just the last one)" feature anywhere, so I'll do it here.

In order to capture all of the repetitions in re"(\w)+", a full parse tree of submatch (capture group) boundaries needs to be generated. The tree is usually small, except when it's not. The main issue is that the space complexity is O(N*M), where N is the text length and M is the regex length. While this is not unbounded, it may be prohibitive, more so when matching untrusted text. Keeping only the last repetition of each submatch makes the space complexity O(N*M), where N is now the regex length and M is the number of submatches (both usually known at compile time), so it no longer grows with the text length.

Why not provide both options?
It's a lot of additional complexity.

What if I need all captures?
You can do what other languages do: match first, and then use findAll.
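A rough sketch of that two-step approach, assuming the `re2`/`RegexMatch2` API from the spec above. The input string, the patterns, and the `names` variable are hypothetical and only serve as illustration.

```nim
import regex  # assumption: a nim-regex release that provides the re2 / RegexMatch2 API

let text = "--foo --bar --baz"

# Step 1: validate the whole input; the group keeps only its last repetition
var m = RegexMatch2()
doAssert match(text, re2"(?:--(\w+) ?)+", m)
doAssert text[m.group(0)] == "baz"

# Step 2: run findAll with the repeated sub-pattern to collect every occurrence
var names: seq[string]
for fm in findAll(text, re2"--(\w+)"):
  names.add text[fm.group(0)]
doAssert names == @["foo", "bar", "baz"]
```

The second pass costs an extra scan of the text, but the memory used stays independent of how many times the group repeats.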
