Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[std/re] fix findBounds and find procs #18028

Merged
merged 3 commits into from
May 31, 2021
Merged

[std/re] fix findBounds and find procs #18028

merged 3 commits into from
May 31, 2021

Conversation

ringabout
Copy link
Member

@ringabout ringabout commented May 16, 2021

The pre-existing interfaces have two invariants. Some procs give bufSize a default value, which is error-prone.

silently doing the wrong thing (treating input as length 0), which is surprising

Before

proc findBounds*(buf: cstring, pattern: Regex, matches: var openArray[string],
                 start = 0, bufSize: int): tuple[first, last: int]

proc findBounds*(s: string, pattern: Regex, matches: var openArray[string],
                 start = 0): tuple[first, last: int] {.inline.}

proc findBounds*(buf: cstring, pattern: Regex,
                 matches: var openArray[tuple[first, last: int]],
                 start = 0, bufSize = 0): tuple[first, last: int] 

proc findBounds*(s: string, pattern: Regex,
                 matches: var openArray[tuple[first, last: int]],
                 start = 0): tuple[first, last: int] {.inline.}

proc findBounds*(buf: cstring, pattern: Regex,
                 start = 0, bufSize: int): tuple[first, last: int]

proc findBounds*(s: string, pattern: Regex,
                 start = 0): tuple[first, last: int] {.inline.}

After

proc findBounds*(buf: cstring, pattern: Regex, matches: var openArray[string],
                 start = 0, bufSize: int): tuple[first, last: int]

proc findBounds*(s: string, pattern: Regex, matches: var openArray[string],
                 start = 0): tuple[first, last: int] {.inline.}

proc findBounds*(buf: cstring, pattern: Regex,
                 matches: var openArray[tuple[first, last: int]],
                 start = 0, bufSize: int): tuple[first, last: int]

proc findBounds*(s: string, pattern: Regex,
                 matches: var openArray[tuple[first, last: int]],
                 start = 0): tuple[first, last: int] {.inline.} 

proc findBounds*(buf: cstring, pattern: Regex,
                 start = 0, bufSize: int): tuple[first, last: int] 
proc findBounds*(s: string, pattern: Regex,
                 start = 0): tuple[first, last: int] {.inline.}

It might be a breaking changes, while I don't think it is a big deal, because in reality buf should not be zero length. With the former interface, it gives wrong or surprise result. So it is worthy fixing it.

See the former implementation, if it is called like find("......", pat, matches), it will cause dangerous consequence.

proc find*(buf: cstring, pattern: Regex, matches: var openArray[string],
           start = 0, bufSize = 0): int =
  ## returns the starting position of `pattern` in `buf` and the captured
  ## substrings in the array `matches`. If it does not match, nothing
  ## is written into `matches` and `-1` is returned.
  ## `buf` has length `bufSize` (not necessarily `'\0'` terminated).
  var
    rtarray = initRtArray[cint]((matches.len+1)*3)
    rawMatches = rtarray.getRawData
    res = pcre.exec(pattern.h, pattern.e, buf, bufSize.cint, start.cint, 0'i32,
      cast[ptr cint](rawMatches), (matches.len+1).cint*3)
  if res < 0'i32: return res
  for i in 1..int(res)-1:
    var a = rawMatches[i * 2]
    var b = rawMatches[i * 2 + 1]
    if a >= 0'i32: matches[i-1] = bufSubstr(buf, int(a), int(b))
    else: matches[i-1] = ""
  return rawMatches[0]

@ringabout ringabout changed the title [std/re] make interfaces consistent [std/re] fix findBounds and find procs May 16, 2021
@timotheecour
Copy link
Member

timotheecour commented May 16, 2021

6 overloads that basically do the same things sounds like a code smell; not only there are 6 overloads but even then it's still not flexible enough. Instead, can't we simply use openArray[char] to subsume both cstring and string and get rid of the need for bufSize?

eg:

proc findBounds*(input: openArray[char], pattern: Regex, matches: var openArray[Slice[int]],
  start = 0): tuple[Slice[int]]

which subsumes the other cases and is also more flexible (allows user to avoid allocations).

(refs: nim-lang/RFCs#381)

@Araq Araq merged commit c0e8199 into nim-lang:devel May 31, 2021
@Araq
Copy link
Member

Araq commented May 31, 2021

which subsumes the other cases and is also more flexible

Yes but that's a more severe change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants