dash in range not correctly parsed #1

Closed
neongreen opened this issue Oct 19, 2019 · 2 comments · Fixed by #41 or #45
Labels: bug (Something isn't working)

@neongreen
Collaborator

neongreen commented Oct 19, 2019

ChrisKuklewicz/regex-tdfa#24, originally reported by @pjljvandelaar


As specified in https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html, the expression "[--@]" matches any character between '-' and '@' inclusive.

However, ("@" =~ "[--@]") results in:

Explict error in module Text.Regex.TDFA.String : 
Text.Regex.TDFA.String died: parseRegex for Text.Regex.TDFA.String failed:"[--@]" (line 1, column 4):
       unexpected A dash is in the wrong place in a bracket
       expecting "]"
       CallStack (from HasCallStack):
         error, called at .\Text\Regex\TDFA\Common.hs:29:3 in regex-tdfa-1.2.3.1-DVMXTrvIFHgDCky8s203W0:Text.Regex.TDFA.Common)
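
For reference, the set of characters the report expects `[--@]` to cover can be enumerated without any regex library. This is a minimal sketch that assumes ranges are ordered by code point (which holds for ASCII in the POSIX locale); `rangeChars` and `matchesRange` are hypothetical helper names, not part of regex-tdfa.

```haskell
-- | Characters covered by a range expression start-end, ordered by code point.
-- The result is empty when the end point precedes the start point.
rangeChars :: Char -> Char -> String
rangeChars start end = [start .. end]

-- | Membership test for a single range (hypothetical helper).
matchesRange :: Char -> Char -> Char -> Bool
matchesRange start end c = start <= c && c <= end
```

In GHCi, `rangeChars '-' '@'` lists the dash, the dot, the slash, the digits, and the punctuation up to `@`, so `"@"` would be expected to match.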
@andreasabel
Member

andreasabel commented Jul 15, 2022

The error is also triggered for an empty range, e.g. [1-0]:

ghci> ("1" =~ "[1-0]") :: String
"*** Exception: Explict error in module Text.Regex.TDFA.String : Text.Regex.TDFA.String died: parseRegex for Text.Regex.TDFA.String failed:"[1-0]" (line 1, column 4):
unexpected A dash is in the wrong place in a bracket
expecting "]"
CallStack (from HasCallStack):
  error, called at lib/Text/Regex/TDFA/Common.hs:31:3 in regex-tdfa-1.3.1.3-inplace:Text.Regex.TDFA.Common

I think the intention was to give another error in this case:

    p_set_elem_range = try $ do
      start <- noneOf "]-"
      _ <- char '-'
      end <- noneOf "]"
      -- bug fix: check start <= end before "return (BEChars [start..end])"
      if start <= end
        then return (BEChars [start..end])
        else unexpected "End point of dashed character range is less than starting point"

However, since p_set_elem_range is wrapped in try, its failure is swallowed by <|>, which moves on to the next alternative:
    p_set_elem = p_set_elem_class <|> p_set_elem_equiv <|> p_set_elem_coll
             <|> p_set_elem_range <|> p_set_elem_char <?> "Failed to parse bracketed string"

So we end up with the error triggered by the last alternative:
    p_set_elem_char = do
      c <- noneOf "]"
      when (c == '-') $ do
        atEnd <- (lookAhead (char ']') >> return True) <|> (return False)
        when (not atEnd) (unexpected "A dash is in the wrong place in a bracket")
      return (BEChar c)
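
The masking effect described above can be reproduced in isolation with a stripped-down Parsec sketch. It assumes only the `parsec` package; `Elem`, `rangeP`, `charP`, `bracketP`, `bracketError` and `reportsDash` are hypothetical stand-ins for the library's definitions, and the class, equivalence and collating alternatives are left out.

```haskell
import Control.Monad (when)
import Data.List (isInfixOf)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the library's bracket element type.
data Elem = Chars String | Single Char
  deriving (Eq, Show)

-- Range parser as quoted above: the start point may not be '-'.
rangeP :: Parser Elem
rangeP = try $ do
  start <- noneOf "]-"
  _     <- char '-'
  end   <- noneOf "]"
  if start <= end
    then return (Chars [start .. end])
    else unexpected "End point of dashed character range is less than starting point"

-- Single-character parser with the dash placement check.
charP :: Parser Elem
charP = do
  c <- noneOf "]"
  when (c == '-') $ do
    atEnd <- (lookAhead (char ']') >> return True) <|> return False
    when (not atEnd) (unexpected "A dash is in the wrong place in a bracket")
  return (Single c)

-- Simplified bracket expression: '[' elements ']'.
bracketP :: Parser [Elem]
bracketP = char '[' *> many1 (rangeP <|> charP) <* char ']'

-- Rendered parse error, or Nothing if the pattern parses.
bracketError :: String -> Maybe String
bracketError s = either (Just . show) (const Nothing) (parse bracketP "" s)

-- True if parsing fails with the dash placement message.
reportsDash :: String -> Bool
reportsDash = maybe False ("A dash is in the wrong place" `isInfixOf`) . bracketError
```

With these definitions, both `reportsDash "[1-0]"` and `reportsDash "[--@]"` hold: the range alternative backtracks, the single-character alternative consumes the dash, and its error is the one that gets reported.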

@andreasabel andreasabel self-assigned this Jul 16, 2022
@andreasabel andreasabel added this to the 1.3.1.4 milestone Jul 16, 2022
@andreasabel
Member

The solution to the OP is to simply not further restrict the appearance of - in a bracketed expression. E.g. [--] would be valid, meaning either - or -. So, we delete this check:

    when (c == '-') $ do
      atEnd <- (lookAhead (char ']') >> return True) <|> (return False)
      when (not atEnd) (unexpected "A dash is in the wrong place in a bracket")
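
A rough sketch of the relaxed behaviour, again as a stand-alone Parsec example rather than the library's actual code: the dash check is dropped from the single-character parser, and the range parser is additionally allowed to start with `-` (the follow-up change needed for `[--@]`). All names (`Elem`, `rangeP`, `charP`, `bracketP`, `parseBracket`) are hypothetical, and special cases such as a leading `]` or `^` are ignored.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the library's bracket element type.
data Elem = Chars String | Single Char
  deriving (Eq, Show)

-- Range parser: '-' is now accepted as a start point.
rangeP :: Parser Elem
rangeP = try $ do
  start <- noneOf "]"
  _     <- char '-'
  end   <- noneOf "]"
  if start <= end
    then return (Chars [start .. end])
    else unexpected "End point of dashed character range is less than starting point"

-- Single-character parser: a dash simply stands for itself.
charP :: Parser Elem
charP = Single <$> noneOf "]"

-- Simplified bracket expression: '[' elements ']'.
bracketP :: Parser [Elem]
bracketP = char '[' *> many1 (rangeP <|> charP) <* char ']'

-- Parsed elements, or Nothing on a parse error.
parseBracket :: String -> Maybe [Elem]
parseBracket = either (const Nothing) Just . parse bracketP ""
```

Under these assumptions, `parseBracket "[--@]"` yields a single range from `-` to `@`, `parseBracket "[-a-z]"` yields a literal dash followed by the range `a-z`, and `parseBracket "[--]"` yields two literal dashes.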

andreasabel added a commit that referenced this issue Jul 16, 2022
There are no illegal occurrences of `-` in a bracket expression.
Those that are not separators of a range (as in `a-z`) simply stand
for themselves.  A dash can even be a start or end point of a range,
as in `[--@]`.
andreasabel added a commit that referenced this issue Jul 16, 2022
andreasabel added a commit that referenced this issue Jul 16, 2022
andreasabel added a commit that referenced this issue Jul 18, 2022
Ranges beginning with `-` were not recognized.
andreasabel added a commit that referenced this issue Jul 18, 2022
Ranges beginning with `-` were not recognized.

Write GHC environment files so parallel-doctest picks up correct package.
@andreasabel andreasabel modified the milestones: 1.3.1.4, 1.3.1.5 Jul 18, 2022
netbsd-srcmastr pushed a commit to NetBSD/pkgsrc that referenced this issue Aug 18, 2022
### 1.3.2

_2022-07-18, Andreas Abel_

- Export `decodePatternSet` and `decodeCharacterClass` from `Text.Regex.TDFA.Pattern`
  ([#16](haskell-hvr/regex-tdfa#16))
- Extend and correct docs for `Pattern` module
- Tested with GHC 7.4 - 9.4

### 1.3.1.5

_2022-07-18, Andreas Abel_

- Allow dash (`-`) as start of a range, e.g. `[--z]`
  ([#1](haskell-hvr/regex-tdfa#1),
  [#45](haskell-hvr/regex-tdfa#45))
- Tested with GHC 7.4 - 9.4

### 1.3.1.4

_2022-07-17, Andreas Abel_

- Fix parsing of dashes in bracket expressions, e.g. `[-a-z]` ([#1](haskell-hvr/regex-tdfa#1))
- Fix a deprecation warning, except on GHC 8.2 ([#21](haskell-hvr/regex-tdfa#21))
- Documentation: link `defaultCompOpt` to its definition ([#13](haskell-hvr/regex-tdfa#13))
- Verify documentation examples with new `doc-test` testsuite
- Tested with GHC 7.4 - 9.4

### 1.3.1.3

_2022-07-14, Andreas Abel_

- Fix an `undefined` in `Show PatternSet` ([#37](haskell-hvr/regex-tdfa#37))
- Document POSIX character classes (e.g. `[[:digit:]]`) in README
- Tested with GHC 7.4 - 9.4

### 1.3.1.2 Revision 1

_2022-05-25, Andreas Abel_

- Allow `base >= 4.17` (GHC 9.4)

### 1.3.1.2

_2022-02-19, Andreas Abel_

- No longer rely on the `MonadFail` instance for `ST`
  (future `base` library change, see [#29](haskell-hvr/regex-tdfa#29)).
- Silence warning `incomplete-uni-patterns` (GHC >= 9.2).
- Import from `Data.List` explicitly or qualified (warning `compat-unqualified-imports`).
- Import from `Control.Monad` to allow `mtl-2.3` in its `rc3` incarnation.