-
Notifications
You must be signed in to change notification settings - Fork 20
Add a regex combinator #28
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
import Data.String (Pattern(..), charAt, length, indexOf', singleton) | ||
import Data.Maybe (Maybe(..), fromMaybe) | ||
import Data.String (Pattern(..), charAt, drop, length, indexOf', singleton) | ||
import Data.String.Utils (startsWith) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could use Data.String.stripPrefix
here, and avoid the extra dependency (which would necessitate a major version bump).
pat | ||
else | ||
"^" <> pat | ||
er = Regex.regex pattern noFlags |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we inline this into the case
expression please? I think it's a little easier to read that way.
if startsWith matched remainder then | ||
Right { result: matched, suffix: { str, pos: pos + length matched } } | ||
else | ||
Left { pos, error: ParseError $ "no match - consider prefacing the pattern with '^'" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe prefix the error with Text.Parsing.StringParser.String.regex:
so it's clear where this error comes from? Same with the one in regex'
above.
Thanks for this! I have a few comments above, but it's not far off. |
Replace startsWith with stripPrefix. Remove Bower dependency. Inline the regex into the case expression. Prefix error messages with the function name for errors which indicate pattern problems All very good points. Many thanks for the review.
case uncons $ fromMaybe [] $ Regex.match r remainder of | ||
Just { head: Just matched, tail: _ } -> | ||
-- only accept matches at position 0 | ||
case stripPrefix (Pattern matched) remainder of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just because the string matches at the start doesn't mean it was the same match, right? Consider something like a+(?!b)
applied to the string aaabaaa
. There is a match, aaa
, and the string starts with that string, but the parser should actually fail.
Maybe we should only export the regex'
version you originally had then, unless we can find a way to get position information for each match.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm - you're right - I hadn't thought of that either. I can't think of a good way of getting position information for each match. I suppose, if we want to retain precompilation, there is the alternative of using a wrapper for Regex, hiding the default constructor and then providing a constructor that prefixes the '^' character. So we'd have something like:
data ParserRegex = ParserRegex Regex.Regex
-- | Construct a ParserRegex ensuring patterns are always prefaced with '^'.
parserRegex :: String -> Either String ParserRegex
-- | Build the regular expression from the pattern and match it
-- | (fail on patterns in error).
regex' :: String -> Parser String
-- | Match the regular expression.
regex :: ParserRegex -> Parser String
This doesn't introduce any further dependencies, but it is a little cumbersome. Otherwise, I'd be very happy to revert to my original version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we can always precompile the regex when regex
is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK - I think I see what you're saying. A future version of the String module could memoise the regex - is that right? (I had been assuming it was the responsibility of the client). Anyway, are you OK for me to swap the names of regex and regex', only expose what is now regex and just leave it at that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking we'd just go back to one function actually:
regex :: String -> Parser String
That function would first compile a Regex
, so the definition of the Parser
itself would be closed over it. If the user wanted to cache the compiled regex, they could simply hoist the definition of the Parser
to the top level.
remainder = drop pos str | ||
in | ||
case uncons $ fromMaybe [] $ Regex.match r remainder of | ||
Just { head: Just matched, tail: _ } -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all matches start at the beginning now, should we take the longest, not the first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought these other matches were just the the parenthesized substring matches which we weren't interested in - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec. i.e for patterns like (a|A)(bIB) and so on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you're right, thanks.
Just { head: Just matched, tail: _ } -> | ||
Right { result: matched, suffix: { str, pos: pos + length matched } } | ||
_ -> | ||
Left { pos, error: ParseError $ "no match" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
$
is redundant here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh - of course. Hangover from an older message
Thanks, and sorry for all the back and forth! |
Don't worry - thanks for being so patient and for the Socratic dialogue - I've learnt a good deal. |
Add regex and regex' as discussed in issue #25.