Skip to content

Add a regex combinator #28

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 1, 2017
Merged

Conversation

newlandsvalley
Copy link
Contributor

Add regex and regex' as discussed in issue #25.

import Data.String (Pattern(..), charAt, length, indexOf', singleton)
import Data.Maybe (Maybe(..), fromMaybe)
import Data.String (Pattern(..), charAt, drop, length, indexOf', singleton)
import Data.String.Utils (startsWith)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use Data.String.stripPrefix here, and avoid the extra dependency (which would necessitate a major version bump).

pat
else
"^" <> pat
er = Regex.regex pattern noFlags
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we inline this into the case expression please? I think it's a little easier to read that way.

if startsWith matched remainder then
Right { result: matched, suffix: { str, pos: pos + length matched } }
else
Left { pos, error: ParseError $ "no match - consider prefacing the pattern with '^'" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe prefix the error with Text.Parsing.StringParser.String.regex: so it's clear where this error comes from? Same with the one in regex' above.

@paf31
Copy link
Contributor

paf31 commented Feb 25, 2017

Thanks for this! I have a few comments above, but it's not far off.

Replace startsWith with stripPrefix. Remove Bower dependency.
Inline the regex into the case expression.
Prefix error messages with the function name for errors which indicate pattern problems

All very good points.  Many thanks for the review.
case uncons $ fromMaybe [] $ Regex.match r remainder of
Just { head: Just matched, tail: _ } ->
-- only accept matches at position 0
case stripPrefix (Pattern matched) remainder of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just because the string matches at the start doesn't mean it was the same match, right? Consider something like a+(?!b) applied to the string aaabaaa. There is a match, aaa, and the string starts with that string, but the parser should actually fail.

Maybe we should only export the regex' version you originally had then, unless we can find a way to get position information for each match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm - you're right - I hadn't thought of that either. I can't think of a good way of getting position information for each match. I suppose, if we want to retain precompilation, there is the alternative of using a wrapper for Regex, hiding the default constructor and then providing a constructor that prefixes the '^' character. So we'd have something like:

data ParserRegex = ParserRegex Regex.Regex

-- | Construct a ParserRegex ensuring patterns are always prefaced with '^'.
parserRegex :: String -> Either String ParserRegex

-- | Build the regular expression from the pattern and match it
-- | (fail on patterns in error).
regex' :: String -> Parser String

-- | Match the regular expression.
regex :: ParserRegex -> Parser String

This doesn't introduce any further dependencies, but it is a little cumbersome. Otherwise, I'd be very happy to revert to my original version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we can always precompile the regex when regex is called.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK - I think I see what you're saying. A future version of the String module could memoise the regex - is that right? (I had been assuming it was the responsibility of the client). Anyway, are you OK for me to swap the names of regex and regex', only expose what is now regex and just leave it at that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking we'd just go back to one function actually:

regex :: String -> Parser String

That function would first compile a Regex, so the definition of the Parser itself would be closed over it. If the user wanted to cache the compiled regex, they could simply hoist the definition of the Parser to the top level.

remainder = drop pos str
in
case uncons $ fromMaybe [] $ Regex.match r remainder of
Just { head: Just matched, tail: _ } ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since all matches start at the beginning now, should we take the longest, not the first?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought these other matches were just the the parenthesized substring matches which we weren't interested in - https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec. i.e for patterns like (a|A)(bIB) and so on.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right, thanks.

Just { head: Just matched, tail: _ } ->
Right { result: matched, suffix: { str, pos: pos + length matched } }
_ ->
Left { pos, error: ParseError $ "no match" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$ is redundant here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh - of course. Hangover from an older message

@paf31 paf31 merged commit e5699a9 into purescript-contrib:master Mar 1, 2017
@paf31
Copy link
Contributor

paf31 commented Mar 1, 2017

Thanks, and sorry for all the back and forth!

@newlandsvalley
Copy link
Contributor Author

Don't worry - thanks for being so patient and for the Socratic dialogue - I've learnt a good deal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants