Skip to content

Correctly parse ado without any statements in it #46

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Mar 5, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,8 +67,9 @@ slowest parse times along with the mean parse time for the set.
npm run parse-package-set
```

You can also benchmark a single file:
You can also benchmark or parse a single file:

```sh
npm run bench-file MyModule.purs
npm run parse-file -- MyModule.purs --tokens
```
2 changes: 1 addition & 1 deletion bench/ParseFile.purs
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ import PureScript.CST.Types (SourceToken)

main :: Effect Unit
main = launchAff_ do
args <- Array.drop 2 <$> liftEffect Process.argv
args <- Array.drop 1 <$> liftEffect Process.argv
let printTokens = (elem "--tokens" || elem "-t") args
case Array.head args of
Just fileName -> do
Expand Down
1 change: 1 addition & 0 deletions package.json
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
"scripts": {
"parse-package-set": "spago -x parse-package-set/parse-package-set.dhall run",
"bench-file": "spago -x bench/bench.dhall build && node --expose-gc --input-type=\"module\" -e \"import { main } from './output/BenchFile/index.js';main()\"",
"parse-file": "spago -x bench/bench.dhall build && node --input-type=\"module\" -e \"import { main } from './output/ParseFile/index.js';main()\" --",
"format": "purs-tidy format-in-place src test bench parse-package-set",
"check": "purs-tidy check src test bench parse-package-set"
},
Expand Down
23 changes: 14 additions & 9 deletions src/PureScript/CST/Parser.purs
Original file line number Diff line number Diff line change
Expand Up @@ -95,14 +95,6 @@ layoutNonEmpty valueParser = ado
tail <- many (tokLayoutSep *> valueParser) <* tokLayoutEnd
in NonEmptyArray.cons' head tail

layout :: forall a. Parser a -> Parser (Array a)
layout valueParser =
tokLayoutStart *> values <* tokLayoutEnd
where
values = (go =<< valueParser) <|> pure []
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be equivalent to optional? If we had an empty layout, then valueParser should immediately fail if given a TokLayoutEnd, shouldn't it? It's not obvious to me what "backtracking far enough" means then, and that there isn't some other internal bug. Do we know where backtracking is halting?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I was a little hasty in ascribing it to backtracking. Putting try in there does not help.

What's really messing it up is the indentation-based parser recovery when we call it in parseAdo (this is the only usage of layout). An an example change, this manages to parse the empty ado in:

--- a/src/PureScript/CST/Parser.purs
+++ b/src/PureScript/CST/Parser.purs
@@ -673,7 +673,7 @@ parseDo = do
 parseAdo :: Parser (Recovered Expr)
 parseAdo = do
   keyword <- tokQualifiedKeyword "ado"
-  statements <- layout (recoverDoStatement parseDoStatement)
+  statements <- layout parseDoStatement
   in_ <- tokKeyword "in"
   result <- parseExpr
   pure $ ExprAdo { keyword, statements, in: in_, result }

Copy link
Owner

@natefaubion natefaubion Feb 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense to me that there is something odd with recovery, but I'm not sure what. In the case of ado in, I would expect recoverIndent to immediately hit a TokLayoutEnd where col == indent, resulting in empty tokens, yielding Nothing, which would propagate the error.

let
Tuple tokens newStream = recoverTokensWhile
( \tok indent -> case tok.value of
TokLayoutEnd col -> col > indent
TokLayoutSep col -> col > indent
_ -> true
)
stream
if Array.null tokens then
Nothing

case k error state1.stream of
Nothing ->
runFn2 resume (state2 { consumed = state1.consumed }) error

Maybe something is wrong with the state threading in recover. The logic of the original code seems OK to me, which makes me think this is an internal bug and not related to the specific formulation of this parser.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I'll push a test for this case and dig deeper.

tail = many (tokLayoutSep *> valueParser)
go head = Array.cons head <$> tail

parseModule :: Parser (Recovered Module)
parseModule = do
header <- parseModuleHeader
Expand Down Expand Up @@ -673,7 +665,20 @@ parseDo = do
parseAdo :: Parser (Recovered Expr)
parseAdo = do
keyword <- tokQualifiedKeyword "ado"
statements <- layout (recoverDoStatement parseDoStatement)
-- A possibly-empty version of `layoutNonEmpty` to handle empty `ado in`
statements <- do
let
-- `recoverDoStatement` recovers too much if it is immediately
-- confronted with `TokLayoutEnd`, since that is associated with a
-- `layoutStack` _of the parent_ as opposed to the stuff we actually
-- want to recover, which we would correctly guess if we saw a statement
-- or two inside the block
valueParser = recoverDoStatement parseDoStatement
nonEmptyCase =
Array.cons <$> valueParser <*> many (tokLayoutSep *> valueParser)
_ <- tokLayoutStart
-- So we explicitly handle `TokLayoutEnd` ahead of time:
[] <$ tokLayoutEnd <|> nonEmptyCase <* tokLayoutEnd
in_ <- tokKeyword "in"
result <- parseExpr
pure $ ExprAdo { keyword, statements, in: in_, result }
Expand Down
71 changes: 71 additions & 0 deletions test/Main.purs
Original file line number Diff line number Diff line change
Expand Up @@ -77,6 +77,77 @@ main = do
_ ->
false

assertParse "Recovered ado statements"
"""
ado
foo <- bar
a b c +
foo
in 5
"""
case _ of
ParseSucceededWithErrors (ExprAdo { statements }) _
| [ DoBind _ _ _
, DoError _
, DoDiscard _
] <- statements ->
true
_ ->
false

assertParse "Recovered ado last statement"
"""
ado
foo <- bar
a b c +
in 5
"""
case _ of
ParseSucceededWithErrors (ExprAdo { statements }) _
| [ DoBind _ _ _
, DoError _
] <- statements ->
true
_ ->
false

assertParse "Recovered ado first statement"
"""
ado
a b c +
foo <- bar
in 5
"""
case _ of
ParseSucceededWithErrors (ExprAdo { statements }) _
| [ DoError _
, DoBind _ _ _
] <- statements ->
true
_ ->
false

assertParse "Empty ado in"
"""
ado in 1
"""
case _ of
(ParseSucceeded _ :: RecoveredParserResult Expr) ->
true
_ ->
false

assertParse "Empty ado \\n in"
"""
ado
in 1
"""
case _ of
(ParseSucceeded _ :: RecoveredParserResult Expr) ->
true
_ ->
false

assertParse "Recovered let bindings"
"""
let
Expand Down