feat: add node helpers to scanner #15

byCedric · 2020-12-24T16:45:52Z

fixes #7

This adds a really simple node builder to the scanner.

How it works

scanner.enter(<type>, <content>): <node>
Creates a node, using content to determine whether it is a literal or a parent node. It sets the current position as the node's start.
scanner.exit(<node>): <node>
Sets the current position as the node's end and returns the node, allowing single-line usage like return scanner.exit(node). Because it mutates the node, you can exit the node and return it later.
scanner.abort(<node>, <expected-tokens>?): <Error>
This replaces our invalidToken method from index.js. It creates an error to return or throw, and it rewinds the scanner position to the start of the node to support optional tokens.
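The three helpers above can be sketched as a small class. This is a minimal illustration, not the merged lib/scanner.js: it assumes the scanner tracks a mutable pos object (only pos.line and pos.column appear in this PR; the offset field, the next() method and the Scanner class itself are hypothetical) and it ignores newlines. The error message follows the wording from the updated test.

```javascript
// Hypothetical minimal scanner: only pos.line / pos.column, enter, exit and
// abort come from this PR; everything else is illustrative.
class Scanner {
  constructor (text) {
    this.text = text
    // 1-based line/column plus a 0-based offset into `text` (assumption)
    this.pos = { line: 1, column: 1, offset: 0 }
  }

  // Copy of the current position, so later moves don't alter stored nodes
  position () {
    return { ...this.pos }
  }

  // Consume one character (newline handling omitted for brevity)
  next () {
    const char = this.text[this.pos.offset]
    this.pos.offset++
    this.pos.column++
    return char
  }

  // String content makes a literal node ({ value }); anything else a parent ({ children })
  enter (type, content) {
    const node = typeof content === 'string'
      ? { type, value: content }
      : { type, children: content || [] }
    node.position = { start: this.position() }
    return node
  }

  // Mutates the node with its end position and returns it: `return scanner.exit(node)`
  exit (node) {
    node.position.end = this.position()
    return node
  }

  // Creates (but does not throw) an error, then rewinds to the node's start
  abort (node, expectedTokens) {
    const position = `${this.pos.line}:${this.pos.column}`
    const char = this.text[this.pos.offset]
    const token = char === undefined ? 'EOF' : `'${char}'`
    // Without explicit token characters, fall back to the `<type>` notation
    const expected = expectedTokens || [`<${node.type}>`]
    const error = new Error(`unexpected token ${token} at ${position}, valid tokens [${expected.join(', ')}]`)
    this.pos = { ...node.position.start }
    return error
  }
}
```

In a parser function, you would consume a node's tokens and then return scanner.exit(node), or throw scanner.abort(node, ['(', '!', ':']) on an unexpected character. Because abort only returns the error and rewinds, a caller treating the token as optional can discard the error and try another branch from the node's start.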

Other thoughts

I removed some of the expected tokens from the stateful methods. They contained the expected node type rather than the actual token characters. As a replacement, I added a fallback to scanner.abort that uses <type> notation as the expected/valid token.
I also considered creating a dedicated node builder instance. The main reason I didn't go that route is the abstraction of the node content: we would have to provide an (unnecessary) abstraction for adding children or a value. I really like the flexibility we have right now, e.g. node = scanner.enter(); node.customProp = '('; return node.
We could split the enter method into enterLiteral/enterParent to make it more explicit. However, unist-builder provides an interface similar to what we have now (with the exception of void nodes, which I don't think we will use).
We could also move these methods to other helpers. But because we were writing a lot of invalidToken(scanner, ... statements, I think it's better to keep them in the same context.
Adding debug statements is also easy: we could add debug(type)('enter') and debug(type)('exit') to these helpers to "visualize" the code path of the parser. For now, I don't think this is necessary.
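The debug idea could look roughly like this. It is a sketch under assumptions: withTracing is a hypothetical wrapper, and the logger factory is passed in. The debug npm package has this shape (debug(namespace) returns a log function), but an in-memory factory and a stand-in scanner are used here so the example runs without dependencies.

```javascript
// Wrap enter/exit so every node logs when parsing starts and ends.
// `createLogger(namespace)` must return a log function; require('debug')
// fits this shape, but any factory works.
function withTracing (scanner, createLogger) {
  const enter = scanner.enter.bind(scanner)
  const exit = scanner.exit.bind(scanner)
  scanner.enter = (type, content) => {
    createLogger(type)('enter')
    return enter(type, content)
  }
  scanner.exit = (node) => {
    createLogger(node.type)('exit')
    return exit(node)
  }
  return scanner
}

// Stand-in scanner and in-memory logger, for demonstration only
const lines = []
const memoryLogger = (namespace) => (message) => lines.push(`${namespace} ${message}`)
const traced = withTracing({
  enter: (type, content) => ({ type, value: content }),
  exit: (node) => node
}, memoryLogger)

const node = traced.enter('type', 'feat')
traced.exit(node)
console.log(lines) // [ 'type enter', 'type exit' ]
```

Wrapping keeps the tracing out of enter/exit themselves, so it can stay disabled until it is actually needed.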

byCedric · 2020-12-24T16:48:47Z

lib/scanner.js

+    return node
+  }
+
+  abort (node, expectedTokens) {


Another idea: if this is too much for every abort, we could add a fail method (or something similar). We could move the error creation to that method and only use abort for rewinding to the node's starting position.

Note that aborting a failed attempt to parse a token always rewinds the position back to the node's start. Right now that seems to work well, but I'm not sure how well it works with @bcoe's new recursive body parsing.
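For reference, the proposed split might look like the sketch below. It shows the alternative being discussed, not the merged code: SplitScanner, its pos shape, and the message text are hypothetical.

```javascript
// Hypothetical scanner illustrating the abort/fail split.
class SplitScanner {
  constructor () {
    this.pos = { line: 1, column: 1 }
  }

  // Rewind only: for optional tokens, where another branch may still match
  abort (node) {
    this.pos = { ...node.position.start }
    return node
  }

  // Real failure: capture the failing position first, then rewind and report
  fail (node, expectedTokens = [`<${node.type}>`]) {
    const at = `${this.pos.line}:${this.pos.column}`
    this.abort(node)
    return new Error(`unexpected token at ${at}, valid tokens [${expectedTokens.join(', ')}]`)
  }
}
```

The position must be read before rewinding; otherwise the error would point at the node's start instead of the offending token.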

byCedric · 2020-12-24T16:50:14Z

lib/scanner.js

+  }
+
+  abort (node, expectedTokens) {
+    const position = `${this.pos.line}:${this.pos.column}`


Could we also use unist-util-stringify-position instead of ${pos.line}:${pos.column}, since we have the attempted node's position information?

byCedric · 2020-12-24T16:50:59Z

test.js

      expect(() => {
        parser('feat add support for scopes')
-      }).to.throw("unexpected token ' ' at position 1:5 valid tokens [(, !, :]")
+      }).to.throw("unexpected token ' ' at 1:5, valid tokens [(, !, :]")


Position itself is a fairly distinct format, so keeping this short and descriptive is important. But feel free to roll this rewording back!

wesleytodd

👍

wesleytodd · 2020-12-24T17:15:01Z

lib/scanner.js

+  }
+
+  abort (node, expectedTokens) {
+    const position = `${this.pos.line}:${this.pos.column}`


I feel like there would be tooling value in also having this as { line: <line>, column: <column> } on the error. Maybe before line 80 we could add error.position = this.pos?
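A sketch of the error.position suggestion, assuming a hypothetical createPositionedError helper. One design detail worth noting: if the scanner advances by mutating pos in place, assigning this.pos directly would let the error's position drift afterwards, so the sketch stores a copy.

```javascript
// Build a parse error that also carries a machine-readable position.
// A copy is stored because a scanner that mutates `pos` in place would
// otherwise change the error's position after the error was created.
function createPositionedError (message, pos) {
  const error = new Error(`${message} at ${pos.line}:${pos.column}`)
  error.position = { line: pos.line, column: pos.column }
  return error
}

const pos = { line: 1, column: 5 }
const error = createPositionedError("unexpected token ' '", pos)
pos.column++ // the scanner keeps moving...
console.log(error.position) // { line: 1, column: 5 }
```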

wesleytodd · 2020-12-24T17:17:13Z

lib/scanner.js

+  }
+
+  abort (node, expectedTokens) {
+    const position = `${this.pos.line}:${this.pos.column}`


Another helpful pattern I like to follow (used in Node.js, which I think is good precedent) is to have distinct .code properties for these errors. If we were building tooling around errors, checking err.code === 'EOF_ERROR' is much better than err.message.startsWith('unexpected token EOF at').
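A sketch of the .code pattern. Only 'EOF_ERROR' comes from this review; the UNEXPECTED_TOKEN_ERROR code and the helper names are hypothetical.

```javascript
// Hypothetical error codes; only 'EOF_ERROR' is mentioned in the review.
const codes = {
  EOF: 'EOF_ERROR',
  UNEXPECTED_TOKEN: 'UNEXPECTED_TOKEN_ERROR'
}

// An undefined token means the scanner ran past the end of the input
function createTokenError (token, pos) {
  const isEof = token === undefined
  const shown = isEof ? 'EOF' : `'${token}'`
  const error = new Error(`unexpected token ${shown} at ${pos.line}:${pos.column}`)
  error.code = isEof ? codes.EOF : codes.UNEXPECTED_TOKEN
  return error
}

// Tooling branches on the stable code instead of parsing the message text
function isEndOfInput (error) {
  return error.code === codes.EOF
}
```

The message can then be reworded freely (as in the test change in this PR) without breaking consumers that check the code.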

bcoe

This is looking really solid.

bcoe · 2020-12-24T17:43:27Z

index.js

  node.children.push(bodyFooter(scanner))
-  node.position = { start, end: scanner.position() }
-  return node
+  return scanner.exit(node)


This is a much nicer API 😄

bcoe · 2020-12-24T17:45:32Z

lib/scanner.js

+    return node
+  }
+
+  abort (node, expectedTokens) {


I'm not convinced that calling abort would cause any issues, let's hold off on adding an additional method until we know we need to differentiate between an abort and failure.

bcoe · 2020-12-24T17:47:27Z

lib/scanner.js

+  }
+
+  abort (node, expectedTokens) {
+    const position = `${this.pos.line}:${this.pos.column}`


I agree there would be value in adding additional context to the error object. I also wonder if we should consider returning an error rather than ever throwing, or perhaps have a "best effort" mode and a mode that throws?
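One way to offer both modes without changing the parser's internals is a wrapper like the sketch below. parseWithMode and the bestEffort option are hypothetical, and parseFn is assumed to throw on invalid input as the parser does today. A true best-effort parser would also recover and keep a partial tree; this only shows the return-instead-of-throw part.

```javascript
// Hypothetical wrapper; `parseFn` is assumed to throw on invalid input.
function parseWithMode (parseFn, input, { bestEffort = false } = {}) {
  try {
    return { tree: parseFn(input), errors: [] }
  } catch (error) {
    // Default mode: keep the current behavior and rethrow
    if (!bestEffort) throw error
    // Best-effort mode: hand the error back instead of throwing
    return { tree: undefined, errors: [error] }
  }
}
```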

bcoe · 2020-12-24T17:47:51Z

test.js

      expect(() => {
        parser('feat add support for scopes')
-      }).to.throw("unexpected token ' ' at position 1:5 valid tokens [(, !, :]")
+      }).to.throw("unexpected token ' ' at 1:5, valid tokens [(, !, :]")


I'm supportive of this rewording.

byCedric requested review from bcoe and wesleytodd December 24, 2020 16:45

byCedric commented Dec 24, 2020

byCedric requested a review from damianopetrungaro December 24, 2020 16:55

wesleytodd approved these changes Dec 24, 2020

bcoe approved these changes Dec 24, 2020

byCedric added 3 commits December 24, 2020 19:12

refactor: remove ambiguous whitespace consumer from scanner

8c6f234

feat: add node enter, exit, and abort methods

049689b

refactor: fallback to expected node type when aborting

ad0ae52

byCedric force-pushed the @bycedric/refactor/scanner branch from f4aff15 to ad0ae52 December 24, 2020 18:14

chore: update snapshots to match fixed positioning

6ef9963

byCedric mentioned this pull request Dec 24, 2020

Add programmatic context to parsing errors #17

Open

byCedric merged commit ef8a6ca into main Dec 24, 2020

byCedric deleted the @bycedric/refactor/scanner branch December 24, 2020 19:38

4 participants
