/**
* Change lexer state from parser.
*
* Note: a tokenizer can be accessed in a semantic action as `yy.lexer`,
* or `yy.tokenizer`.
*
* The grammar below solves the problem of parsing { } in statement position as
* a "BlockStatement", and in the expression position as an "ObjectLiteral".
*
* Note: other techniques solve the same problem, for example lookahead
* restrictions on productions, or a cover grammar.
*
* Example in the statement position:
*
* ./bin/syntax -g examples/parser-lexer-communication.g -m lalr1 -p '{ 1; 2; }'
*
* ✓ Accepted
*
* Parsed value:
*
* {
*   "type": "Program",
*   "body": [
*     {
*       "type": "BlockStatement",
*       "body": [
*         "1",
*         "2"
*       ]
*     }
*   ]
* }
*
* Two empty blocks:
*
* ./bin/syntax -g examples/parser-lexer-communication.g -m lalr1 -p '{{}}'
*
* Example in the expression position:
*
* ./bin/syntax -g examples/parser-lexer-communication.g -m lalr1 -p '({ 1, 2 });'
*
* ✓ Accepted
*
* Parsed value:
*
* {
*   "type": "Program",
*   "body": [
*     {
*       "type": "ObjectLiteral",
*       "properties": [
*         "1",
*         "2"
*       ]
*     }
*   ]
* }
*/
{
  // --------------------------------------------------
  // Lexical grammar.
  lex: {
    // Lexer states.
    startConditions: {
      expression: 0,
    },
    rules: [
      [`\\s+`, `/* skip whitespace */`],
      // { and } in the expression position yield different token types:
      [['expression'], `\\{`, `return '%{'`],
      [['expression'], `\\}`, `return '}%'`],
      // { and } in the statement position yield default token types:
      [`\\{`, `return '{'`],
      [`\\}`, `return '}'`],
      [`\\d+`, `return 'NUMBER'`],
      [`;`, `return ';'`],
      [`,`, `return ','`],
      [`\\(`, `return '('`],
      [`\\)`, `return ')'`],
    ],
  },
  // --------------------------------------------------
  // Syntactic grammar.
  bnf: {
    Program: [[`StatementList`, `$$ = {type: 'Program', body: $1}`]],
    StatementList: [[`Statement`, `$$ = [$1]`],
                    [`StatementList Statement`, `$$ = $1; $1.push($2)`]],
    Statement: [[`BlockStatement`, `$$ = $1`],
                [`ExpressionStatement`, `$$ = $1`]],
    BlockStatement: [[`{ OptStatementList }`, `$$ = {type: 'BlockStatement', body: $2}`]],
    OptStatementList: [[`StatementList`, `$$ = $1`],
                       [`ε`, `$$ = null`]],
    ExpressionStatement: [[`Expression ;`, `$$ = $1`]],
    Expression: [[`expressionBegin ExpressionNode expressionEnd`,
                  `$$ = $2`]],
    // Special "activation productions". They activate the needed lexer state,
    // so the latter can yield different token types for the same characters.
    expressionBegin: [[`ε`, `yy.lexer.pushState('expression')`]],
    expressionEnd: [[`ε`, `yy.lexer.popState()`]],
    ExpressionNode: [[`NumericLiteral`, `$$ = $1`],
                     [`ObjectLiteral`, `$$ = $1`],
                     [`( Expression )`, `$$ = $2`]],
    NumericLiteral: [[`NUMBER`, `$$ = $1`]],
    ObjectLiteral: [[`%{ OptPropertyList }%`, `$$ = {type: 'ObjectLiteral', properties: $2}`]],
    OptPropertyList: [[`PropertyList`, `$$ = $1`],
                      [`ε`, `$$ = null`]],
    PropertyList: [[`Property`, `$$ = [$1]`],
                   [`PropertyList , Property`, `$$ = $1; $1.push($3)`]],
    // Each production is a [RHS, action] pair, so the pair is wrapped in a list.
    Property: [[`NumericLiteral`, `$$ = $1`]],
  }
}
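
/**
* Illustration only, not part of the grammar above: a minimal sketch of the
* idea behind `yy.lexer.pushState('expression')` and `yy.lexer.popState()`.
* The class `SketchLexer`, the function `demo`, and the default state name
* 'INITIAL' are hypothetical names chosen for this sketch; the lexer that the
* tool actually generates may be organized differently. The sketch only shows
* how a stack of lexer states lets the same character, `{`, produce a different
* token type depending on which state the parser has activated.
*/

```javascript
// Hypothetical hand-written tokenizer with a state stack (an assumption for
// illustration; this is not the lexer generated from the grammar).
class SketchLexer {
  constructor(input) {
    this.input = input;
    this.pos = 0;
    this.states = ['INITIAL']; // assumed name for the default state
  }

  currentState() {
    return this.states[this.states.length - 1];
  }

  pushState(state) {
    this.states.push(state);
  }

  popState() {
    // Never pop the default state off the bottom of the stack.
    if (this.states.length > 1) {
      return this.states.pop();
    }
    return this.states[0];
  }

  // Returns the type of the next token, or null at the end of the input.
  nextToken() {
    while (this.pos < this.input.length && /\s/.test(this.input[this.pos])) {
      this.pos++; // skip whitespace
    }
    if (this.pos >= this.input.length) {
      return null;
    }
    const ch = this.input[this.pos];
    if (/\d/.test(ch)) {
      while (this.pos < this.input.length && /\d/.test(this.input[this.pos])) {
        this.pos++; // consume the whole run of digits
      }
      return 'NUMBER';
    }
    this.pos++;
    const inExpression = this.currentState() === 'expression';
    if (ch === '{') return inExpression ? '%{' : '{';
    if (ch === '}') return inExpression ? '}%' : '}';
    return ch; // ; , ( ) are their own token types
  }
}

// The same character is tokenized twice: once in the default state and once
// after the "parser" has switched the lexer into the 'expression' state.
function demo() {
  const lexer = new SketchLexer('{ {');
  const inStatement = lexer.nextToken();
  lexer.pushState('expression');
  const inExpression = lexer.nextToken();
  lexer.popState();
  return [inStatement, inExpression];
}
```

// In this sketch, demo() yields the default token type for the first brace and
// the expression-state token type for the second one, mirroring what the
// `expressionBegin` and `expressionEnd` productions arrange in the grammar.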