/*
Package lexer implements the base components of a lexical analyzer, enabling the
creation of hand-written lexers for tokenizing textual content.
Some features of this lexer:

    * Rune-Centric
    * Infinite Lookahead
    * Mark / Reset Functionality
    * Line / Column Tracking
Initiating a Lexer

Lexing is initiated through the various Lex* functions, each accepting a different type of input to lex from:

    // Input Type: string
    func LexString(input string, start lexer.Fn) token.Nexter

    // Input Type: io.RuneReader
    func LexRuneReader(input io.RuneReader, start lexer.Fn) token.Nexter

    // Input Type: io.Reader
    func LexReader(input io.Reader, start lexer.Fn) token.Nexter

    // Input Type: []rune
    func LexRunes(input []rune, start lexer.Fn) token.Nexter

    // Input Type: []byte
    func LexBytes(input []byte, start lexer.Fn) token.Nexter
Lexer Functions

In addition to the input data, each Lex* function accepts a function which serves as the starting point for your lexer:

    // Fn are user functions that scan runes and emit tokens.
    type Fn func(*Lexer) Fn

The main Lexer loop will call into this function to initiate lexing.
Simplified lexer.Fn Loop

You'll notice that the `lexer.Fn` return type is another `lexer.Fn`. This allows for simplified flow control: your lexer function only needs to concern itself with matching the very next rune(s) of input, alleviating the need to manage complex looping / restart logic. Simply return from your function after (possibly) emitting a token, and the Lexer will manage the looping.
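The state-function loop described above can be sketched in plain Go. This is a self-contained illustration of the pattern, not this package's actual run loop; `miniLexer`, `run`, and `lexWords` are hypothetical names invented for the sketch:

```go
package main

import (
	"fmt"
	"unicode"
)

// stateFn mirrors the lexer.Fn shape: each state returns the next state,
// or nil to stop.
type stateFn func(l *miniLexer) stateFn

// miniLexer is a toy lexer used only to illustrate the loop; the real
// package manages buffering, markers, and line/column tracking for you.
type miniLexer struct {
	input  []rune
	pos    int
	start  int
	tokens []string
}

// run is the trampoline: it keeps calling the current state function
// until one of them returns nil.
func run(l *miniLexer, start stateFn) {
	for state := start; state != nil; {
		state = state(l)
	}
}

// lexWords matches one whitespace-delimited word per call, then returns
// itself so the trampoline calls it again.
func lexWords(l *miniLexer) stateFn {
	for l.pos < len(l.input) && unicode.IsSpace(l.input[l.pos]) {
		l.pos++
	}
	l.start = l.pos
	for l.pos < len(l.input) && !unicode.IsSpace(l.input[l.pos]) {
		l.pos++
	}
	if l.pos == l.start {
		return nil // no more input: shut the loop down
	}
	l.tokens = append(l.tokens, string(l.input[l.start:l.pos]))
	return lexWords
}

func main() {
	l := &miniLexer{input: []rune("to be or not")}
	run(l, lexWords)
	fmt.Println(l.tokens) // [to be or not]
}
```

Each call to `lexWords` matches exactly one word; the trampoline in `run` supplies the looping, which is exactly the division of labor the Lexer provides for your `lexer.Fn`.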
Switching Lexer Context

Switching contexts is as easy as returning a reference to another `lexer.Fn`. For example, if, within your main lexer function, you encounter a `"`, you can simply return a reference to your `quotedStringLexer` function, and the Lexer will transfer control to it. Once finished, your quoted-string lexer can return control to your main lexer by returning a reference to your `mainLexer` function.
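The hand-off between a main state and a quoted-string state can be sketched in a self-contained toy (the `ctxLexer` type and the `lexMain`/`lexQuoted` names are hypothetical; the real package's `*Lexer` supplies Next/Peek/Emit* instead of the raw cursor used here):

```go
package main

import "fmt"

type stateFn func(l *ctxLexer) stateFn

// ctxLexer is a toy lexer illustrating context switching between
// state functions.
type ctxLexer struct {
	input  []rune
	pos    int
	tokens []string
}

// lexMain handles ordinary runes; on a double quote it hands control
// to lexQuoted by returning it.
func lexMain(l *ctxLexer) stateFn {
	if l.pos >= len(l.input) {
		return nil // done: shut down the loop
	}
	if l.input[l.pos] == '"' {
		l.pos++ // consume the opening quote
		return lexQuoted
	}
	l.tokens = append(l.tokens, "CHAR:"+string(l.input[l.pos]))
	l.pos++
	return lexMain
}

// lexQuoted consumes runes up to the closing quote, records the string,
// and returns control to lexMain.
func lexQuoted(l *ctxLexer) stateFn {
	start := l.pos
	for l.pos < len(l.input) && l.input[l.pos] != '"' {
		l.pos++
	}
	l.tokens = append(l.tokens, "STR:"+string(l.input[start:l.pos]))
	if l.pos < len(l.input) {
		l.pos++ // consume the closing quote
	}
	return lexMain
}

func main() {
	l := &ctxLexer{input: []rune(`a"bc"d`)}
	for state := stateFn(lexMain); state != nil; {
		state = state(l)
	}
	fmt.Println(l.tokens) // [CHAR:a STR:bc CHAR:d]
}
```

Neither state function knows about the loop; each simply returns whichever state should see the next rune.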
Shutting Down the Lexer

You can shut down the main Lexer loop from within your `lexer.Fn` by simply returning `nil`. All previously emitted tokens will still be available for retrieval, but the Lexer will stop making any further `lexer.Fn` calls.
Scanning Runes

Your lexer function receives a `*Lexer` when called, and can use the following methods to inspect and match runes:

    // CanPeek confirms if the requested number of runes are available in the peek buffer.
    func (l *Lexer) CanPeek(n int) bool

    // Peek allows you to look ahead at runes without consuming them.
    func (l *Lexer) Peek(n int) rune

    // Next matches and returns the next rune in the input.
    func (l *Lexer) Next() rune

    // PeekToken allows you to inspect the currently matched rune sequence.
    func (l *Lexer) PeekToken() string
Emitting Tokens

Once you've determined what the matched rune(s) represent, you can emit a token for further processing (for example, by a parser):

    // EmitToken emits a token of the specified type, along with all of the matched runes.
    func (l *Lexer) EmitToken(t token.Type)

    // EmitType emits a token of the specified type, discarding all previously-matched runes.
    func (l *Lexer) EmitType(t token.Type)

NOTE: See the Token Types section below for details on defining token types for your lexer.
Discarding Matched Runes

Sometimes you may match a series of runes that you simply wish to discard:

    // Clear discards all previously-matched runes without emitting any tokens.
    func (l *Lexer) Clear()
Creating Save Points

The Lexer allows you to create save points, and reset to them if you decide you want to re-try matching runes in a different context:

    // Marker returns a marker that you can use to reset the lexer to a previous state.
    func (l *Lexer) Marker() *Marker

A marker is valid until the next Emit* or Clear() call. Before using a marker, confirm it is still valid:

    // Valid confirms if the marker is still valid.
    func (m *Marker) Valid() bool

Once you've confirmed a marker is still valid:

    // Apply resets the lexer state to the marker position.
    // Returns the lexer.Fn that was stored at the time the marker was created.
    func (m *Marker) Apply() lexer.Fn

NOTE: Applying a marker resets the lexer state, but does not automatically transfer control to the lexer.Fn that was active when the marker was created; it simply returns that function reference. If you want to return control to the function saved in the marker, you can use this pattern:

    return marker.Apply() // Reset the lexer and return control to the saved lexer.Fn
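The save-point pattern can be illustrated with a self-contained toy (the `scanner`, `marker`, and `tryMatch` names are hypothetical; the real `Marker` additionally remembers the active `lexer.Fn` and is invalidated by Emit*/Clear):

```go
package main

import "fmt"

// scanner is a toy cursor used only to illustrate mark/reset.
type scanner struct {
	input []rune
	pos   int
}

// marker captures the scanner position at creation time.
type marker struct {
	s   *scanner
	pos int
}

// Marker saves the current position.
func (s *scanner) Marker() *marker { return &marker{s: s, pos: s.pos} }

// Apply rewinds the scanner to the saved position.
func (m *marker) Apply() { m.s.pos = m.pos }

// tryMatch attempts to consume the literal word; on failure it rewinds
// via the marker so another match can be attempted from the same spot.
func tryMatch(s *scanner, word string) bool {
	m := s.Marker()
	for _, r := range word {
		if s.pos >= len(s.input) || s.input[s.pos] != r {
			m.Apply() // backtrack: restore the saved position
			return false
		}
		s.pos++
	}
	return true
}

func main() {
	s := &scanner{input: []rune("foobar")}
	fmt.Println(tryMatch(s, "food")) // false: rewound to position 0
	fmt.Println(tryMatch(s, "foo"))  // true: consumed "foo"
	fmt.Println(s.pos)               // 3
}
```

The failed match for "food" consumes nothing thanks to the rewind, so the follow-up match for "foo" starts from the original position.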
Token Types

The Lexer defines a few pre-defined token types:

    const (
        TLexErr token.Type = iota // Lexer error
        TUnknown                  // Unknown rune(s)
        TEof                      // EOF
        TStart                    // Marker for user tokens ( use TStart + iota )
    )

You can define your own token types starting from TStart:

    const (
        TInt = lexer.TStart + iota
        TChar
    )
Retrieving Emitted Tokens

When called, the Lex* functions will return a token.Nexter which provides a means of retrieving tokens (and errors) emitted from the lexer:

    type Nexter interface {
        // Next tries to fetch the next available token, returning an error if something goes wrong.
        // Will return io.EOF to indicate end-of-file.
        Next() (token.Token, error)
    }
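The standard consumption loop reads until io.EOF and treats any other error as a real lexing failure. A self-contained sketch with a slice-backed stand-in (the `sliceNexter` and `drain` names are hypothetical; the real Nexter is returned by the Lex* functions and pulls from the running lexer):

```go
package main

import (
	"fmt"
	"io"
)

// Token stands in for token.Token in this sketch.
type Token struct{ Value string }

// Nexter mirrors the documented interface.
type Nexter interface {
	Next() (Token, error)
}

// sliceNexter serves tokens from a slice, then io.EOF.
type sliceNexter struct {
	tokens []Token
}

func (n *sliceNexter) Next() (Token, error) {
	if len(n.tokens) == 0 {
		return Token{}, io.EOF
	}
	t := n.tokens[0]
	n.tokens = n.tokens[1:]
	return t, nil
}

// drain shows the consumption loop: read until io.EOF,
// treating any other error as fatal.
func drain(n Nexter) ([]string, error) {
	var out []string
	for {
		t, err := n.Next()
		if err == io.EOF {
			return out, nil // normal end of input
		}
		if err != nil {
			return out, err // a real lexing error
		}
		out = append(out, t.Value)
	}
}

func main() {
	n := &sliceNexter{tokens: []Token{{"1"}, {"+"}, {"2"}}}
	out, _ := drain(n)
	fmt.Println(out) // [1 + 2]
}
```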
Tracking Lines and Columns

The Lexer tracks lines and columns as runes are consumed, and exposes them in the emitted Tokens. It uses '\n' as the newline separator when tracking line counts.

NOTE: Error messages with line/column information may reference the start of an attempted token match and not the position of the rune(s) that generated the error.
Example Programs

See the `examples` folder for programs that demonstrate the lexer functionality.
*/
package lexer