-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathlisp-interpreter-episode-3-the-finally.html
304 lines (267 loc) · 18.5 KB
/
lisp-interpreter-episode-3-the-finally.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
<!DOCTYPE html>
<head>
<meta charset="utf-8" />
<!-- Set the viewport width to device width for mobile -->
<meta name="viewport" content="width=device-width" />
<title>Lisp Interpreter: Episode 3 (The Finally!)</title>
<link rel="stylesheet" href="http://lmontopo.github.io/theme/css/normalize.css" />
<link rel="stylesheet" href="http://lmontopo.github.io/theme/css/foundation.min.css" />
<link rel="stylesheet" href="http://lmontopo.github.io/theme/css/style.css" />
<link rel="stylesheet" href="http://lmontopo.github.io/theme/css/pygments.css" />
<script src="http://lmontopo.github.io/theme/js/modernizr.js"></script>
</head>
<body>
<!-- Nav Bar -->
<nav>
<div class="top-bar">
<div class="row">
<div class="large-9 large-centered columns">
<h1><a href="http://lmontopo.github.io">The L Blog</a></h1>
</div>
</div>
</div>
<!-- Show menu items and pages -->
<div class="row">
<div class="large-9 columns">
<ul class="button-group navigation">
</ul>
</div>
</div>
</nav>
<!-- End Nav -->
<!-- Main Page Content and Sidebar -->
<div class="row">
<!-- Main Blog Content -->
<div class="large-9 columns">
<article>
<header>
<h3 class="article-title"><a href="http://lmontopo.github.io/lisp-interpreter-episode-3-the-finally.html" rel="bookmark"
title="Permalink to Lisp Interpreter: Episode 3 (The Finally!)">Lisp Interpreter: Episode 3 (The Finally!)</a></h3>
</header>
<h6 class="subheader" title="2014-11-18T05:00:00-05:00">Tue 18 November 2014
</h6>
<p>Let me start off by apologizing for the delay. I've been hesitant to
write this post for several reasons, including:</p>
<ol>
<li>Presenting and explaining my entire lisp interpreter is a BIG (and
therefore daunting) task.</li>
<li>There are parts of my code that I'm not entirely satisfied with and
I <em>may</em> refactor these parts in the future. For this reason, I've
been debating waiting until I finish refacturing to write this post.</li>
<li>The parts of my code that I'm unsatisfied with are pretty ugly and
are a pain to read. Which leads me to the next point...</li>
<li>Will anyone actually read this?</li>
</ol>
<p>I have decided to go ahead and write this blog post anyway. Even if no
one reads it, I think that I'll benefit from writing it. As far as the
ugly bits go, I'm trying to not let 'perfect' be the enemy of 'good
enough'. This project is far from perfect but that OK. I worked hard on
it, it works, I learned lots during the process, and those are the
things that matter most. Besides, I've gotten really excited this week
about some other projects, so it could be a while before I get back to
refacturing the interpreter.</p>
<p>OK. Lets get started. Here are the first ten lines of code:</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6
7
8
9</pre></div></td><td class="code"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">division</span>
<span class="kn">from</span> <span class="nn">termcolor</span> <span class="kn">import</span> <span class="n">colored</span>
<span class="kn">from</span> <span class="nn">sys</span> <span class="kn">import</span> <span class="nb">exit</span>
<span class="c"># ------ GlOBAL LISTS ------</span>
<span class="n">symbol</span> <span class="o">=</span> <span class="p">[</span><span class="s">'+'</span><span class="p">,</span> <span class="s">'-'</span><span class="p">,</span> <span class="s">'*'</span><span class="p">,</span> <span class="s">'/'</span><span class="p">,</span> <span class="s">'<'</span><span class="p">,</span> <span class="s">'>'</span><span class="p">,</span><span class="s">'<='</span><span class="p">,</span> <span class="s">'='</span><span class="p">,</span><span class="s">'>='</span><span class="p">,</span> <span class="s">'abs'</span><span class="p">,</span> <span class="s">'list'</span><span class="p">,</span> <span class="s">'if'</span><span class="p">,</span> <span class="s">'not'</span><span class="p">,</span>
<span class="s">'set!'</span><span class="p">,</span> <span class="s">'begin'</span><span class="p">,</span> <span class="s">'let'</span><span class="p">,</span> <span class="s">'define'</span><span class="p">,</span> <span class="s">'lambda'</span><span class="p">,</span> <span class="s">'quote'</span><span class="p">]</span>
<span class="n">special</span> <span class="o">=</span> <span class="p">[</span><span class="s">'set!'</span><span class="p">,</span> <span class="s">'begin'</span><span class="p">,</span> <span class="s">'let'</span><span class="p">,</span> <span class="s">'define'</span><span class="p">,</span> <span class="s">'lambda'</span><span class="p">,</span> <span class="s">'cond'</span><span class="p">,</span> <span class="s">'quote'</span><span class="p">,</span> <span class="s">'if'</span><span class="p">,</span>
<span class="s">'list'</span><span class="p">,</span> <span class="s">'map'</span><span class="p">,</span> <span class="s">'cons'</span><span class="p">,</span> <span class="s">'car'</span><span class="p">,</span> <span class="s">'cdr'</span><span class="p">]</span>
</pre></div>
</td></tr></table>
<p>The first 3 lines just specify any functions I'll be using from built in
libraries. Next, two important global lists are introduced. The list,
'symbol', specifies strings that have a special meaning and will
therefore be treated differently from other strings of characters. The
list 'special' is a list of all of the built in Scheme functions which
disrupt the flow of the interpretation. All functions, including
addition, subtraction, etc. will be treated in the same way. But our
interpreter will treat these 'special' functions differently.</p>
<p>Next, we introduce my class of exceptions and some instances of that
class:</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
2
3
4
5
6
7
8
9
10
11</pre></div></td><td class="code"><div class="highlight"><pre><span class="c"># ------ EXCEPTIONS ------</span>
<span class="k">class</span> <span class="nc">MyError</span><span class="p">(</span><span class="ne">Exception</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">msg</span> <span class="o">=</span> <span class="n">msg</span>
<span class="n">if_error</span> <span class="o">=</span> <span class="n">MyError</span><span class="p">(</span><span class="s">"Error: you need to specify by a consequence and an alternate."</span><span class="p">)</span>
<span class="n">dict_error</span> <span class="o">=</span> <span class="n">MyError</span><span class="p">(</span><span class="s">"Error: can't find element in dictionary. Improper input."</span><span class="p">)</span>
<span class="n">first_error</span> <span class="o">=</span> <span class="n">MyError</span><span class="p">(</span><span class="s">'Error: unexpectedly entered parse function with no input.'</span><span class="p">)</span>
<span class="n">too_many</span> <span class="o">=</span> <span class="n">MyError</span><span class="p">(</span><span class="s">"Error: too many arguments were inputed."</span><span class="p">)</span>
<span class="n">quote_error</span> <span class="o">=</span> <span class="n">MyError</span><span class="p">(</span><span class="s">'Error: quote only takes one operand.'</span><span class="p">)</span>
<span class="n">let_error</span> <span class="o">=</span> <span class="n">MyError</span><span class="p">(</span><span class="s">'Error: let must be followed by a list.'</span><span class="p">)</span>
</pre></div>
</td></tr></table>
<p>Then, we start doing some of the gritty work: tokenizing and parsing. I
have implemented both my tokenizer and my parser as functions. When we
call them, the output of the tokenizer will be fed as input to my
parser. Here is my tokenizer:</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre>1
2
3
4
5
6
7</pre></div></td><td class="code"><div class="highlight"><pre><span class="c"># ------ TOKENIZE ------- </span>
<span class="k">def</span> <span class="nf">tokenizer</span><span class="p">(</span><span class="n">holder</span><span class="p">):</span>
<span class="n">holder</span> <span class="o">=</span> <span class="s">'('</span> <span class="o">+</span> <span class="n">holder</span> <span class="o">+</span> <span class="s">')'</span>
<span class="n">holder</span> <span class="o">=</span> <span class="n">holder</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span> <span class="s">'('</span> <span class="p">,</span> <span class="s">' ( '</span> <span class="p">)</span>
<span class="n">holder</span> <span class="o">=</span> <span class="n">holder</span><span class="o">.</span><span class="n">replace</span><span class="p">(</span> <span class="s">')'</span><span class="p">,</span> <span class="s">' ) '</span> <span class="p">)</span>
<span class="n">final</span> <span class="o">=</span> <span class="nb">filter</span><span class="p">(</span><span class="k">lambda</span> <span class="n">a</span><span class="p">:</span> <span class="n">a</span> <span class="o">!=</span> <span class="s">''</span><span class="p">,</span> <span class="n">holder</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">' '</span><span class="p">))</span>
<span class="k">return</span> <span class="n">final</span>
</pre></div>
</td></tr></table>
<p>The tokenizer takes my user's raw input - something like <code>(+ 2 2)</code>. It
begins by putting extra braces around the entire thing. It then adds
space around all the braces before splitting the input into an array
where each element is a separate token. The input splits over spaces,
which is why it was necessary to add extra space around the braces. far
so good, but why did I begin by adding extra braces?! Seems kind of
wierd, right?</p>
<p>The reason for this seemingly cooky choice was to accomodate inputs like
the following: <code>(define x 3) x</code>. This is a valid Scheme expression which
evaluates to 3. Suppose that I do not add extra brackets around the
entire statement. Then, when I go to actually <em>interpret</em> the input, it
becomes difficult to know when the end of the user input has been
reached. In fact, I did NOT initially add these extra brackets in my
tokenization, but my interpreter evaluated <code>(define x 3) x</code> to be <code>None</code>
instead of <code>3</code>. It simply stopped evaluating after <code>(define x 3)</code>.
Knowing when the end of the user input has been reached, however, is
quite easy if we have an outter set of brackets enclosing everything. Of
course, adding extra brackets came with its own set of issues. I needed
to make sure that my interpreter could differentiate between expressions
like <code>(3)</code> and <code>3</code>. Although '3' is a valid Scheme expression, <code>(3)</code> is
not, and I was essentially changing my input to always look like <code>(3)</code>.
Thanks to Mary Cook, I came up with a pretty clever solution to this
issue, which we will come accross later.</p>
<p>Continuing on... the parser! Writing this parser was my first encouter
with recursion! Here's the code:</p>
<table class="highlighttable"><tr><td class="linenos"><div class="linenodiv"><pre> 1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23</pre></div></td><td class="code"><div class="highlight"><pre><span class="c"># ---- PARSER ----- </span>
<span class="k">def</span> <span class="nf">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">):</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">:</span>
<span class="k">raise</span> <span class="n">first_error</span>
<span class="n">token</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span>
<span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span>
<span class="k">if</span> <span class="n">token</span> <span class="o">==</span> <span class="s">'('</span><span class="p">:</span>
<span class="n">parsed_input</span> <span class="o">=</span> <span class="p">[]</span>
<span class="k">while</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">!=</span> <span class="s">')'</span><span class="p">:</span>
<span class="n">to_append</span><span class="p">,</span> <span class="n">tokens</span> <span class="o">=</span> <span class="n">parse</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span>
<span class="n">parsed_input</span><span class="o">.</span><span class="n">append</span><span class="p">(</span><span class="n">to_append</span><span class="p">)</span>
<span class="n">tokens</span> <span class="o">=</span> <span class="n">tokens</span><span class="p">[</span><span class="mi">1</span><span class="p">:]</span> <span class="c"># pops off the ')' part</span>
<span class="c">#we add a condition to check if last return</span>
<span class="k">if</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span><span class="p">:</span>
<span class="k">return</span> <span class="n">parsed_input</span><span class="p">,</span> <span class="n">tokens</span> <span class="c"># if not last return, return current version of holder</span>
<span class="k">elif</span> <span class="nb">len</span><span class="p">(</span><span class="n">tokens</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span> <span class="p">:</span>
<span class="k">return</span> <span class="n">parsed_input</span> <span class="c"># if last return, only return new_holder </span>
<span class="k">else</span><span class="p">:</span>
<span class="k">return</span> <span class="n">check_type</span><span class="p">(</span><span class="n">token</span><span class="p">),</span> <span class="n">tokens</span>
</pre></div>
</td></tr></table>
<p>To better understand this function, lets look at how
<code>parse(['(', '(', +, '1', '1', ')', ')'])</code> would be processed, line by
line. Line 3 is not satisfied, so we skip to line 6 and 7. Here Token
will be set to <code>'('</code>, tokens will become
<code>[ '(', '+', '1', '1', ')', ')']</code>. Since the condition on line 9 is
satisfied, we then enter this branch and continue on with line 10 where
<code>parsed_input</code> is initialized as an empty list. Then we enter the while
loop on line 12, and in line 13 we recurse back into our parse function
by calling <code>parse(tokens)</code>.</p>
<p>Now in the second level of our parse function. In this level we execute
line 6 and 7 again, setting <code>token</code> as <code>'('</code>, and <code>tokens</code> to
<code>['+', '1', '1',')', ')']</code>. Since <code>token</code> is still <code>'('</code>, line 10 will
again be executed, creating another <code>parsed_input</code> list, and then diving
into our 3rd level of recursion on <code>parse</code>.</p>
<p>In this 3rd level, token is <code>'+'</code> and tokens is <code>['1', '1', ')', ')']</code>.
Since token is not <code>'('</code> we skip down to the else condition and execute
line 23. Since we have yet to define the function check_type, just
pretend that this line returns token and tokens. Having returned those
values, we come out of the 3rd layer of recursion and back into the 2nd,
at line 13. Here, <code>to_append</code> is set to '+', and tokens is
<code>['1', '1', ')', ')']</code>. Line 14 is then executed, and <code>parsed_input</code>
becomes <code>['+']</code>. We continue looping through this while loop, going into
a 3rd layer of recursion, coming back into two, and appending
<code>parsed_input</code> until <code>parsed_input</code> looks like ['+', '1', '1'] in our
second layer of recursion. At this point the tokens[0] will be ')', so
we break out of the while loop. Line 15 remouves ')' from tokens,
leaving tokens as [')']. Then since the length of tokens is positive
line 19 is executed.</p>
<p>Now we exit the second layer of recursion and are back into the first
parse call. <code>to_append</code> becomes <code>['+', '1', '1']</code>, and <code>tokens</code> is
<code>[')']</code>. The original <code>parsed_input</code> is updated to <code>[['+', '1', '1']]</code>,
the while loop ends, and line 15 is executed, changing <code>tokens</code> to <code>[]</code>.
Since the <code>len(tokens)</code> is zero line 20 is executed and <code>parsed_input</code>
is the final return. We have exited all levels of the parse function,
effectively parsing the input!</p>
<p>This blog post will be continued... I have written enough for one day.</p>
<p class="subheader">Category: <a href="http://lmontopo.github.io/category/blog.html">Blog</a>
</p>
</article>
</div>
<!-- End Main Content -->
<!-- Sidebar -->
<aside class="large-3 columns">
<h5 class="sidebar-title">Site</h5>
<ul class="side-nav">
<li><a href="http://lmontopo.github.io/archives.html">Archives</a>
<li><a href="http://lmontopo.github.io/tags.html">Tags</a>
</ul>
<h5 class="sidebar-title">Categories</h5>
<ul class="side-nav">
<li><a href="http://lmontopo.github.io/category/blog.html">Blog</a></li>
</ul>
</aside> <!-- End Sidebar -->
</div> <!-- End Main Content and Sidebar -->
<!-- Footer -->
<footer class="row">
<div class="large-12 columns">
<hr />
<div class="row">
<div class="large-6 columns">
<!-- <p>The L Blog by Leta Montopoli</p> -->
</div>
</div>
</div>
</footer>
</body>
</html>