Parsing
We're now able to create Haskell data structures that represent Lisp's abstract syntax trees and print them as s-expressions. The next logical step is to write a parser so we can build these structures using Lisp's infamous parentheses. One good thing about writing a Lisp interpreter is that the parser is very simple: most of the time we can write it by hand without resorting to advanced parsing tools. This approach has obvious benefits (no separate build step to generate code, no extra tool to learn, the ability to debug the parser without having to understand generated code, etc.) and I took it when I wrote Lisp interpreters in C and Java. My initial approach to writing a parser in Haskell was to do the same thing and write it by hand, without a parser generator. However, I quickly realized that Haskell is different enough to give me the best of both worlds.
Haskell comes standard with a parsing library called Parsec, which implements a domain-specific language for parsing text. If you're familiar with the Boost Spirit library for C++, you'll understand what I mean. The library "embeds" a parsing language into Haskell, so the user of the library can specify a parser's grammar directly in Haskell code! Parsec takes some time to get used to, but once you understand it you can drop conventional parser generators forever. To give you a bit of the flavor of Parsec, here's a code snippet that parses a symbol:
parseSymbol = do f <- firstAllowed
                 r <- many (firstAllowed <|> digit)
                 return $ BlaiseSymbol (f:r)
    where firstAllowed = oneOf "+-*/" <|> letter
The code is rather descriptive. It first looks for an allowed first character (a letter or one of the operator characters +, -, *, /; traditionally not a digit), followed by zero or more characters that are either allowed first characters or digits. You can see the rest of the grammar in the source code of the interpreter. This is but one example of how domain-specific languages eliminate crude tools, extra build steps, debugging pain, and repetitive work. Haskell is an excellent host for domain-specific languages, perhaps as close to Lisp as any language can be. A word of warning: using JavaCC after learning Parsec becomes a very unnerving experience.
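To try the symbol parser in isolation, here is a minimal self-contained sketch. It assumes the parsec package (version 3 module names) is installed. The Expr type below is a hypothetical stand-in with a single constructor, since only BlaiseSymbol appears in the snippet above, and the readSymbol helper is likewise introduced here purely for illustration; the interpreter's real definitions may differ.

```haskell
import Text.Parsec (ParseError, digit, letter, many, oneOf, parse, (<|>))
import Text.Parsec.String (Parser)

-- Hypothetical stand-in for the interpreter's AST type. Only the
-- BlaiseSymbol constructor is known from the text; the real type
-- presumably has more constructors.
data Expr = BlaiseSymbol String
  deriving (Show, Eq)

-- A symbol is one allowed first character followed by zero or more
-- allowed first characters or digits.
parseSymbol :: Parser Expr
parseSymbol = do
  f <- firstAllowed
  r <- many (firstAllowed <|> digit)
  return $ BlaiseSymbol (f : r)
  where
    firstAllowed = oneOf "+-*/" <|> letter

-- Hypothetical helper: run the parser on a string. The second argument
-- to 'parse' is only a source name used in error messages.
readSymbol :: String -> Either ParseError Expr
readSymbol = parse parseSymbol "<input>"
```

Loaded into GHCi, readSymbol "foo1" should yield Right (BlaiseSymbol "foo1"), while readSymbol "1foo" yields a Left carrying a parse error, because a digit is not an allowed first character. Note that this sketch does not demand end of input, so readSymbol "foo bar" succeeds with BlaiseSymbol "foo" and leaves the rest unconsumed.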