Skip to content

Abstract Syntax Trees

Slava Akhmechet edited this page Oct 13, 2006 · 1 revision

The first thing I did to kick off the interpreter project was to determine its scope. I decided that I would only deal with integers, symbols, functions, and lists - a bare minimum required to make something that resembles Lisp. At this point I needed to convert these concepts into code Haskell would understand - exactly the same thing I'd do with any other language. Immediately I was rewarded. I was able to create a type that supported a required data structure in only four lines of Haskell code! The definition was encoded into what Haskell calls an algebraic data type and looked like this:

data Expr = BlaiseInt Integer |
            BlaiseSymbol String |
            BlaiseFn ([Expr]->Expr) |
            BlaiseList [Expr]

Algebraic data types are tricky beasts. Because they're very different from everything people with imperative programming backgrounds are used to, they tend to cause much confusion for a long time until intuitive understanding finally develops. Above code simply states than an expression may be and integer, a symbol, a function, or a list, and each version contains a single member field, respectively an integer, a string, a function that takes a list of expressions and returns an expression, and a list of expressions. The closest (very lossy) translation to Java that I can think of is this:

abstract class Expr {}

final class BlaiseInt extends Expr {
    public int _value;
}

final class BlaiseSymbol extends Expr {
    public String _value;
}

interface BlaiseFunction {
    public Expr evaluateFunction(List arguments);
}

final class BlaiseFn extends Expr {
    BlaiseFunction _value;
}

final class BlaiseList extends Expr {
    List _value;
}

Notice how the Java version is already much longer than the Haskell one despite the fact that I took obvious shortcuts and avoided writing constructors and accessor methods. Additionally, as we continue developing the interpreter, extending the Java translation appropriately will become a very costly and verbose task. I will attempt to demonstrate what I mean at various points in the article. One example I can provide immediately is actually instantiating the data structure. For example, suppose I want to use it to create a list of three integers. In Haskell I can do this:

myList = BlaiseList [BlaiseInt 1, BlaiseInt 2, BlaiseInt 3]

A Java alternative would look like this (we'll even give it an advantage, assume BlaiseList and BlaiseInt have appropriate constructors):

BlaiseList myList = new BlaiseList();
myList._value.add(new BlaiseInt(1));
myList._value.add(new BlaiseInt(2));
myList._value.add(new BlaiseInt(3));

While we could (unnecessarily) transform the above Haskell code to take up four lines, the clarity is already starting to speak for itself: you can do more with Haskell in less code that's easier to read and maintain in the long run.

Clone this wiki locally