Skip to content

Conventional Grammars

Dávid Németi edited this page Sep 6, 2013 · 13 revisions

To fully understand what is Sarcasm, its typesafe grammars and its domain-grammar bindings good for, first we have to examine how conventional grammars and AST builders work in Irony.

We are going to use the same simple expression domain as in the introduction, so we are not going to repeat it here.

So let's see how you can define a grammar and AST builder code for this domain in Irony:

using System;
using Irony;
using Irony.Ast;
using Irony.Parsing;

using D = MyExpression.DomainDefinitions;

namespace MyExpression
{
    public class ConventionalGrammar : Irony.Parsing.Grammar
    {
        public ConventionalGrammar()
        {
            // definitions of terminals

            KeyTerm ADD_OP = ToTerm("+");
            KeyTerm SUB_OP = ToTerm("-");
            KeyTerm MUL_OP = ToTerm("*");
            KeyTerm DIV_OP = ToTerm("/");
            KeyTerm POW_OP = ToTerm("^");

            KeyTerm LEFT_PAREN = ToTerm("(");
            KeyTerm RIGHT_PAREN = ToTerm(")");

            Func<ParseTreeNode, D.BinaryOperator> GetBinaryOperator =
                parseTreeNode =>
                {
                    if (parseTreeNode.Term == ADD_OP)
                        return D.BinaryOperator.Add;

                    else if (parseTreeNode.Term == SUB_OP)
                        return D.BinaryOperator.Sub;

                    else if (parseTreeNode.Term == MUL_OP)
                        return D.BinaryOperator.Mul;

                    else if (parseTreeNode.Term == DIV_OP)
                        return D.BinaryOperator.Div;

                    else if (parseTreeNode.Term == POW_OP)
                        return D.BinaryOperator.Pow;

                    else
                        throw new ArgumentException();
                };

            // definitions of nonterminals

            var expression = new NonTerminal("expression");
            MarkTransient(expression);

            var binaryExpression = new NonTerminal("binaryExpression",
                (context, parseNode) => parseNode.AstNode = new D.BinaryExpression()
                    {
                        Term1 = (D.Expression)parseNode.ChildNodes[0].AstNode,
                        Op = (D.BinaryOperator)parseNode.ChildNodes[1].AstNode,
                        Term2 = (D.Expression)parseNode.ChildNodes[2].AstNode
                    }
                );

            var numberLiteral = new NonTerminal("numberLiteral",
                (context, parseNode) => parseNode.AstNode = new D.NumberLiteral()
                    { Value = (decimal)parseNode.ChildNodes[0].AstNode }
                );

            var binaryOperator = new NonTerminal("binaryOperator",
                (context, parseNode) => parseNode.AstNode = GetBinaryOperator(parseNode.ChildNodes[0]));

            // syntax rules

            expression.Rule =
                binaryExpression |
                numberLiteral |
                LEFT_PAREN + expression + RIGHT_PAREN
                ;

            binaryExpression.Rule =
                expression
                + binaryOperator
                + expression
                ;

            numberLiteral.Rule = new Irony.Parsing.NumberLiteral("numberLiteral", NumberOptions.Default,
                (context, parseNode) => parseNode.AstNode = (decimal)parseNode.Token.Value);

            binaryOperator.Rule = ADD_OP | SUB_OP | MUL_OP | DIV_OP | POW_OP;

            // operator precedences and associativities

            RegisterOperators(10, ADD_OP, SUB_OP);
            RegisterOperators(20, MUL_OP, DIV_OP);
            RegisterOperators(30, Associativity.Right, POW_OP);

            MarkPunctuation(LEFT_PAREN, RIGHT_PAREN);
        }
    }
}

Note that when defining your terminals and nonterminals (in Irony they are called bnfterms altogether) you have to implement the AST builder code as well (if you want to build an AST, of course).

Weak points

Now there are two main weak points here. Let's take binaryExpression for example:

binaryExpression.Rule = expression + binaryOperator + expression;

What we see here is that at the rule definition we have all the information about the child nodes, specified by the bnfterms (including their natural order) at the right of the rule. We know that the 0th bnfterm is an expression, the 1th bnfterm is a binary operator and the 2nd bnfterm is another expression. However, we lose all this information, and implement the AST builder so that we are doubling this information. Doubling this information makes this solution redundant, and redundancy needs proper synchronization between the grammar rule and the AST builder code in order to avoid inconsistency.

The parser, when parses an expression followed by a binary operator followed by another expression, is going to create a parse tree where the 0th child node is the expression at the left, the 1st child node is the binary operator and the 2nd child node is the expression at the right.

Let's see the AST builder code:

var binaryExpression = new NonTerminal("binaryExpression", (context, parseNode) => parseNode.AstNode =
    new D.BinaryExpression() { Term1 = (D.Expression)parseNode.ChildNodes[0].AstNode, ...

Weak point #1: Indices

We have to use indices to access child node in the parse tree, although for a specific nonterminal the location of the child nodes in the parse tree has been already specified by the rule of this nonterminal. So we assume that the expression that will be the value of Term1 is located at the 0th child node.

If we change the rule so that it affects the order of bnfterms, then we also need to change AST builder code. We are not protected from typos either.

// run time error (index error: ArgumentOutOfRangeException)
var binaryExpression = new NonTerminal("binaryExpression", (context, parseNode) => parseNode.AstNode =
    new D.BinaryExpression() { Term1 = (D.Expression)parseNode.ChildNodes[5].AstNode, ...
// run time error when building the AST
KeyTerm BIN_OP_PREFIX = ToTerm("operator");
binaryExpression.Rule = expression + BIN_OP_PREFIX + binaryOperator + expression;

Weak point #2: No Domain Types

Since we have no domain types in the grammar, we cannot know that the 0th child node is an expression, so we have to assume it and cast it to D.Expression.

Although when we build that AST subtree for the 0th child node, e.g. when we created a D.BinaryExpression or D.NumberLiteral object, then we have its type in our hand, and knew that its type is D.Expression. But then we lost its type and now we are dealing with objects in the parse tree, and all we know about them that they types are System.Object.

// run time error (domain type error: InvalidCastException)
new D.BinaryExpression() { Term1 = (D.Expression)parseNode.ChildNodes[1].AstNode, ...
// run time error when building the AST using this as child node (domain type error: InvalidCastException)
var binaryExpression = new NonTerminal("binaryExpression", (context, parseNode) =>
    parseNode.AstNode = D.BinaryOperator.Add);

To see how Sarcasm can eliminate these weak points, continue with Domain Bound Grammars.

Clone this wiki locally