04 Grammar Syntax

github-actions[bot] edited this page Nov 25, 2025 · 1 revision

Chapter 4: Grammar Syntax

Zyn uses a PEG (Parsing Expression Grammar) syntax compatible with the Pest parser generator. This chapter covers each grammar construct in detail.

Rule Definitions

Basic Rules

// Normal rule - creates a parse node, handles whitespace between elements
rule_name = { pattern }

// Atomic rule - no whitespace handling, treats content as single token
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

// Silent rule - matches but doesn't create a node
paren_expr = _{ "(" ~ expr ~ ")" }

When to Use Each Type

| Rule Type | Use Case | Example |
|---|---|---|
| Normal { } | Compound syntax structures | if_stmt = { "if" ~ expr ~ block } |
| Atomic @{ } | Tokens, literals, identifiers | integer = @{ ASCII_DIGIT+ } |
| Silent _{ } | Grouping without AST nodes | paren_expr = _{ "(" ~ expr ~ ")" } |

Sequence and Choice

Sequence (~)

Matches patterns in order:

// Matches: "if" followed by expression followed by block
if_stmt = { "if" ~ expr ~ block }

// With optional parts
if_else = { "if" ~ expr ~ block ~ ("else" ~ block)? }

Ordered Choice (|)

Tries alternatives in order, takes first match:

// IMPORTANT: Order matters! The first alternative that matches wins.
statement = { if_stmt | while_stmt | return_stmt | expr_stmt }

// Wrong order - "if" always matches first, so "ifeq" is never reached
// keyword = { "if" | "ifeq" }  // BAD

// Correct order - the longer literal comes first
keyword = { "ifeq" | "if" }     // GOOD
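
The effect of alternative order can be sketched outside the grammar. The following Python snippet is only an illustration of ordered-choice semantics, not how Zyn implements it; `ordered_choice` is a hypothetical helper that tries string literals in the order given and commits to the first one that matches.

```python
def ordered_choice(alternatives, text, pos=0):
    """Return the end position of the first literal that matches at pos, else None."""
    for literal in alternatives:
        if text.startswith(literal, pos):
            return pos + len(literal)  # commit: later alternatives are never tried
    return None

# "if" is listed first and wins, so only 2 of the 4 characters are consumed.
bad_end = ordered_choice(["if", "ifeq"], "ifeq")

# Listing the longer literal first lets it consume the whole keyword.
good_end = ordered_choice(["ifeq", "if"], "ifeq")

print(bad_end, good_end)  # 2 4
```

With the wrong order the match itself succeeds, but the leftover "eq" usually makes the enclosing rule fail later, which is why the error tends to show up far from the real cause.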

Repetition

Zero or More (*)

// Matches zero or more statements, including empty input
statements = { statement* }

// With separator
args = { expr ~ ("," ~ expr)* }

One or More (+)

// Matches: "1", "12", "123", ...
digits = @{ ASCII_DIGIT+ }

// At least one statement required
block = { "{" ~ statement+ ~ "}" }

Optional (?)

// Optional else branch
if_stmt = { "if" ~ expr ~ block ~ else_branch? }

// Optional trailing comma
list = { "[" ~ (expr ~ ("," ~ expr)* ~ ","?)? ~ "]" }

Predicates

Negative Lookahead (!)

Succeeds if pattern does NOT match (doesn't consume input):

// Identifier that's not a keyword
identifier = @{ !keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

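// Keyword must not be followed by an identifier character.
// "_" is included because identifiers may contain it; otherwise "if_x" would be rejected.
keyword = @{ ("if" | "else" | "while") ~ !(ASCII_ALPHANUMERIC | "_") }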

Positive Lookahead (&)

Succeeds if pattern matches (doesn't consume input):

// Match only if followed by "("
function_name = { identifier ~ &"(" }
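
Both predicates peek at the input without advancing. A minimal Python sketch of that behaviour follows, assuming hand-written matchers that return an end position or None; all function names are hypothetical, whitespace skipping is ignored, and the keyword boundary treats "_" as an identifier character, as the Keyword Protection pattern later in this chapter does.

```python
KEYWORDS = ("if", "else", "while")

def is_ident_char(ch):
    # ASCII_ALPHANUMERIC | "_"
    return ch.isascii() and (ch.isalnum() or ch == "_")

def match_keyword(text, pos=0):
    """keyword = @{ ("if" | "else" | "while") ~ !(ASCII_ALPHANUMERIC | "_") }"""
    for kw in KEYWORDS:
        if text.startswith(kw, pos):
            end = pos + len(kw)
            # Negative lookahead: inspect the next character, consume nothing.
            if end < len(text) and is_ident_char(text[end]):
                return None
            return end
    return None

def match_identifier(text, pos=0):
    """identifier = @{ !keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }"""
    if match_keyword(text, pos) is not None:  # !keyword
        return None
    if pos >= len(text) or not (text[pos].isascii() and text[pos].isalpha()):
        return None
    end = pos + 1
    while end < len(text) and is_ident_char(text[end]):
        end += 1
    return end

def match_function_name(text, pos=0):
    """function_name = { identifier ~ &"(" }"""
    end = match_identifier(text, pos)
    # Positive lookahead: "(" must be present, but the returned position excludes it.
    if end is not None and text.startswith("(", end):
        return end
    return None
```

For example, `match_function_name("foo(1)")` returns 3, so the "(" is still available to whichever rule parses the argument list.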

Character Classes

Built-in Classes

ANY                 // Any single character
ASCII              // Any ASCII character (0x00-0x7F)
ASCII_DIGIT        // 0-9
ASCII_NONZERO_DIGIT // 1-9
ASCII_ALPHA        // a-z, A-Z
ASCII_ALPHANUMERIC // a-z, A-Z, 0-9
ASCII_ALPHA_LOWER  // a-z
ASCII_ALPHA_UPPER  // A-Z
ASCII_HEX_DIGIT    // 0-9, a-f, A-F
ASCII_OCT_DIGIT    // 0-7
ASCII_BIN_DIGIT    // 0-1
NEWLINE            // \n or \r\n

Custom Ranges

// Character range
lowercase = { 'a'..'z' }

// Multiple ranges
hex_digit = { '0'..'9' | 'a'..'f' | 'A'..'F' }

String Matching

Exact Match

// Case-sensitive literal
if_keyword = { "if" }

// Multi-character operators
arrow = { "->" }
fat_arrow = { "=>" }

Case Insensitive

// Matches "if", "IF", "If", "iF"
if_keyword = { ^"if" }

Special Rules

WHITESPACE

Defines what counts as whitespace. Normal rules automatically skip this between elements:

WHITESPACE = _{ " " | "\t" | "\n" | "\r" }

COMMENT

Defines comment syntax. Comments are skipped like whitespace:

// COMMENT can be defined only once; choose one of the variants below.

// Single-line comments
COMMENT = _{ "//" ~ (!"\n" ~ ANY)* ~ "\n"? }

// Block comments (non-nested)
COMMENT = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }

// Both styles
COMMENT = _{
    "//" ~ (!"\n" ~ ANY)* ~ "\n"?
  | "/*" ~ (!"*/" ~ ANY)* ~ "*/"
}
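
Conceptually, the parser skips any run of WHITESPACE and COMMENT matches between the elements of a normal rule. The Python sketch below mirrors the "both styles" definition above under that assumption; `skip_trivia` is a hypothetical name and this is not Zyn's actual implementation.

```python
def skip_trivia(text, pos=0):
    """Advance past any run of whitespace and comments; return the new position."""
    while pos < len(text):
        if text[pos] in " \t\n\r":
            pos += 1
        elif text.startswith("//", pos):
            # Line comment: runs to the next newline (consumed) or to end of input.
            newline = text.find("\n", pos)
            pos = len(text) if newline == -1 else newline + 1
        elif text.startswith("/*", pos):
            # Block comment: non-nested, ends at the first "*/".
            close = text.find("*/", pos + 2)
            if close == -1:
                break  # unterminated block comment does not match COMMENT
            pos = close + 2
        else:
            break
    return pos

source = "  // note\n/* block */ x = 1"
start = skip_trivia(source)  # index of the "x"
```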

SOI and EOI

SOI and EOI match the start and end of input. Wrapping the top-level rule in both forces the parser to consume the entire input:

program = { SOI ~ declarations ~ EOI }

Handling Precedence

PEG handles operator precedence through grammar structure, not precedence tables.

Left-Associative Operators

Build a chain of rules from lowest to highest precedence:

// Lowest precedence: logical OR
expr = { logical_or }

logical_or = { logical_and ~ ("or" ~ logical_and)* }
  -> TypedExpression {
      "fold_binary": { "operand": "logical_and", "operator": "or" }
  }

logical_and = { comparison ~ ("and" ~ comparison)* }
  -> TypedExpression {
      "fold_binary": { "operand": "comparison", "operator": "and" }
  }

comparison = { addition ~ (("==" | "!=" | "<" | ">") ~ addition)* }
  -> TypedExpression {
      "fold_binary": { "operand": "addition", "operator": "==|!=|<|>" }
  }

addition = { multiplication ~ (("+" | "-") ~ multiplication)* }
  -> TypedExpression {
      "fold_binary": { "operand": "multiplication", "operator": "+|-" }
  }

multiplication = { unary ~ (("*" | "/") ~ unary)* }
  -> TypedExpression {
      "fold_binary": { "operand": "unary", "operator": "*|/" }
  }

// Highest precedence: unary then atoms
unary = { ("-" | "!") ~ unary | atom }

atom = { number | identifier | "(" ~ expr ~ ")" }
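
To see why the rule chain yields the right precedence, here is a small recursive-descent sketch in Python with one function per rule. It covers only the addition, multiplication, unary, and atom levels, uses a hypothetical tokenizer, and folds each `operand ~ (operator ~ operand)*` match into a left-leaning tuple tree. That fold is an assumption about what "fold_binary" does, not a description of Zyn's implementation.

```python
import re

# Hypothetical token shapes for this sketch: integers, identifiers, operators, parentheses.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()!])")

def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse(src):
    tokens, pos = tokenize(src), 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def binary_level(operand, operators):
        # operand ~ (operator ~ operand)*, folded left: ((a op b) op c)
        node = operand()
        while peek() in operators:
            op = advance()
            node = (op, node, operand())
        return node

    def addition():        # lower precedence, so it calls the tighter level
        return binary_level(multiplication, ("+", "-"))

    def multiplication():
        return binary_level(unary, ("*", "/"))

    def unary():
        if peek() in ("-", "!"):
            return (advance(), unary())
        return atom()

    def atom():
        tok = advance()
        if tok == "(":
            node = addition()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok in ("+", "-", "*", "/", "!", ")"):
            raise SyntaxError(f"unexpected token {tok!r}")
        return tok

    tree = addition()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree

print(parse("1 + 2 * 3"))  # ('+', '1', ('*', '2', '3'))
print(parse("1 - 2 - 3"))  # ('-', ('-', '1', '2'), '3')
```

Because `addition` can only reach `*` through `multiplication`, the multiplication subtree is always built first and ends up deeper in the tree.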

Right-Associative Operators

Use recursion for right-associativity:

// Right-associative: a = b = c parses as a = (b = c)
assignment = { identifier ~ "=" ~ assignment | expr }
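
The recursion on the right-hand side is what nests the tree to the right. A minimal Python sketch over a pre-split token list, assuming a single non-"=" token as a stand-in for `expr` (the function name and the tuple shape are hypothetical):

```python
def parse_assignment(tokens, pos=0):
    """assignment = { identifier ~ "=" ~ assignment | expr }

    Returns (tree, end_position), or None if nothing matches at pos.
    """
    # First alternative: identifier ~ "=" ~ assignment
    if pos + 1 < len(tokens) and tokens[pos].isidentifier() and tokens[pos + 1] == "=":
        rest = parse_assignment(tokens, pos + 2)
        if rest is not None:
            value, end = rest
            return ("=", tokens[pos], value), end
    # Second alternative (reached on backtrack): stand-in for expr
    if pos < len(tokens) and tokens[pos] != "=":
        return tokens[pos], pos + 1
    return None

tree, end = parse_assignment(["a", "=", "b", "=", "c"])
print(tree)  # ('=', 'a', ('=', 'b', 'c'))
```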

Common Patterns

Lists with Separators

// Comma-separated, no trailing
args = { expr ~ ("," ~ expr)* }

// Comma-separated with optional trailing
items = { item ~ ("," ~ item)* ~ ","? }

// Empty allowed
opt_args = { (expr ~ ("," ~ expr)*)? }
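
The difference between the first two patterns is only whether a final separator is consumed. The Python sketch below walks a pre-split token list to show this; `parse_items` is a hypothetical helper and treats any non-comma token as an item.

```python
def parse_items(tokens, pos=0, allow_trailing=True):
    """items = { item ~ ("," ~ item)* ~ ","? }; returns (items, end) or None."""
    def is_item(i):
        return i < len(tokens) and tokens[i] != ","

    if not is_item(pos):
        return None
    items = [tokens[pos]]
    pos += 1
    # ("," ~ item)*: a comma is consumed here only when an item follows it.
    while pos < len(tokens) and tokens[pos] == "," and is_item(pos + 1):
        items.append(tokens[pos + 1])
        pos += 2
    # ","?: optionally consume one trailing comma.
    if allow_trailing and pos < len(tokens) and tokens[pos] == ",":
        pos += 1
    return items, pos

with_trailing = parse_items(["a", ",", "b", ","])
without_trailing = parse_items(["a", ",", "b", ","], allow_trailing=False)
```

Here `with_trailing` consumes all four tokens, while `without_trailing` stops at position 3 and leaves the final comma for the enclosing rule, which would normally fail on it.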

Keyword Protection

Prevent identifiers from matching keywords:

keyword = @{
    ("if" | "else" | "while" | "for" | "return" | "fn" | "const" | "var")
    ~ !(ASCII_ALPHANUMERIC | "_")
}

identifier = @{ !keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

String Literals with Escapes

string_literal = @{ "\"" ~ string_inner* ~ "\"" }

string_inner = {
    !("\"" | "\\") ~ ANY  // Any char except quote or backslash
  | escape_seq
}

escape_seq = { "\\" ~ ("n" | "r" | "t" | "\\" | "\"" | "0") }
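
As an illustration of what these three rules accept, here is a hand-written Python matcher for the same shape. It also decodes the escapes, which the grammar itself does not do; the function name and the decoding table are assumptions made for this sketch.

```python
# Escape letters allowed by escape_seq, mapped to the characters they stand for.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"', "0": "\0"}

def parse_string_literal(text, pos=0):
    """Return (decoded_value, end_position), or None if no literal matches at pos."""
    if not text.startswith('"', pos):
        return None
    chars = []
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == '"':                      # closing quote
            return "".join(chars), i + 1
        if ch == "\\":                     # escape_seq
            if i + 1 >= len(text) or text[i + 1] not in ESCAPES:
                return None                # unknown or truncated escape
            chars.append(ESCAPES[text[i + 1]])
            i += 2
        else:                              # any char except quote or backslash
            chars.append(ch)
            i += 1
    return None                            # unterminated literal

result = parse_string_literal(r'"a\n\"b" rest')
```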

Numbers

// Integer with optional sign
integer = @{ "-"? ~ ASCII_DIGIT+ }

// Float
float = @{ "-"? ~ ASCII_DIGIT+ ~ "." ~ ASCII_DIGIT+ ~ exponent? }
exponent = @{ ("e" | "E") ~ ("+" | "-")? ~ ASCII_DIGIT+ }

// Hex literal
hex = @{ "0x" ~ ASCII_HEX_DIGIT+ }
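
When these rules are combined, ordered choice applies again: `integer` matches the "0" of "0x1F" and the "3" of "3.14", so it has to come last. The Python sketch below uses regular expressions equivalent to the three rules and a hypothetical combined rule `number = { hex | float | integer }`, which is not defined in this chapter.

```python
import re

INTEGER = re.compile(r"-?[0-9]+")
FLOAT = re.compile(r"-?[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?")
HEX = re.compile(r"0x[0-9a-fA-F]+")

def classify_number(text):
    """Try hex, then float, then integer; return (rule_name, matched_text) or None."""
    for name, pattern in (("hex", HEX), ("float", FLOAT), ("integer", INTEGER)):
        m = pattern.match(text)
        if m:
            return name, m.group()
    return None

print(classify_number("0x1F"))     # ('hex', '0x1F')
print(classify_number("3.14e-2"))  # ('float', '3.14e-2')
print(classify_number("-42"))      # ('integer', '-42')
```

Note that "1." is classified as the integer "1" with the "." left over, because `float` requires at least one digit after the decimal point.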

Debugging Tips

Ambiguity

If parsing is slow or incorrect, check for:

  1. Left recursion - PEG doesn't support it

    // BAD: Left recursion
    expr = { expr ~ "+" ~ term | term }
    
    // GOOD: Use repetition
    expr = { term ~ ("+" ~ term)* }
    
  2. Ambiguous choices - First match wins

    // On input "ifx", "if" matches first and leaves "x" unconsumed
    stmt = { "if" | identifier }
    
    // Use negative lookahead so "if" only matches as a whole word
    if_kw = @{ "if" ~ !(ASCII_ALPHANUMERIC | "_") }
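
The left-recursion problem is easy to reproduce in a naive recursive-descent parser, where each rule becomes a function. The Python sketch below is only an analogy, and a real PEG tool may instead reject such a grammar up front. The left-recursive version calls itself at the same position before consuming anything, so it never terminates on its own; the repetition version consumes a term first. Both function names are hypothetical, and terms are simplified to digit tokens.

```python
def expr_left_recursive(tokens, pos=0):
    # expr = { expr ~ "+" ~ term | term }
    # The first step is to match expr again at the same position.
    left = expr_left_recursive(tokens, pos)  # never returns
    return left

def expr_repetition(tokens, pos=0):
    # expr = { term ~ ("+" ~ term)* }; returns the end position or None.
    if pos >= len(tokens) or not tokens[pos].isdigit():
        return None
    pos += 1
    while pos + 1 < len(tokens) and tokens[pos] == "+" and tokens[pos + 1].isdigit():
        pos += 2
    return pos

try:
    expr_left_recursive(["1", "+", "2"])
    outcome = "parsed"
except RecursionError:
    outcome = "recursion limit hit"

end = expr_repetition(["1", "+", "2", "+", "3"])  # consumes all 5 tokens
```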
    

Testing Rules

Test individual rules by making them the entry point:

# Test just the expression rule
echo "1 + 2 * 3" | zyntax parse --grammar calc.zyn --rule expr

Next Steps

Now that you understand grammar syntax:

  • Chapter 5: Learn how to attach AST-building actions to rules
  • Chapter 8: See these patterns applied in a real language
