04 Grammar Syntax

Chapter 4: Grammar Syntax

Zyn uses a PEG (Parsing Expression Grammar) syntax compatible with the Pest parser generator. This chapter covers each grammar construct in detail.

Rule Definitions

Basic Rules

// Normal rule - creates a parse node, handles whitespace between elements
rule_name = { pattern }

// Atomic rule - no whitespace handling, treats content as single token
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

// Silent rule - matches but doesn't create a node
paren_expr = _{ "(" ~ expr ~ ")" }

When to Use Each Type

Rule Type	Use Case	Example
Normal `{ }`	Compound syntax structures	`if_stmt = { "if" ~ expr ~ block }`
Atomic `@{ }`	Tokens, literals, identifiers	`integer = @{ ASCII_DIGIT+ }`
Silent `_{ }`	Grouping without AST nodes	`paren_expr = _{ "(" ~ expr ~ ")" }`

Sequence and Choice

Sequence (`~`)

Matches patterns in order:

// Matches: "if" followed by expression followed by block
if_stmt = { "if" ~ expr ~ block }

// With optional parts
if_else = { "if" ~ expr ~ block ~ ("else" ~ block)? }

Ordered Choice (`|`)

Tries alternatives in order, takes first match:

// IMPORTANT: Order matters! When one alternative is a prefix of another, list the longer one first
statement = { if_stmt | while_stmt | return_stmt | expr_stmt }

// Wrong order - "if" always matches first, so "ifeq" is never tried
// keyword = { "if" | "ifeq" }  // BAD

// Correct order
keyword = { "ifeq" | "if" }     // GOOD
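
To make the first-match-wins behavior concrete, here is a minimal Python sketch of ordered choice over plain string literals. The helper `ordered_choice` is hypothetical and only imitates the semantics described above; it is not part of Zyn or Pest.

```python
def ordered_choice(alternatives, text, pos=0):
    """Return the first alternative that matches at pos, or None.

    Imitates PEG ordered choice over string literals: alternatives are
    tried left to right and the first success wins, even if a later
    alternative would have matched more input.
    """
    for literal in alternatives:
        if text.startswith(literal, pos):
            return literal
    return None


# "if" is a prefix of "ifeq", so listing it first hides the longer keyword.
print(ordered_choice(["if", "ifeq"], "ifeq x y"))  # if
print(ordered_choice(["ifeq", "if"], "ifeq x y"))  # ifeq
```

With the short alternative first, the match stops after `if` and the remaining `eq` is left for whatever comes next in the grammar, which usually fails.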

Repetition

Zero or More (`*`)

// Matches zero or more statements, including none at all
statements = { statement* }

// With separator
args = { expr ~ ("," ~ expr)* }

One or More (`+`)

// Matches: "1", "12", "123", ...
digits = @{ ASCII_DIGIT+ }

// At least one statement required
block = { "{" ~ statement+ ~ "}" }

Optional (`?`)

// Optional else branch
if_stmt = { "if" ~ expr ~ block ~ else_branch? }

// Optional trailing comma
list = { "[" ~ (expr ~ ("," ~ expr)* ~ ","?)? ~ "]" }

Predicates

Negative Lookahead (`!`)

Succeeds if pattern does NOT match (doesn't consume input):

// Identifier that's not a keyword
identifier = @{ !keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }

// Keyword must not be followed by an identifier character (letter, digit, or "_")
keyword = @{ ("if" | "else" | "while") ~ !(ASCII_ALPHANUMERIC | "_") }

Positive Lookahead (`&`)

Succeeds if pattern matches (doesn't consume input):

// Match only if followed by "("
function_name = { identifier ~ &"(" }
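
The key property of both predicates is that they test the input without advancing past it. The following Python sketch illustrates that with hypothetical helpers (`match_literal`, `positive_lookahead`, `negative_lookahead`) that track a position in the input; it is an illustration of the semantics, not how Zyn implements them.

```python
def match_literal(text, pos, literal):
    """Return the position after literal if it matches at pos, else None."""
    return pos + len(literal) if text.startswith(literal, pos) else None


def positive_lookahead(text, pos, literal):
    """&literal: succeed without consuming input if literal matches at pos."""
    return pos if match_literal(text, pos, literal) is not None else None


def negative_lookahead(text, pos, literal):
    """!literal: succeed without consuming input if literal does NOT match."""
    return pos if match_literal(text, pos, literal) is None else None


text = "foo(1)"
after_name = match_literal(text, 0, "foo")        # consumes "foo", now at 3
print(positive_lookahead(text, after_name, "("))  # 3: success, nothing consumed
print(negative_lookahead(text, after_name, "("))  # None: "(" is present
```

Because the position is unchanged after a successful predicate, the `(` is still available for the next rule to match.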

Character Classes

Built-in Classes

ANY                 // Any single character
ASCII              // Any ASCII character (0x00-0x7F)
ASCII_DIGIT        // 0-9
ASCII_NONZERO_DIGIT // 1-9
ASCII_ALPHA        // a-z, A-Z
ASCII_ALPHANUMERIC // a-z, A-Z, 0-9
ASCII_ALPHA_LOWER  // a-z
ASCII_ALPHA_UPPER  // A-Z
ASCII_HEX_DIGIT    // 0-9, a-f, A-F
ASCII_OCT_DIGIT    // 0-7
ASCII_BIN_DIGIT    // 0-1
NEWLINE            // \n, \r\n, or \r

Custom Ranges

// Character range
lowercase = { 'a'..'z' }

// Multiple ranges
hex_digit = { '0'..'9' | 'a'..'f' | 'A'..'F' }

String Matching

Exact Match

// Case-sensitive literal
if_keyword = { "if" }

// Multi-character operators
arrow = { "->" }
fat_arrow = { "=>" }

Case Insensitive

// Matches "if", "IF", "If", "iF"
if_keyword = { ^"if" }

Special Rules

WHITESPACE

Defines what counts as whitespace. Normal rules automatically skip this between elements:

WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
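
As a rough mental model, a normal rule behaves as if optional whitespace were inserted between every pair of sequence elements, while an atomic rule requires the elements to be adjacent. The Python sketch below illustrates the difference; `skip_ws` and `match_sequence` are hypothetical helpers, and the model ignores comments for simplicity.

```python
WHITESPACE = " \t\n\r"


def skip_ws(text, pos):
    """Advance pos past any run of whitespace characters."""
    while pos < len(text) and text[pos] in WHITESPACE:
        pos += 1
    return pos


def match_sequence(text, literals, atomic=False):
    """Match literals in order from position 0; return end position or None.

    atomic=False skips whitespace between elements (like a normal rule).
    atomic=True requires elements to be adjacent (like an @{ } rule).
    """
    pos = 0
    for index, literal in enumerate(literals):
        if index > 0 and not atomic:
            pos = skip_ws(text, pos)
        if not text.startswith(literal, pos):
            return None
        pos += len(literal)
    return pos


print(match_sequence("a  +  b", ["a", "+", "b"]))               # 7
print(match_sequence("a  +  b", ["a", "+", "b"], atomic=True))  # None
print(match_sequence("a+b", ["a", "+", "b"], atomic=True))      # 3
```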

COMMENT

Defines comment syntax. Comments are skipped like whitespace:

// Single-line comments
COMMENT = _{ "//" ~ (!"\n" ~ ANY)* ~ "\n"? }

// Block comments (non-nested)
COMMENT = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }

// Both styles combined in a single COMMENT rule
COMMENT = _{
    "//" ~ (!"\n" ~ ANY)* ~ "\n"?
  | "/*" ~ (!"*/" ~ ANY)* ~ "*/"
}

SOI and EOI

Start and End of Input markers:

program = { SOI ~ declarations ~ EOI }
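
The practical effect of `EOI` is that trailing garbage becomes an error instead of being silently ignored. A small Python analogy, using a hypothetical one-declaration pattern in place of `declarations`: `re.match` accepts a matching prefix, whereas `re.fullmatch` requires the whole input to match.

```python
import re

# Hypothetical stand-in for the `declarations` rule: one lowercase word
# followed by a semicolon.
DECLARATION = re.compile(r"[a-z]+;")


def parse_prefix(text):
    """Like `program = { declarations }`: succeeds if a prefix matches."""
    return DECLARATION.match(text) is not None


def parse_whole(text):
    """Like `program = { SOI ~ declarations ~ EOI }`: all input must match."""
    return DECLARATION.fullmatch(text) is not None


print(parse_prefix("foo; @@@"))  # True: the trailing text is never examined
print(parse_whole("foo; @@@"))   # False: input remains after the match
print(parse_whole("foo;"))       # True
```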

Handling Precedence

PEG handles operator precedence through grammar structure, not precedence tables.

Left-Associative Operators

Build a chain of rules from lowest to highest precedence:

// Lowest precedence: logical OR
expr = { logical_or }

logical_or = { logical_and ~ ("or" ~ logical_and)* }
  -> TypedExpression {
      "fold_binary": { "operand": "logical_and", "operator": "or" }
  }

logical_and = { comparison ~ ("and" ~ comparison)* }
  -> TypedExpression {
      "fold_binary": { "operand": "comparison", "operator": "and" }
  }

comparison = { addition ~ (("==" | "!=" | "<" | ">") ~ addition)* }
  -> TypedExpression {
      "fold_binary": { "operand": "addition", "operator": "==|!=|<|>" }
  }

addition = { multiplication ~ (("+" | "-") ~ multiplication)* }
  -> TypedExpression {
      "fold_binary": { "operand": "multiplication", "operator": "+|-" }
  }

multiplication = { unary ~ (("*" | "/") ~ unary)* }
  -> TypedExpression {
      "fold_binary": { "operand": "unary", "operator": "*|/" }
  }

// Highest precedence: unary then atoms
unary = { ("-" | "!") ~ unary | atom }

atom = { number | identifier | "(" ~ expr ~ ")" }
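
Each repetition rule above matches a flat list of operands and operators, so something has to turn that list into a nested tree. Assuming `fold_binary` combines the operands from left to right (an assumption; this chapter does not spell out its semantics), the resulting shape can be sketched in Python. The function `fold_left` and the tuple-based tree are hypothetical.

```python
def fold_left(operands, operators):
    """Fold [a, b, c] with [op1, op2] into ((a op1 b) op2 c).

    Trees are represented as (operator, left, right) tuples.
    """
    if len(operands) != len(operators) + 1:
        raise ValueError("expected exactly one more operand than operators")
    tree = operands[0]
    for operator, right in zip(operators, operands[1:]):
        tree = (operator, tree, right)
    return tree


# "1 - 2 - 3" as matched by: multiplication ~ (("+" | "-") ~ multiplication)*
print(fold_left([1, 2, 3], ["-", "-"]))  # ('-', ('-', 1, 2), 3)
```

The left operand nests deeper with each step, which is what makes `1 - 2 - 3` evaluate as `(1 - 2) - 3`.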

Right-Associative Operators

Use recursion for right-associativity:

// Right-associative: a = b = c parses as a = (b = c)
assignment = { identifier ~ "=" ~ assignment | expr }
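
The recursion on the right-hand side is what produces the nesting. Here is a minimal Python sketch of the same rule over a pre-split token list, assuming for brevity that `expr` is a single token; `parse_assignment` is a hypothetical helper.

```python
def parse_assignment(tokens, pos=0):
    """assignment = identifier "=" assignment | expr

    Returns (tree, next_pos). An expr is a single token in this sketch.
    """
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    is_target = tokens[pos].isidentifier()
    if is_target and pos + 1 < len(tokens) and tokens[pos + 1] == "=":
        # Recurse on everything to the right of "=" before building the node.
        value, next_pos = parse_assignment(tokens, pos + 2)
        return ("=", tokens[pos], value), next_pos
    return tokens[pos], pos + 1


tree, _ = parse_assignment(["a", "=", "b", "=", "c"])
print(tree)  # ('=', 'a', ('=', 'b', 'c'))
```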

Common Patterns

Lists with Separators

// Comma-separated, no trailing
args = { expr ~ ("," ~ expr)* }

// Comma-separated with optional trailing
items = { item ~ ("," ~ item)* ~ ","? }

// Empty allowed
opt_args = { (expr ~ ("," ~ expr)*)? }
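
The three list shapes differ only in which edge cases they accept. As a quick way to compare them, the sketch below expresses each one as a Python regular expression over digit-only items with no whitespace. This is an analogy under those simplifying assumptions, and the pattern names are hypothetical.

```python
import re

ITEM = "[0-9]+"

# item ~ ("," ~ item)*
NO_TRAILING = re.compile(ITEM + "(," + ITEM + ")*")
# item ~ ("," ~ item)* ~ ","?
OPTIONAL_TRAILING = re.compile(ITEM + "(," + ITEM + ")*,?")
# (item ~ ("," ~ item)*)?
EMPTY_ALLOWED = re.compile("(" + ITEM + "(," + ITEM + ")*)?")


def accepts(pattern, text):
    """True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None


for text in ["1,2,3", "1,2,", ""]:
    print(repr(text),
          accepts(NO_TRAILING, text),
          accepts(OPTIONAL_TRAILING, text),
          accepts(EMPTY_ALLOWED, text))
```

Only the trailing-comma form accepts `1,2,` and only the empty-allowed form accepts an empty list.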

Keyword Protection

Prevent identifiers from matching keywords:

keyword = @{
    ("if" | "else" | "while" | "for" | "return" | "fn" | "const" | "var")
    ~ !(ASCII_ALPHANUMERIC | "_")
}

identifier = @{ !keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
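
The same two-step check (a keyword only counts when it is not followed by an identifier character, and an identifier must not start with such a keyword) can be tried out with Python's regular-expression lookaheads. This is an equivalent sketch for experimentation, and `is_identifier` is a hypothetical helper.

```python
import re

KEYWORDS = ("if", "else", "while", "for", "return", "fn", "const", "var")

# keyword: one of the words, not followed by an identifier character.
KEYWORD = "(?:" + "|".join(KEYWORDS) + ")(?![A-Za-z0-9_])"

# identifier: not a keyword at this position, then a letter, then the rest.
IDENTIFIER = re.compile("(?!" + KEYWORD + ")[A-Za-z][A-Za-z0-9_]*")


def is_identifier(text):
    """True if the whole text is a valid non-keyword identifier."""
    return IDENTIFIER.fullmatch(text) is not None


print(is_identifier("if"))      # False: exact keyword
print(is_identifier("iffy"))    # True: "if" is followed by a letter
print(is_identifier("if_x"))    # True: "if" is followed by "_"
print(is_identifier("format"))  # True: "for" is only a prefix
```

Including `_` in the lookahead matters: without it, `if_x` would be rejected because `if` followed by `_` would still count as a keyword.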

String Literals with Escapes

string_literal = @{ "\"" ~ string_inner* ~ "\"" }

string_inner = {
    !("\"" | "\\") ~ ANY  // Any char except quote or backslash
  | escape_seq
}

escape_seq = { "\\" ~ ("n" | "r" | "t" | "\\" | "\"" | "0") }
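
To check which inputs this grammar accepts, it can help to walk through the same logic by hand. The Python sketch below mirrors the three rules: ordinary characters are copied, a backslash must be followed by one of the six listed escape characters, and an unescaped quote ends the literal. The function name and error messages are hypothetical.

```python
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "\\": "\\", '"': '"', "0": "\0"}


def parse_string_literal(text):
    """Parse a double-quoted literal spanning all of text; return its value.

    Raises ValueError on malformed input.
    """
    if not text.startswith('"'):
        raise ValueError("expected opening quote")
    out = []
    pos = 1
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            if pos != len(text) - 1:
                raise ValueError("unexpected text after closing quote")
            return "".join(out)
        if ch == "\\":
            # escape_seq: backslash followed by one of n r t \ " 0
            if pos + 1 >= len(text) or text[pos + 1] not in ESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(ESCAPES[text[pos + 1]])
            pos += 2
        else:
            out.append(ch)
            pos += 1
    raise ValueError("unterminated string literal")


print(parse_string_literal(r'"say \"hi\""'))  # say "hi"
```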

Numbers

// Integer with optional sign
integer = @{ "-"? ~ ASCII_DIGIT+ }

// Float
float = @{ "-"? ~ ASCII_DIGIT+ ~ "." ~ ASCII_DIGIT+ ~ exponent? }
exponent = @{ ("e" | "E") ~ ("+" | "-")? ~ ASCII_DIGIT+ }

// Hex literal
hex = @{ "0x" ~ ASCII_HEX_DIGIT+ }
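
When these number rules are combined in an ordered choice (say, a hypothetical `number = { hex | float | integer }`), the order matters because `integer` matches a prefix of both floats and hex literals. The Python sketch below uses regular expressions equivalent to the three rules to show what each ordering would pick; `first_match` is a hypothetical helper.

```python
import re

PATTERNS = {
    "integer": re.compile(r"-?[0-9]+"),
    "float": re.compile(r"-?[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?"),
    "hex": re.compile(r"0x[0-9a-fA-F]+"),
}


def first_match(text, order):
    """Try the named patterns in order; return (name, matched_text) or None."""
    for name in order:
        match = PATTERNS[name].match(text)
        if match:
            return name, match.group()
    return None


bad_order = ["integer", "float", "hex"]
good_order = ["hex", "float", "integer"]

print(first_match("3.14", bad_order))   # ('integer', '3')
print(first_match("3.14", good_order))  # ('float', '3.14')
print(first_match("0xFF", bad_order))   # ('integer', '0')
print(first_match("0xFF", good_order))  # ('hex', '0xFF')
```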

Debugging Tips

Ambiguity

If parsing is slow or incorrect, check for:

Left recursion - PEG doesn't support it

// BAD: Left recursion
expr = { expr ~ "+" ~ term | term }

// GOOD: Use repetition
expr = { term ~ ("+" ~ term)* }

Ambiguous choices - First match wins

// On input "ifx", the "if" alternative matches first and leaves "x" unconsumed
stmt = { "if" | identifier }

// Use negative lookahead so "if" only matches as a whole word
if_kw = @{ "if" ~ !(ASCII_ALPHANUMERIC | "_") }
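
To see why left recursion is a problem for a top-down parser, consider what happens when the left-recursive rule is transcribed literally into a recursive function: it calls itself at the same input position before consuming anything. The Python sketch below contrasts that with the repetition form, assuming each `term` is a single integer token. Both functions are hypothetical illustrations.

```python
def expr_left_recursive(tokens, pos=0):
    """expr = expr "+" term | term, transcribed literally."""
    # The first alternative begins by parsing expr again at the same
    # position, so the recursion repeats without consuming any input.
    left, pos = expr_left_recursive(tokens, pos)
    return left, pos


def expr_repetition(tokens, pos=0):
    """expr = term ("+" term)*, evaluating the sum as it goes."""
    total = int(tokens[pos])
    pos += 1
    while pos + 1 < len(tokens) and tokens[pos] == "+":
        total += int(tokens[pos + 1])
        pos += 2
    return total, pos


try:
    expr_left_recursive(["1", "+", "2"])
except RecursionError:
    print("left-recursive version recursed without consuming input")

print(expr_repetition(["1", "+", "2", "+", "3"]))  # (6, 5)
```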

Testing Rules

Test individual rules by making them the entry point:

# Test just the expression rule
echo "1 + 2 * 3" | zyntax parse --grammar calc.zyn --rule expr

Next Steps

Now that you understand grammar syntax:

Chapter 5: Learn how to attach AST-building actions to rules
Chapter 8: See these patterns applied in a real language
