04 Grammar Syntax
Zyn uses a PEG (Parsing Expression Grammar) syntax compatible with the Pest parser generator. This chapter covers all grammar constructs in detail.
// Normal rule - creates a parse node, handles whitespace between elements
rule_name = { pattern }
// Atomic rule - no whitespace handling, treats content as single token
identifier = @{ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
// Silent rule - matches but doesn't create a node
paren_expr = _{ "(" ~ expr ~ ")" }
| Rule Type | Use Case | Example |
|---|---|---|
| Normal `{ }` | Compound syntax structures | `if_stmt = { "if" ~ expr ~ block }` |
| Atomic `@{ }` | Tokens, literals, identifiers | `integer = @{ ASCII_DIGIT+ }` |
| Silent `_{ }` | Grouping without AST nodes | `paren_expr = _{ "(" ~ expr ~ ")" }` |
Matches patterns in order:
// Matches: "if" followed by expression followed by block
if_stmt = { "if" ~ expr ~ block }
// With optional parts
if_else = { "if" ~ expr ~ block ~ ("else" ~ block)? }
Tries alternatives in order, takes first match:
// IMPORTANT: Order matters! Longer matches should come first
statement = { if_stmt | while_stmt | return_stmt | expr_stmt }
// Wrong order - "if" matches first, so "ifeq" is never reached
// keyword = { "if" | "ifeq" } // BAD
// Correct order
keyword = { "ifeq" | "if" } // GOOD
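The effect of alternative order can be reproduced outside Zyn. The following Python sketch models ordered choice over plain string literals. `ordered_choice` is a hypothetical helper written for this illustration; it is not part of Zyn or Pest.

```python
def ordered_choice(alternatives, text, pos=0):
    """Return the first alternative that matches at pos, or None.

    Models PEG ordered choice over string literals: alternatives are
    tried left to right and the first success is final.
    """
    for literal in alternatives:
        if text.startswith(literal, pos):
            return literal
    return None


# "if" wins even though the input continues with "eq"
bad = ordered_choice(["if", "ifeq"], "ifeq")
# Listing the longer literal first lets it match in full
good = ordered_choice(["ifeq", "if"], "ifeq")
print(bad, good)  # if ifeq
```

With the wrong order, the leftover "eq" is then handed to whatever follows the choice, which usually shows up as a confusing error further along in the input.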
// Matches: "", "a", "aa", "aaa", ...
statements = { statement* }
// With separator
args = { expr ~ ("," ~ expr)* }
// Matches: "1", "12", "123", ...
digits = @{ ASCII_DIGIT+ }
// At least one statement required
block = { "{" ~ statement+ ~ "}" }
// Optional else branch
if_stmt = { "if" ~ expr ~ block ~ else_branch? }
// Optional trailing comma
list = { "[" ~ (expr ~ ("," ~ expr)* ~ ","?)? ~ "]" }
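In standard PEG semantics the repetition operators `*`, `+`, and `?` are greedy and never give back input they have consumed. A minimal Python model of that behaviour follows, assuming a parser is a function from `(text, pos)` to a new position or `None`. All names are illustrative, not Zyn APIs.

```python
def lit(s):
    """Parser for the literal string s."""
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def star(p):
    """p* : apply p until it fails; always succeeds."""
    def parse(text, pos):
        while True:
            nxt = p(text, pos)
            if nxt is None or nxt == pos:  # stop on failure or no progress
                return pos
            pos = nxt
    return parse


def plus(p):
    """p+ : one mandatory match, then p*."""
    def parse(text, pos):
        first = p(text, pos)
        return None if first is None else star(p)(text, first)
    return parse


def opt(p):
    """p? : try p once; succeed without consuming anything if it fails."""
    def parse(text, pos):
        nxt = p(text, pos)
        return pos if nxt is None else nxt
    return parse


a = lit("a")
print(star(a)("aaab", 0))  # 3
print(plus(a)("b", 0))     # None
print(opt(a)("b", 0))      # 0
```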
Succeeds if pattern does NOT match (doesn't consume input):
// Identifier that's not a keyword
identifier = @{ !keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
// Keyword must not be followed by alphanumeric
keyword = @{ ("if" | "else" | "while") ~ !ASCII_ALPHANUMERIC }
Succeeds if pattern matches (doesn't consume input):
// Match only if followed by "("
function_name = { identifier ~ &"(" }
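Both predicates check the input and leave the position where it was. The sketch below models them in Python with hypothetical combinators (`not_pred`, `and_pred`, `seq`). It ignores the implicit whitespace skipping that a normal rule would perform.

```python
import re

_IDENT = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def identifier(text, pos):
    """Return the end offset of an identifier starting at pos, or None."""
    m = _IDENT.match(text, pos)
    return m.end() if m else None


def lit(s):
    def parse(text, pos):
        return pos + len(s) if text.startswith(s, pos) else None
    return parse


def not_pred(p):
    """!p : succeed only if p fails; the position is unchanged."""
    def parse(text, pos):
        return pos if p(text, pos) is None else None
    return parse


def and_pred(p):
    """&p : succeed only if p matches; the position is unchanged."""
    def parse(text, pos):
        return pos if p(text, pos) is not None else None
    return parse


def seq(*parsers):
    """p1 ~ p2 ~ ... : run parsers in order, stopping at the first failure."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse


# function_name = { identifier ~ &"(" }
function_name = seq(identifier, and_pred(lit("(")))
print(function_name("foo(1)", 0))   # 3: "(" was checked but not consumed
print(function_name("foo + 1", 0))  # None
```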
ANY // Any single character
ASCII // Any ASCII character (0x00-0x7F)
ASCII_DIGIT // 0-9
ASCII_NONZERO_DIGIT // 1-9
ASCII_ALPHA // a-z, A-Z
ASCII_ALPHANUMERIC // a-z, A-Z, 0-9
ASCII_ALPHA_LOWER // a-z
ASCII_ALPHA_UPPER // A-Z
ASCII_HEX_DIGIT // 0-9, a-f, A-F
ASCII_OCT_DIGIT // 0-7
ASCII_BIN_DIGIT // 0-1
NEWLINE // \n or \r\n
// Character range
lowercase = { 'a'..'z' }
// Multiple ranges
hex_digit = { '0'..'9' | 'a'..'f' | 'A'..'F' }
// Case-sensitive literal
if_keyword = { "if" }
// Multi-character operators
arrow = { "->" }
fat_arrow = { "=>" }
// Matches "if", "IF", "If", "iF"
if_keyword = { ^"if" }
Defines what counts as whitespace. Normal rules automatically skip this between elements:
WHITESPACE = _{ " " | "\t" | "\n" | "\r" }
Defines comment syntax. Comments are skipped like whitespace:
// Single-line comments
COMMENT = _{ "//" ~ (!"\n" ~ ANY)* ~ "\n"? }
// Block comments (non-nested)
COMMENT = _{ "/*" ~ (!"*/" ~ ANY)* ~ "*/" }
// Both styles
COMMENT = _{
"//" ~ (!"\n" ~ ANY)* ~ "\n"?
| "/*" ~ (!"*/" ~ ANY)* ~ "*/"
}
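In Pest, a normal rule behaves as if `(WHITESPACE | COMMENT)*` were inserted between the elements of a sequence. Assuming Zyn follows the same convention, the skipping step can be modelled in Python as below. The regex mirrors the `WHITESPACE` and `COMMENT` rules above, and both function names are hypothetical.

```python
import re

# One regex standing in for (WHITESPACE | COMMENT)*
TRIVIA = re.compile(r"(?:[ \t\n\r]|//[^\n]*\n?|/\*.*?\*/)*", re.DOTALL)


def skip_trivia(text, pos=0):
    """Return the offset of the first character that is not whitespace or comment."""
    return TRIVIA.match(text, pos).end()


def match_sequence(literals, text, pos=0):
    """Match literals in order, skipping trivia between them as a normal rule would."""
    for i, literal in enumerate(literals):
        if i > 0:
            pos = skip_trivia(text, pos)
        if not text.startswith(literal, pos):
            return None
        pos += len(literal)
    return pos


print(skip_trivia("  // c\n  x"))                 # 9
print(match_sequence(["a", "b"], "a /* x */ b"))  # 11
```

An atomic rule corresponds to calling `match_sequence` without the `skip_trivia` step, which is why `"a" ~ "b"` inside `@{ }` would reject `a b`.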
`SOI` and `EOI` mark the start and end of input:
program = { SOI ~ declarations ~ EOI }
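Anchoring the entry rule with `EOI` matters because, in Pest at least, a rule without it can succeed after matching only a prefix of the input. The Python analogy below uses `re.match` versus `re.fullmatch` to show the difference. The two function names are made up for the example, and whether Zyn behaves identically is an assumption.

```python
import re

DIGITS = re.compile(r"[0-9]+")


def parse_without_eoi(text):
    """Like number = { ASCII_DIGIT+ }: succeeds once a prefix matches."""
    m = DIGITS.match(text)
    return m.group() if m else None


def parse_with_eoi(text):
    """Like number = { SOI ~ ASCII_DIGIT+ ~ EOI }: the whole input must match."""
    m = DIGITS.fullmatch(text)
    return m.group() if m else None


print(parse_without_eoi("123abc"))  # 123
print(parse_with_eoi("123abc"))     # None
```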
PEG handles operator precedence through grammar structure, not precedence tables.
Build a chain of rules from lowest to highest precedence:
// Lowest precedence: logical OR
expr = { logical_or }
logical_or = { logical_and ~ ("or" ~ logical_and)* }
-> TypedExpression {
"fold_binary": { "operand": "logical_and", "operator": "or" }
}
logical_and = { comparison ~ ("and" ~ comparison)* }
-> TypedExpression {
"fold_binary": { "operand": "comparison", "operator": "and" }
}
comparison = { addition ~ (("==" | "!=" | "<" | ">") ~ addition)* }
-> TypedExpression {
"fold_binary": { "operand": "addition", "operator": "==|!=|<|>" }
}
addition = { multiplication ~ (("+" | "-") ~ multiplication)* }
-> TypedExpression {
"fold_binary": { "operand": "multiplication", "operator": "+|-" }
}
multiplication = { unary ~ (("*" | "/") ~ unary)* }
-> TypedExpression {
"fold_binary": { "operand": "unary", "operator": "*|/" }
}
// Highest precedence: unary then atoms
unary = { ("-" | "!") ~ unary | atom }
atom = { number | identifier | "(" ~ expr ~ ")" }
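To make the layering concrete, here is a small recursive-descent sketch in Python that mirrors the `addition`, `multiplication`, `unary`, and `atom` levels (the logical and comparison levels are omitted for brevity). It assumes `fold_binary` nests operators left to right, which the grammar above does not spell out. The tuple-based tree and all function names are hypothetical.

```python
import re

TOKEN = re.compile(r"\s*([0-9]+|[-+*/()])")


def tokenize(src):
    """Split the source into number, operator, and parenthesis tokens."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_expr(src):
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def fold_level(operand, operators):
        # operand ~ (operator ~ operand)*, folded left to right
        left = operand()
        while peek() in operators:
            op = advance()
            left = (op, left, operand())
        return left

    def addition():
        return fold_level(multiplication, ("+", "-"))

    def multiplication():
        return fold_level(unary, ("*", "/"))

    def unary():
        if peek() == "-":
            advance()
            return ("neg", unary())
        return atom()

    def atom():
        tok = advance() if peek() is not None else None
        if tok == "(":
            inner = addition()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            advance()
            return inner
        if tok is not None and tok.isdigit():
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")

    tree = addition()
    if peek() is not None:
        raise SyntaxError("trailing input")
    return tree


print(parse_expr("1 + 2 * 3"))  # ('+', 1, ('*', 2, 3))
print(parse_expr("1 - 2 - 3"))  # ('-', ('-', 1, 2), 3)
```

Because `multiplication` is called from inside `addition`, `*` binds tighter than `+` without any precedence table, which is the point of the rule chain.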
Use recursion for right-associativity:
// Right-associative: a = b = c parses as a = (b = c)
assignment = { identifier ~ "=" ~ assignment | expr }
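The recursion on the right-hand side is what produces the right-leaning tree. A minimal Python sketch of the same rule shape follows, with `expr` simplified to a single token and a hypothetical tuple tree. Input is assumed to be pre-tokenized and non-empty.

```python
def parse_assignment(tokens, pos=0):
    """assignment = identifier "=" assignment | expr

    Returns (tree, next_pos). expr is reduced to one token for brevity.
    """
    is_target = (
        pos + 1 < len(tokens)
        and tokens[pos].isidentifier()
        and tokens[pos + 1] == "="
    )
    if is_target:
        # Recurse on everything to the right of "=" before building the node
        value, end = parse_assignment(tokens, pos + 2)
        return ("=", tokens[pos], value), end
    return tokens[pos], pos + 1


tree, _ = parse_assignment(["a", "=", "b", "=", "c"])
print(tree)  # ('=', 'a', ('=', 'b', 'c'))
```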
// Comma-separated, no trailing
args = { expr ~ ("," ~ expr)* }
// Comma-separated with optional trailing
items = { item ~ ("," ~ item)* ~ ","? }
// Empty allowed
opt_args = { (expr ~ ("," ~ expr)*)? }
Prevent identifiers from matching keywords:
keyword = @{
("if" | "else" | "while" | "for" | "return" | "fn" | "const" | "var")
~ !(ASCII_ALPHANUMERIC | "_")
}
identifier = @{ !keyword ~ ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")* }
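The same two rules can be written as Python regular expressions, which makes it easy to try edge cases. This is only an equivalent sketch of the grammar above, and `match_identifier` is a hypothetical name.

```python
import re

# keyword: one of the reserved words, not followed by an identifier character
KEYWORD = re.compile(r"(?:if|else|while|for|return|fn|const|var)(?![A-Za-z0-9_])")
# identifier body: ASCII_ALPHA ~ (ASCII_ALPHANUMERIC | "_")*
IDENT_BODY = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def match_identifier(text, pos=0):
    """Return the identifier at pos, or None if a keyword (or nothing) starts there."""
    if KEYWORD.match(text, pos):  # the !keyword check
        return None
    m = IDENT_BODY.match(text, pos)
    return m.group() if m else None


print(match_identifier("if"))            # None
print(match_identifier("iffy"))          # iffy
print(match_identifier("return_value"))  # return_value
```

The last case depends on `"_"` being part of the lookahead set: with only `!ASCII_ALPHANUMERIC`, `return_value` would be rejected because `return` followed by `_` would still count as a keyword.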
string_literal = @{ "\"" ~ string_inner* ~ "\"" }
string_inner = {
!("\"" | "\\") ~ ANY // Any char except quote or backslash
| escape_seq
}
escape_seq = { "\\" ~ ("n" | "r" | "t" | "\\" | "\"" | "0") }
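For quick experiments, the string rules above translate directly into one Python regular expression. The sketch below is an equivalent formulation for testing inputs, not Zyn's implementation, and `match_string` is an illustrative name.

```python
import re

# "\"" ~ ( (any char except quote or backslash) | escape_seq )* ~ "\""
STRING_LITERAL = re.compile(r'"(?:[^"\\]|\\[nrt\\"0])*"')


def match_string(text, pos=0):
    """Return the string literal starting at pos, including quotes, or None."""
    m = STRING_LITERAL.match(text, pos)
    return m.group() if m else None


print(match_string('"hi" rest'))   # "hi"
print(match_string(r'"bad\q"'))    # None: \q is not a listed escape
print(match_string('"open'))       # None: missing closing quote
```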
// Integer with optional sign
integer = @{ "-"? ~ ASCII_DIGIT+ }
// Float
float = @{ "-"? ~ ASCII_DIGIT+ ~ "." ~ ASCII_DIGIT+ ~ exponent? }
exponent = @{ ("e" | "E") ~ ("+" | "-")? ~ ASCII_DIGIT+ }
// Hex literal
hex = @{ "0x" ~ ASCII_HEX_DIGIT+ }
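When these literals are combined into a single choice, ordering matters again: `integer` alone would match the `3` of `3.14` or the `0` of `0x1F`. The Python sketch below tries the rules in the order hex, float, integer. That combined order is an assumption for illustration, since the chapter does not define a `number` rule.

```python
import re

# Tried in order, like a PEG choice: hex | float | integer
NUMBER_RULES = [
    ("hex", re.compile(r"0x[0-9a-fA-F]+")),
    ("float", re.compile(r"-?[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?")),
    ("integer", re.compile(r"-?[0-9]+")),
]


def classify_number(text, pos=0):
    """Return (rule_name, matched_text) for the first rule that matches, or None."""
    for name, pattern in NUMBER_RULES:
        m = pattern.match(text, pos)
        if m:
            return name, m.group()
    return None


print(classify_number("0x1F"))     # ('hex', '0x1F')
print(classify_number("-3.5e-2"))  # ('float', '-3.5e-2')
print(classify_number("1."))       # ('integer', '1')
```

The last case shows that `float` requires digits after the dot, so `1.` falls through to `integer` and leaves the dot unconsumed.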
If parsing is slow or incorrect, check for:
- Left recursion - PEG doesn't support it
  // BAD: Left recursion
  expr = { expr ~ "+" ~ term | term }
  // GOOD: Use repetition
  expr = { term ~ ("+" ~ term)* }
- Ambiguous choices - First match wins
  // Matches "ifx" as "if" then "x"
  stmt = { "if" | identifier }
  // Use negative lookahead
  if_kw = @{ "if" ~ !ASCII_ALPHANUMERIC }
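The left-recursion problem is easy to reproduce with a hand-written recursive-descent parser, which is roughly how a PEG rule executes. In the Python sketch below (hypothetical function names, pre-tokenized input), the left-recursive version calls itself before consuming anything and exhausts the stack, while the repetition version terminates.

```python
def parse_left_recursive(tokens, pos=0):
    # expr = { expr ~ "+" ~ term | term }
    # The first step re-enters the same rule at the same position,
    # so no input is ever consumed and the recursion never bottoms out.
    left, pos = parse_left_recursive(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "+":
        return ("+", left, int(tokens[pos + 1])), pos + 2
    return left, pos


def parse_with_repetition(tokens, pos=0):
    # expr = { term ~ ("+" ~ term)* }
    left = int(tokens[pos])
    pos += 1
    while pos + 1 < len(tokens) and tokens[pos] == "+":
        left = ("+", left, int(tokens[pos + 1]))
        pos += 2
    return left, pos


try:
    parse_left_recursive(["1", "+", "2"])
except RecursionError:
    print("left-recursive rule never terminates")

print(parse_with_repetition(["1", "+", "2", "+", "3"]))  # (('+', ('+', 1, 2), 3), 5)
```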
Test individual rules by making them the entry point:
# Test just the expression rule
echo "1 + 2 * 3" | zyntax parse --grammar calc.zyn --rule expr
Now that you understand grammar syntax: