Given a context-free grammar, parse input data to get a parse tree. Supports unparsing, with perfect roundtripping, depending on the implementation for the specific format.
Implements the AbstractTrees interface.
Some dependent packages:
-
Implementations for specific formats:
-
JSON: ParseUnparseJSON
-
WKT-CRS: ParseUnparseWKTCRS
-
The Dyck language is the set of balanced strings of parentheses. Example words in the Dyck language: ()
, (())
, ()()
, ()(())
. Here's how to get a parser for the Dyck language:
using ParseUnparse.ContextFreeGrammarUtil
using ParseUnparse.SymbolGraphs
using ParseUnparse.AbstractParserIdents
struct DyckParserId <: AbstractParserIdent end # necessary so the Dyck language could be dispatched on
function AbstractParserIdents.get_lexer(::DyckParserId) # trivial tokenizer
function lexer(iterator)
function f(c::Char)
(nothing, c) # `c` is the grammar symbol kind in this case
end
Iterators.map(f, iterator)
end
end
function AbstractParserIdents.get_token_grammar(::DyckParserId) # an LL(1) grammar for the Dyck language, with a single nonterminal and two production rules
start_symbol = 'S'
rules = (
('S' => Set(([], ['(', 'S', ')', 'S']))),
)
(start_symbol, Dict{Char, Set{Vector{Char}}}(rules))
end
function AbstractParserIdents.get_token_parser(id::DyckParserId) # strong-LL(1) parser
(start_symbol, grammar) = get_token_grammar(id)
tables = make_parsing_table_strong_ll_1(grammar, start_symbol)
NoDebug = Nothing # disable debugging
Token = Nothing # our tokenizer is trivial
StrongLL1TableDrivenParser{NoDebug, Token}(start_symbol, tables...)
end
While this simple example uses a trivial lexer/tokenizer, in practice the lexer would:
-
Do actual tokenization, coalescing certain input symbols into tokens (terminal symbols of the grammar).
-
Include source/debugging info, such as on the position of the token within the complete input source.
Regarding the expression 'S' => Set(([], ['(', 'S', ')', 'S']))
above, it defines the production rules of our context-free grammar for the Dyck language. Try the same grammar out in Grammophone, a grammar playground on the Web:
The input in this case is an iterator with Char
elements. Such as a String
. Try the parser out in the REPL:
julia> parser = get_parser(DyckParserId());
julia> (tree, error_status) = parser("");
julia> isempty(error_status) # the parser accepts the empty string
true
julia> (tree, error_status) = parser("())");
julia> isempty(error_status) # the parser rejects the unbalanced string
false
julia> using AbstractTrees: print_tree # let's see a nontrivial parse tree!
julia> function print_tree_map(io::IO, tree)
g = tree.graph
kind = root_symbol_kind(g)
if root_is_terminal(g)
show(io, (kind, root_token(g))) # a terminal symbol may have extra info (although it's just `nothing` in this example)
else
show(io, kind) # a nonterminal symbol just has its symbol kind
end
end
print_tree_map (generic function with 1 method)
julia> print_tree(print_tree_map, stdout, graph_as_tree(parser("(()())")[1]); maxdepth = 100)
'S'
├─ ('(', nothing)
├─ 'S'
│ ├─ ('(', nothing)
│ ├─ 'S'
│ ├─ (')', nothing)
│ └─ 'S'
│ ├─ ('(', nothing)
│ ├─ 'S'
│ ├─ (')', nothing)
│ └─ 'S'
├─ (')', nothing)
└─ 'S'