ParseUnparse

Given a context-free grammar, parse input data to get a parse tree. Supports unparsing, with perfect roundtripping, depending on the implementation for the specific format.

Implements the AbstractTrees interface.

Some dependent packages:

Implementations for specific formats:
- JSON: ParseUnparseJSON
- WKT-CRS: ParseUnparseWKTCRS

Usage example: a parser for the Dyck language

The Dyck language is the set of balanced strings of parentheses. Example words in the Dyck language: (), (()), ()(), ()(()). Here's how to get a parser for the Dyck language:

using ParseUnparse.ContextFreeGrammarUtil
using ParseUnparse.SymbolGraphs
using ParseUnparse.AbstractParserIdents

struct DyckParserId <: AbstractParserIdent end  # necessary so the Dyck language could be dispatched on

function AbstractParserIdents.get_lexer(::DyckParserId)  # trivial tokenizer
    function lexer(iterator)
        function f(c::Char)
            (nothing, c)  # `c` is the grammar symbol kind in this case
        end
        Iterators.map(f, iterator)
    end
end

function AbstractParserIdents.get_token_grammar(::DyckParserId)  # an LL(1) grammar for the Dyck language, with a single nonterminal and two production rules
    start_symbol = 'S'
    rules = (
        ('S' => Set(([], ['(', 'S', ')', 'S']))),
    )
    (start_symbol, Dict{Char, Set{Vector{Char}}}(rules))
end

function AbstractParserIdents.get_token_parser(id::DyckParserId)  # strong-LL(1) parser
    (start_symbol, grammar) = get_token_grammar(id)
    tables = make_parsing_table_strong_ll_1(grammar, start_symbol)
    NoDebug = Nothing  # disable debugging
    Token = Nothing  # our tokenizer is trivial
    StrongLL1TableDrivenParser{NoDebug, Token}(start_symbol, tables...)
end

While this simple example uses a trivial lexer/tokenizer, in practice the lexer would:

Do actual tokenization, coalescing certain input symbols into tokens (terminal symbols of the grammar).
Include source/debugging info, such as on the position of the token within the complete input source.

Regarding the expression 'S' => Set(([], ['(', 'S', ')', 'S'])) above, it defines the production rules of our context-free grammar for the Dyck language. Try the same grammar out in Grammophone, a grammar playground on the Web:

https://mdaines.github.io/grammophone/?s=UyAtPiB8IGxlZnQgUyByaWdodCBTIC4=

The input in this case is an iterator with Char elements. Such as a String. Try the parser out in the REPL:

julia> parser = get_parser(DyckParserId());

julia> (tree, error_status) = parser("");

julia> isempty(error_status)  # the parser accepts the empty string
true

julia> (tree, error_status) = parser("())");

julia> isempty(error_status)  # the parser rejects the unbalanced string
false

julia> using AbstractTrees: print_tree  # let's see a nontrivial parse tree!

julia> function print_tree_map(io::IO, tree)
           g = tree.graph
           kind = root_symbol_kind(g)
           if root_is_terminal(g)
               show(io, (kind, root_token(g)))  # a terminal symbol may have extra info (although it's just `nothing` in this example)
           else
               show(io, kind)  # a nonterminal symbol just has its symbol kind
           end
       end
print_tree_map (generic function with 1 method)

julia> print_tree(print_tree_map, stdout, graph_as_tree(parser("(()())")[1]); maxdepth = 100)
'S'
├─ ('(', nothing)
├─ 'S'
│  ├─ ('(', nothing)
│  ├─ 'S'
│  ├─ (')', nothing)
│  └─ 'S'
│     ├─ ('(', nothing)
│     ├─ 'S'
│     ├─ (')', nothing)
│     └─ 'S'
├─ (')', nothing)
└─ 'S'

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.github		.github
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
Project.toml		Project.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

ParseUnparse

Usage example: a parser for the Dyck language

About

Uh oh!

Releases 2

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

JuliaIO/ParseUnparse.jl

Folders and files

Latest commit

History

Repository files navigation

ParseUnparse

Usage example: a parser for the Dyck language

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 2

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages