A Monadic Parser Combinator for Python.
Parsec provides a declarative, modular way to build complex text parsers. By leveraging the powerful expressiveness of Monads, you can compose simple parsers like building blocks to handle complex grammar structures, while elegantly managing state and errors during the parsing process.
Unlike traditional parser generation tools (like Lark, PLY), Parsec allows you to define and combine your parsing logic directly within Python code, without the need for separate grammar files, making parser construction more flexible and dynamic.
- Monadic A Parser is a monad.
- Declarative Declarative grammar definition
- Operator Operator-based combinators(
<<,>>,&,|,/,@) - Lazy Evalution Lazy evaluation for recursion
- Curried Curried functional interfaces
- Typed Support type inference
For full type support, Python 3.13+ is required.
Install from source code:
git clone https://github.com/lunexnocty/parsec-python.git
cd parsec-python
pip install .Install from pypi:
pip install parsec-pythonIf you use uv:
uv add parsec-pythonThe parsec.text module provides a set of fundamental text parsers, such as number, char, blank, etc. The parsec.combinator module offers a rich, curried functional interface for combinators, enabling you to compose these basic parsers into powerful and expressive parsing logic.
To make parser composition even more intuitive, several operators have been overloaded:
p << fequivalent tof(p), enables successive application of combinators to a parser, supporting a piping style of composition.p >> fequivalent top.bind(f), represents the monadic bind operation, allowing the result of parser p to determine and sequence the next parser generated by function f. This enables context-sensitive and dependent parsing, as found in the Monad interface in functional programming.p @ fequivalent top.map(f), mapping thefover the result of the parserp. This corresponds to the Functor'smap(orfmap) operation, allowing you to transform the output of a parser in a declarative and compositional way.- The
&operator combines multiple parsers in sequence and collects their results into a tuple. - The
|operator tries each parser in sequence and returns the first successful result. - The
/operator is similar to|, but never backtracking.
Parsers can also be defined lazily to support recursive grammar definitions—essential for constructs like nested expressions or parentheses.
Below is an arithmetic expression grammar supporting operator precedence, parentheses, and left-associative chaining:
expr := <add_or_sub_exp>
add_or_sub_exp := <mul_or_div_exp> (('+' | '-') <mul_or_div_exp>)*
mul_or_div_exp := <factor> (('*' | '/') <factor>)*
factor := '(' <expr> ')' | <num>
num := { number }Based on the BNF description above, we can easily build a parser to analyze the language defined by that syntax.
from parsec import Parser, item, text
from parsec.text import lex
from parsec.utils import curry
def calc(op: str):
@curry
def _(x: int | float, y: int | float):
return {'+': x + y, '-': x - y, '*': x * y, '/': x / y}[op]
return _
expr = Parser[str, int | float]()
factor = expr.between(lex.l_round, lex.r_round) | lex.number
mul_or_div = lex.char('*') | lex.char('/') # operator `|`
mul_or_div_op = mul_or_div @ calc # operator `@`
mul_or_div_expr = factor.chainl1(mul_or_div_op) # left associativity
add_or_sub = item.range('+-') << lex.lexeme(text.blank) # `range` combinator
add_or_sub_op = add_or_sub @ calc
add_or_sub_expr = mul_or_div_expr.chainl1(add_or_sub_op)
expr.define(add_or_sub_expr) # lazy definition
src = '(1. + .2e-1) * 100 - 1 / 2.5 '
assert text.parse(expr, src) == eval(src) # TrueThe parsec.combinator.chainl1 combinator handles left-associative chaining of operations, parsing one or more occurrences of a parser p separated by an operator parser, and combining results in a left-associative manner.
This approach is highly extensible: you can add additional operators, functions, or syntax features by composing and reusing combinators.
- For more basic text parsers, see
parsec.text - For more parser combinators, see
parsec.combinator
A parser is a function that takes a Context[I] as input and returns a Result[I, R], where I and R are generic type parameters. Here, I represents the type of each element in the input stream, and R denotes the type of the value produced by the parser.
parser[I, R]: Context[I] -> Result[I, R]The parsing function is wrapped in the Parser[I, R] class, endowing it with a monadic interface for functional composition and declarative parsing.
class Parser[I, R]:
def bind[S](self, fn: R -> Parser[I, S]) -> Parser[I, S]: ...
def okay(self, value: R) -> Parser[I, R]: ...
def fail(self, errpr: ParseErr) -> Parser[I, R]: ...A Context[I] consists of two primary components: a stream[I], which provides access to the underlying input sequence, and a State[I], which manages auxiliary parsing state. If you need to parse data types other than text, you can extend IStream and IState to implement custom stream and state management logic, enabling the parsing of arbitrary data sequences.
class Context[I]:
stream: IStream[I]
state: IState[I]The Result[I, R] type represents the outcome of a parsing operation, containing the updated context, the parsing result (either a successfully parsed value or an error), and the number of input elements consumed during parsing.
class Result[I, R]:
context: Context[I]
outcome: Okay[R] | Fail
consumed: intThe combinator module provides a curried functional interface for composing parsers, and also supports method chaining. Both styles are equivalent in expressive power.
Parsec do not support left-recursive grammars.
