(by @jneen and @laughinghan)
Parsimmon is a small library for writing big parsers made up of lots of little parsers. The API is inspired by parsec and Promises/A.
```javascript
var Parsimmon = require('parsimmon'); // npm install parsimmon

var regex = Parsimmon.regex;
var string = Parsimmon.string;
var optWhitespace = Parsimmon.optWhitespace;
var lazy = Parsimmon.lazy;

// A lexeme is a token followed by any trailing whitespace.
function lexeme(p) { return p.skip(optWhitespace); }

var lparen = lexeme(string('('));
var rparen = lexeme(string(')'));

// expr refers to form and atom before they are defined, so it is wrapped in lazy.
var expr = lazy('an s-expression', function() { return form.or(atom); });

var number = lexeme(regex(/[0-9]+/).map(parseInt));
var id = lexeme(regex(/[a-z_]\w*/i));

var atom = number.or(id);
var form = lparen.then(expr.many()).skip(rparen);

expr.parse('3').value // => 3
expr.parse('(add (mul 10 (add 3 4)) (add 7 8))').value
// => ['add', ['mul', 10, ['add', 3, 4]], ['add', 7, 8]]
```
A Parsimmon parser is an object that represents an action on a stream of text, and the promise of either an object yielded by that action on success or a message in case of failure. For example, string('foo') yields the string 'foo' if the beginning of the stream is 'foo', and otherwise fails.
The combinator method .map is used to transform the yielded value. For example, the following parser will yield 'foobar' if the stream starts with 'foo':

```javascript
string('foo').map(function(x) { return x + 'bar'; })
```

The following parser will yield the number 24 when it encounters the string '12':

```javascript
digits.map(function(x) { return parseInt(x) * 2; })
```

The method .result can be used to set a constant result.
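The idea of a parser as an action on a stream plus a promised result can be modeled in a few lines. The sketch below is a toy written for this explanation, not Parsimmon's implementation: it assumes a parser wraps a function run(input, i) that returns either { status: true, index, value } or { status: false, index, expected }, and its string, regex, map, result and parse only mirror the behavior described above. It has no dependencies, so it runs without Parsimmon installed.

```javascript
// Toy model for illustration only; Parsimmon's real internals differ.
// A parser wraps run(input, i), which returns either
//   { status: true,  index: <next position>, value: <yielded value> } or
//   { status: false, index: <failure position>, expected: <message> }.
function Parser(run) {
  this.run = run;
}

// Matches the literal s at the current position and yields it.
function string(s) {
  return new Parser(function (input, i) {
    if (input.startsWith(s, i)) {
      return { status: true, index: i + s.length, value: s };
    }
    return { status: false, index: i, expected: JSON.stringify(s) };
  });
}

// Matches re exactly at the current position and yields the matched text.
// The sticky ('y') flag makes exec match only at lastIndex.
function regex(re) {
  var sticky = new RegExp(re.source, re.flags.replace(/[gy]/g, '') + 'y');
  return new Parser(function (input, i) {
    sticky.lastIndex = i;
    var m = sticky.exec(input);
    if (m) {
      return { status: true, index: i + m[0].length, value: m[0] };
    }
    return { status: false, index: i, expected: String(re) };
  });
}

// .map transforms the yielded value; failures pass through unchanged.
Parser.prototype.map = function (f) {
  var self = this;
  return new Parser(function (input, i) {
    var r = self.run(input, i);
    return r.status ? { status: true, index: r.index, value: f(r.value) } : r;
  });
};

// .result replaces the yielded value with a constant.
Parser.prototype.result = function (constant) {
  return this.map(function () { return constant; });
};

// .parse runs from index 0; in this toy, leftover input counts as a failure.
Parser.prototype.parse = function (input) {
  var r = this.run(input, 0);
  if (!r.status) return { status: false, index: r.index, expected: r.expected };
  if (r.index !== input.length) {
    return { status: false, index: r.index, expected: 'end of input' };
  }
  return { status: true, value: r.value };
};

var digits = regex(/[0-9]*/);
var fooBar = string('foo').map(function (x) { return x + 'bar'; });
var doubled = digits.map(function (x) { return parseInt(x, 10) * 2; });
var yes = string('yes').result(true);

console.log(fooBar.parse('foo'));  // { status: true, value: 'foobar' }
console.log(doubled.parse('12'));  // { status: true, value: 24 }
console.log(yes.parse('yes'));     // { status: true, value: true }
```

In this toy, map and result never see failures: a failed result passes straight through, so they can be chained onto any parser.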
Calling .parse(str) on a parser parses the string and returns an object with a status flag indicating whether the parse succeeded. If it succeeded, the value attribute contains the yielded value. Otherwise, the index and expected attributes contain the index of the parse error and a message indicating what was expected. The error object can be passed along with the original source to Parsimmon.formatError(source, error) to obtain a human-readable error string.
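What the index in a failure means is easiest to see by turning it into a line and column. The helper below is hypothetical and is not Parsimmon.formatError, whose exact output is not described here; it simply assumes a failure object of the shape described above ({ status: false, index, expected }), and the sample failure is written by hand rather than produced by a real parse.

```javascript
// Hypothetical helper for illustration: this is NOT Parsimmon.formatError,
// and its message format is an assumption. It expects a parse result shaped
// like { status: false, index: <number>, expected: <string> }.
function describeFailure(source, result) {
  if (result.status) return 'no error';
  // Split the text before the failure point into lines to get a
  // 1-based line number and column.
  var lines = source.slice(0, result.index).split('\n');
  var line = lines.length;
  var column = lines[lines.length - 1].length + 1;
  var found = result.index < source.length
    ? JSON.stringify(source.charAt(result.index))
    : 'end of input';
  return 'line ' + line + ', column ' + column +
    ': expected ' + result.expected + ', got ' + found;
}

// A hand-written failure object standing in for a real .parse() result.
var source = '(add 1\n   (mul 2 ?))';
var failure = { status: false, index: 17, expected: "')'" };
console.log(describeFailure(source, failure));
// line 2, column 11: expected ')', got "?"
```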
- Parsimmon.string("my-string") is a parser that expects to find "my-string", and will yield the same.
- Parsimmon.regex(/myregex/, group) is a parser that expects the stream to match the given regex at the current position, and yields the given match group (default 0, the entire match).
- Parsimmon.succeed(result) is a parser that doesn't consume any of the string, and yields result.
- Parsimmon.seq(p1, p2, ... pn) accepts a variable number of parsers that it expects to find in order, yielding an array of the results.
- Parsimmon.alt(p1, p2, ... pn) accepts a variable number of parsers, and yields the value of the first one that succeeds, backtracking in between.
- Parsimmon.lazy(f) accepts a function that returns a parser, which is evaluated the first time the parser is used. This is useful for referencing parsers that haven't yet been defined.
- Parsimmon.lazy(desc, f) is the same as Parsimmon.lazy, but also sets desc as the expected value (see .desc() below).
- Parsimmon.fail(message) is a parser that fails without consuming any input, reporting message as what was expected.
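The backtracking in alt and the deferred construction in lazy are the two subtle entries in the list above. This toy sketch (plain functions, not Parsimmon's source or API) models a parser as a function (input, i) returning a result object, and shows alt retrying a second branch from the original position after the first branch has partly matched, and lazy letting a parser refer to itself. The names keyword and nested are illustrative.

```javascript
// Toy model for illustration only (not Parsimmon's source). Parsers are plain
// functions (input, i) returning { status: true, index, value } or
// { status: false, index, expected }.
function success(index, value) { return { status: true, index: index, value: value }; }
function failure(index, expected) { return { status: false, index: index, expected: expected }; }

function string(s) {
  return function (input, i) {
    return input.startsWith(s, i) ? success(i + s.length, s) : failure(i, JSON.stringify(s));
  };
}

// Consumes nothing and always yields value.
function succeed(value) {
  return function (input, i) { return success(i, value); };
}

// Runs each parser where the previous one stopped; yields an array of results.
function seq(...parsers) {
  return function (input, i) {
    var values = [];
    for (var p of parsers) {
      var r = p(input, i);
      if (!r.status) return r;
      values.push(r.value);
      i = r.index;
    }
    return success(i, values);
  };
}

// Tries each parser from the same starting index (backtracking) and
// returns the first success.
function alt(...parsers) {
  return function (input, i) {
    var expected = [];
    for (var p of parsers) {
      var r = p(input, i);
      if (r.status) return r;
      expected.push(r.expected);
    }
    return failure(i, 'one of ' + expected.join(', '));
  };
}

// Calls f on first use, so the parser can refer to itself or to later definitions.
function lazy(f) {
  var parser = null;
  return function (input, i) {
    if (parser === null) parser = f();
    return parser(input, i);
  };
}

// On 'forward', 'for' matches in the first branch and then 'each' fails;
// alt backtracks to index 0 and tries 'forward'.
var keyword = alt(seq(string('for'), string('each')), string('forward'));

// Balanced parentheses: nested refers to itself through lazy.
var nested = lazy(function () {
  return alt(seq(string('('), nested, string(')')), succeed(''));
});

console.log(keyword('forward', 0).value); // forward
console.log(keyword('foreach', 0).value); // [ 'for', 'each' ]
```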
- Parsimmon.letter is equivalent to Parsimmon.regex(/[a-z]/i)
- Parsimmon.letters is equivalent to Parsimmon.regex(/[a-z]*/i)
- Parsimmon.digit is equivalent to Parsimmon.regex(/[0-9]/)
- Parsimmon.digits is equivalent to Parsimmon.regex(/[0-9]*/)
- Parsimmon.whitespace is equivalent to Parsimmon.regex(/\s+/)
- Parsimmon.optWhitespace is equivalent to Parsimmon.regex(/\s*/)
- Parsimmon.any consumes and yields the next character of the stream.
- Parsimmon.all consumes and yields the entire remainder of the stream.
- Parsimmon.eof expects the end of the stream.
- Parsimmon.index is a parser that yields the current index of the parse.
- Parsimmon.test(pred) yields a single character if it passes the predicate.
- Parsimmon.takeWhile(pred) yields a string containing all the next characters that pass the predicate.
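test and takeWhile are the predicate-based counterparts of the regex primitives. Below is a minimal sketch of both, assuming the same toy parser shape as a function (input, i) returning a result object; testChar is a hypothetical name standing in for Parsimmon.test, and neither function is Parsimmon's source. In this sketch takeWhile with a digit predicate behaves like digits: it never fails and yields '' when nothing matches.

```javascript
// Toy versions of predicate-driven primitives (illustration only, not
// Parsimmon's source). Parsers are functions (input, i) -> result object.

// Consumes one character if pred accepts it.
function testChar(pred) {
  return function (input, i) {
    var c = input.charAt(i);
    if (i < input.length && pred(c)) return { status: true, index: i + 1, value: c };
    return { status: false, index: i, expected: 'a character satisfying the predicate' };
  };
}

// Consumes the longest run of characters accepted by pred; never fails.
function takeWhile(pred) {
  return function (input, i) {
    var j = i;
    while (j < input.length && pred(input.charAt(j))) j++;
    return { status: true, index: j, value: input.slice(i, j) };
  };
}

function isDigit(c) { return c >= '0' && c <= '9'; }

console.log(testChar(isDigit)('7up', 0).value);      // 7
console.log(takeWhile(isDigit)('2024-01', 0).value); // 2024
```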
You can add a primitive parser (similar to the included ones) by using Parsimmon.custom. This is an example of how to create a parser that matches any character except the one provided:

```javascript
function notChar(char) {
  return Parsimmon.custom(function(success, failure) {
    return function(stream, i) {
      // Succeed only if there is a character left and it is not `char`.
      if (i < stream.length && stream.charAt(i) !== char) {
        return success(i + 1, stream.charAt(i));
      }
      return failure(i, 'anything different than "' + char + '"');
    };
  });
}
```
This parser can then be used and composed the same way as all the existing ones, for example:

```javascript
var seq = Parsimmon.seq;
var string = Parsimmon.string;

var parser = seq(string('a'), notChar('b').times(5));
parser.parse('accccc').value // => ['a', ['c', 'c', 'c', 'c', 'c']]
```
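The contract that Parsimmon.custom expects can be modeled without the library. In this toy sketch, whose internals are an assumption rather than Parsimmon's source, custom simply hands your function two constructors, success(index, value) and failure(index, expected), and uses the (stream, i) function you return as the parser. Combinators are plain functions here, so notChar('b').times(5) is written times(notChar('b'), 5).

```javascript
// Toy model of the custom protocol (illustration only, not Parsimmon's source).
function success(index, value) { return { status: true, index: index, value: value }; }
function failure(index, expected) { return { status: false, index: index, expected: expected }; }

// Pass the result constructors in; the returned (stream, i) function is the parser.
function custom(makeParser) {
  return makeParser(success, failure);
}

function string(s) {
  return custom(function (success, failure) {
    return function (stream, i) {
      return stream.startsWith(s, i) ? success(i + s.length, s) : failure(i, JSON.stringify(s));
    };
  });
}

// Same logic as the notChar example above.
function notChar(char) {
  return custom(function (success, failure) {
    return function (stream, i) {
      if (i < stream.length && stream.charAt(i) !== char) {
        return success(i + 1, stream.charAt(i));
      }
      return failure(i, 'anything different than "' + char + '"');
    };
  });
}

// Runs parsers in order, yielding an array of their results.
function seq(...parsers) {
  return function (stream, i) {
    var values = [];
    for (var p of parsers) {
      var r = p(stream, i);
      if (!r.status) return r;
      values.push(r.value);
      i = r.index;
    }
    return success(i, values);
  };
}

// Runs p exactly n times, yielding an array of the results.
function times(p, n) {
  return function (stream, i) {
    var values = [];
    for (var k = 0; k < n; k++) {
      var r = p(stream, i);
      if (!r.status) return r;
      values.push(r.value);
      i = r.index;
    }
    return success(i, values);
  };
}

var parser = seq(string('a'), times(notChar('b'), 5));
console.log(JSON.stringify(parser('accccc', 0).value)); // ["a",["c","c","c","c","c"]]
```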
- parser.or(otherParser): returns a new parser which tries parser, and if it fails uses otherParser.
- parser.chain(function(result) { return anotherParser; }): returns a new parser which tries parser, and on success calls the given function with the result of the parse, which is expected to return another parser, which will be tried next. This allows you to dynamically decide how to continue the parse, which is impossible with the other combinators.
- parser.then(anotherParser): expects anotherParser to follow parser, and yields the result of anotherParser. NB: the result of parser here is ignored.
- parser.map(function(result) { return anotherResult; }): transforms the output of parser with the given function.
- parser.skip(otherParser): expects otherParser after parser, but preserves the yield value of parser.
- parser.result(aResult): returns a new parser with the same behavior, but which yields aResult.
- parser.many(): expects parser zero or more times, and yields an array of the results.
- parser.times(n): expects parser exactly n times, and yields an array of the results.
- parser.times(min, max): expects parser between min and max times, and yields an array of the results.
- parser.atMost(n): expects parser at most n times. Yields an array of the results.
- parser.atLeast(n): expects parser at least n times. Yields an array of the results.
- parser.mark(): yields an object with start, value, and end keys, where value is the original value yielded by the parser, and start and end are the indices in the stream that contain the parsed text.
- parser.desc(description): returns a new parser whose failure message is the passed description. For example, string('x').desc('the letter x') will indicate that 'the letter x' was expected.
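chain is the combinator to reach for when the shape of the rest of the input depends on something already parsed. As a hypothetical example, take a length-prefixed field: one digit n followed by exactly n letters. The sketch below uses toy functions (not Parsimmon's API) for chain, map, times and many over parsers modeled as functions (input, i) returning a result object; field and charWhere are illustrative names.

```javascript
// Toy model (illustration only, not Parsimmon's source): parsers are
// functions (input, i) -> result, and combinators are plain functions.
function success(index, value) { return { status: true, index: index, value: value }; }
function failure(index, expected) { return { status: false, index: index, expected: expected }; }

// Consumes one character accepted by pred.
function charWhere(pred, description) {
  return function (input, i) {
    var c = input.charAt(i);
    return i < input.length && pred(c) ? success(i + 1, c) : failure(i, description);
  };
}
var digit = charWhere(function (c) { return c >= '0' && c <= '9'; }, 'a digit');
var letter = charWhere(function (c) { return /[a-z]/i.test(c); }, 'a letter');

function map(p, f) {
  return function (input, i) {
    var r = p(input, i);
    return r.status ? success(r.index, f(r.value)) : r;
  };
}

// chain: run p, then let its result pick the parser to run next.
function chain(p, f) {
  return function (input, i) {
    var r = p(input, i);
    return r.status ? f(r.value)(input, r.index) : r;
  };
}

function times(p, n) {
  return function (input, i) {
    var values = [];
    for (var k = 0; k < n; k++) {
      var r = p(input, i);
      if (!r.status) return r;
      values.push(r.value);
      i = r.index;
    }
    return success(i, values);
  };
}

// many: zero or more; stops at the first failure, or if p succeeds
// without consuming input.
function many(p) {
  return function (input, i) {
    var values = [];
    for (;;) {
      var r = p(input, i);
      if (!r.status || r.index === i) return success(i, values);
      values.push(r.value);
      i = r.index;
    }
  };
}

// The letter count is only known after the digit is parsed, hence chain.
var field = chain(map(digit, Number), function (n) {
  return map(times(letter, n), function (chars) { return chars.join(''); });
});

console.log(many(field)('3abc2de', 0).value); // [ 'abc', 'de' ]
```

Using the methods listed above, the same field would presumably read Parsimmon.digit.map(Number).chain(function(n) { return Parsimmon.letter.times(n); }), yielding an array of letters rather than a joined string.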
The following tips apply to most parsers for traditional languages; it's possible you may need to do something different for yours!
For most parsers, the following format is helpful:

- Define a lexeme function to skip all the stuff you don't care about (whitespace, comments, etc.). You may need multiple types of lexemes. For example:

```javascript
// comment is your language's comment parser, defined elsewhere.
// .many() outside the .or() lets whitespace and comments interleave.
var ignore = Parsimmon.whitespace.or(comment).many();
function lexeme(p) { return p.skip(ignore); }
```

- Define all your lexemes first. These should yield native JavaScript values.

```javascript
var lparen = lexeme(string('('));
var rparen = lexeme(string(')'));
var number = lexeme(regex(/[0-9]+/)).map(parseInt);
```

- Forward-declare one or more top-level expressions with lazy, referring to parsers that have not yet been defined. Generally, this takes the form of a large .alt() call:

```javascript
var expr = lazy('an expression', function() { return Parsimmon.alt(p1, p2, ...); });
```

- Then build your parsers from the inside out. These should return AST nodes or other objects specific to your domain.

```javascript
var p1 = ...
var p2 = ...
```

- Finally, export your top-level parser. Remember to skip ignored stuff at the beginning.

```javascript
return ignore.then(expr.many());
```
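The steps above can be tried out in miniature. This toy sketch (plain functions, not Parsimmon's API) assumes '#'-to-end-of-line comments as a hypothetical syntax, builds ignore as any mix of whitespace and comments, wraps tokens with a lexeme helper, and skips ignored text at the start of the top-level parser. keepLeft and keepRight are illustrative stand-ins for .skip and .then.

```javascript
// Toy model (illustration only, not Parsimmon's source). Parsers are
// functions (input, i) -> { status, index, value } or a failure object.
function success(index, value) { return { status: true, index: index, value: value }; }
function failure(index, expected) { return { status: false, index: index, expected: expected }; }

// Matches re exactly at the current position (sticky flag + lastIndex).
function regex(re) {
  var sticky = new RegExp(re.source, re.flags.replace(/[gy]/g, '') + 'y');
  return function (input, i) {
    sticky.lastIndex = i;
    var m = sticky.exec(input);
    return m ? success(i + m[0].length, m[0]) : failure(i, String(re));
  };
}

function or(p, q) {
  return function (input, i) {
    var r = p(input, i);
    return r.status ? r : q(input, i);
  };
}

// Zero or more; stops at the first failure or a non-consuming success.
function many(p) {
  return function (input, i) {
    var values = [];
    for (;;) {
      var r = p(input, i);
      if (!r.status || r.index === i) return success(i, values);
      values.push(r.value);
      i = r.index;
    }
  };
}

function map(p, f) {
  return function (input, i) {
    var r = p(input, i);
    return r.status ? success(r.index, f(r.value)) : r;
  };
}

// keepLeft: run p then q, keep p's value (like .skip).
function keepLeft(p, q) {
  return function (input, i) {
    var r = p(input, i);
    if (!r.status) return r;
    var s = q(input, r.index);
    return s.status ? success(s.index, r.value) : s;
  };
}

// keepRight: run p then q, keep q's value (like .then).
function keepRight(p, q) {
  return function (input, i) {
    var r = p(input, i);
    return r.status ? q(input, r.index) : r;
  };
}

var whitespace = regex(/\s+/);
var comment = regex(/#[^\n]*/);
var ignore = many(or(whitespace, comment));
function lexeme(p) { return keepLeft(p, ignore); }

var number = map(lexeme(regex(/[0-9]+/)), Number);
var program = keepRight(ignore, many(number));

console.log(program('  1 # one\n 2#two\n3 ', 0).value); // [ 1, 2, 3 ]
```

Dropping the leading ignore makes many(number) stop immediately on '  1', yielding an empty array, which is why the last step skips ignored text first.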
Parsimmon is also compatible with Fantasy Land. It is a Semigroup, an Applicative Functor, and a Monad.