Add a permutation parser.
Fixes #159.
tzlaine committed Mar 10, 2024
1 parent 48d5cce commit 824a208
Showing 9 changed files with 539 additions and 59 deletions.
4 changes: 4 additions & 0 deletions doc/tables.qbk
@@ -369,6 +369,7 @@ consume the input they match unless otherwise stated in the table below.]
[[`p1 | p2`] [ Matches iff either `p1` matches or `p2` matches. ] [`std::variant<_ATTR_np_(p1), _ATTR_np_(p2)>` (See note.)] [ `|` is associative; `p1 | p2 | p3`, `(p1 | p2) | p3`, and `p1 | (p2 | p3)` are all equivalent. This attribute type only applies to the case where `p1` and `p2` both generate attributes, and where the attribute types are different; see _attr_gen_ for the full rules. ]]
[[`p | c`] [ Equivalent to `p | lit(c)`. ] [`_ATTR_np_(p)`] []]
[[`p | r`] [ Equivalent to `p | lit(r)`. ] [`_ATTR_np_(p)`] []]
[[`p1 || p2`] [ Matches iff `p1` matches and `p2` matches, regardless of the order they match in. ] [`_bp_tup_<_ATTR_np_(p1), _ATTR_np_(p2)>`] [ `||` is associative; `p1 || p2 || p3`, `(p1 || p2) || p3`, and `p1 || (p2 || p3)` are all equivalent. It is an error to include a _e_ (conditional or non-conditional) in an `operator||` expression. Though the parsers are matched in any order, the attribute elements are always in the order written in the `operator||` expression. ]]
[[`p1 - p2`] [ Equivalent to `!p2 >> p1`. ] [`_ATTR_np_(p1)`] []]
[[`p - c`] [ Equivalent to `p - lit(c)`. ] [`_ATTR_np_(p)`] []]
[[`p - r`] [ Equivalent to `p - lit(r)`. ] [`_ATTR_np_(p)`] []]
@@ -508,6 +509,9 @@ tables below:
[[`p1 | p2`] [`std::variant<_ATTR_np_(p1), _ATTR_np_(p2)>`]]
[[`p1 | p2 | p3`] [`std::variant<_ATTR_np_(p1), _ATTR_np_(p2), _ATTR_np_(p3)>`]]

[[`p1 || p2`] [`_bp_tup_<_ATTR_np_(p1), _ATTR_np_(p2)>`]]
[[`p1 || p2 || p3`] [`_bp_tup_<_ATTR_np_(p1), _ATTR_np_(p2), _ATTR_np_(p3)>`]]

[[`p1 % p2`] [`std::string` if `_ATTR_np_(p1)` is `char` or `char32_t`, otherwise `std::vector<_ATTR_np_(p1)>`]]

[[`p[a]`] [None.]]
110 changes: 57 additions & 53 deletions doc/tutorial.qbk
@@ -49,20 +49,25 @@ Throughout the _Parser_ documentation, I will refer to "the call to _p_".
Read this as "the call to any one of the functions described in _p_api_".
That includes _pp_, _cbp_, and _cbpp_.

There are a couple of special kinds of parsers that come up often in this
There are some special kinds of parsers that come up often in this
documentation.

One is a /sequence parser/; you will see it created using `operator>>()`, as
One is a /sequence parser/; you will see it created using `operator>>`, as
in `p1 >> p2 >> p3`. A sequence parser tries to match all of its subparsers
to the input, one at a time, in order. It matches the input iff all its
subparsers do.

The other is an /alternative parser/; you will see it created using
`operator|()`, as in `p1 | p2 | p3`. A alternative parser tries to match all
Another is an /alternative parser/; you will see it created using
`operator|`, as in `p1 | p2 | p3`. An alternative parser tries to match all
of its subparsers to the input, one at a time, in order; it stops after
matching at most one subparser. It matches the input iff one of its
subparsers does.

Finally, there is a /permutation parser/; it is created using `operator||`,
as in `p1 || p2 || p3`. A permutation parser tries to match all of its
subparsers to the input, in any order. So the parser `p1 || p2 || p3` is
equivalent to `(p1 >> p2 >> p3) | (p1 >> p3 >> p2) | (p2 >> p1 >> p3) |
(p2 >> p3 >> p1) | (p3 >> p1 >> p2) | (p3 >> p2 >> p1)`; the permutation form
is far terser. It matches the
input iff all of its subparsers do, regardless of the order they match in.
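
To make the matching rule concrete, here is a minimal standalone sketch of permutation semantics in plain C++. It does not use _Parser_ and is not how `perm_parser` is implemented; `matcher`, `keyword`, and `match_permutation` are hypothetical names invented for this illustration. Each unused matcher is tried at the current position until every one has matched exactly once, and results are stored by declaration index rather than by match order. The sketch is greedy and does not backtrack, so it may differ from the six-way expansion above when one subparser's match is a prefix of another's.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical matcher type: on success, returns the number of characters
// consumed from the front of the input.
using matcher = std::function<std::optional<std::size_t>(std::string_view)>;

// Builds a matcher for a fixed keyword (illustration only).
inline matcher keyword(std::string word)
{
    return [word](std::string_view in) -> std::optional<std::size_t> {
        if (in.substr(0, word.size()) == word)
            return word.size();
        return std::nullopt;
    };
}

// Succeeds iff every matcher matches exactly once, in any order.
// On success, result[i] holds the text matched by matchers[i], so the
// results follow declaration order, not match order.
inline std::optional<std::vector<std::string>>
match_permutation(std::vector<matcher> const & matchers, std::string_view input)
{
    std::vector<std::string> result(matchers.size());
    std::vector<bool> used(matchers.size(), false);
    std::size_t remaining = matchers.size();
    while (remaining) {
        bool progressed = false;
        for (std::size_t i = 0; i < matchers.size(); ++i) {
            if (used[i])
                continue;
            if (auto const n = matchers[i](input)) {
                result[i] = std::string(input.substr(0, *n));
                input.remove_prefix(*n);
                used[i] = true;
                --remaining;
                progressed = true;
                break;
            }
        }
        if (!progressed)
            return std::nullopt; // No unused matcher matches here.
    }
    return result;
}
```

Calling `match_permutation({keyword("a"), keyword("b"), keyword("c")}, "cab")` succeeds, and the returned elements are in the order the matchers were written, which mirrors the attribute-ordering rule stated for `operator||`.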

_Parser_ parsers each have an /attribute/ associated with them, or explicitly
have no attribute. An attribute is a value that the parser generates when it
matches the input. For instance, the parser _d_ generates a `double` when it
@@ -86,7 +91,7 @@ and then we parse it:
The expression `*bp::char_` is a parser-expression. It uses one of the many
parsers that _Parser_ provides: _ch_. Like all _Parser_ parsers, it has
certain operations defined on it. In this case, `*bp::char_` is using an
overloaded `operator*()` as the C++ version of a _kl_ operator. Since C++ has
overloaded `operator*` as the C++ version of a _kl_ operator. Since C++ has
no postfix unary `*` operator, we have to use the one we have, so it is used
as a prefix.

@@ -126,16 +131,16 @@ just use that. If we wanted to parse two `double`s in a row, we'd use:

boost::parser::double_ >> boost::parser::double_

`operator>>()` in this expression is the sequence-operator; read it as
"followed by". If we combine the sequence-operator with _kl_, we can get the
parser we want by writing:
`operator>>` in this expression is the sequence-operator; read it as "followed
by". If we combine the sequence-operator with _kl_, we can get the parser we
want by writing:

boost::parser::double_ >> *(',' >> boost::parser::double_)

This is a parser that matches at least one `double` _emdash_ because of the
first _d_ in the expression above _emdash_ followed by zero or more instances
of a-comma-followed-by-a-`double`. Notice that we can use `','` directly.
Though it is not a parser, `operator>>()` and the other operators defined on
Though it is not a parser, `operator>>` and the other operators defined on
_Parser_ parsers have overloads that accept character/parser pairs of
arguments; these operator overloads will create the right parser to recognize
`','`.
@@ -947,7 +952,7 @@ Why do we need any of this, considering that we just used a literal `','` in
our previous example? The reason is that `'M'` is not used in an expression
with another _Parser_ parser. It is used within `*'M'_l[add_1000]`. If we'd
written `*'M'[add_1000]`, clearly that would be ill-formed; `char` has no
`operator*()`, nor an `operator[]()`, associated with it.
`operator*`, nor an `operator[]`, associated with it.

[tip Any time you want to use a `char`, `char32_t`, or string literal in a
_Parser_ parser, write it as-is if it is combined with a preexisting _Parser_
@@ -956,7 +961,7 @@ _lit_, or use the `_l` _udl_ suffix.]

On to the next bit: `-hundreds[add]`. By now, the use of the index operator
should be pretty familiar; it associates the semantic action `add` with the
parser `hundreds`. The `operator-()` at the beginning is new. It means that
parser `hundreds`. The `operator-` at the beginning is new. It means that
the parser it is applied to is optional. You can read it as "zero or one".
So, if `hundreds` is not successfully parsed after `*'M'[add_1000]`, nothing
happens, because `hundreds` is allowed to be missing _emdash_ it's optional.
@@ -1050,7 +1055,7 @@ for more information.]
[section Alternative Parsers]

Frequently, you need to parse something that might have one of several forms.
`operator|()` is overloaded to form alternative parsers. For example:
`operator|` is overloaded to form alternative parsers. For example:

namespace bp = boost::parser;
auto const parser_1 = bp::int_ | bp::eps;
@@ -1198,23 +1203,23 @@ since they are also used to create parsers, it is more useful just to focus on
that. The directives _rpt_ and _if_ were already described in the section on
parsers; we won't say much about them here.

[heading Interaction with sequence parsers]

Sequence and alternative parsers do not nest in most cases. (Let's consider
just sequence parsers to keep things simple, but all this logic applies to
alternative parsers as well.) `a >> b >> c` is the same as `(a >> b) >> c`
and `a >> (b >> c)`, and they are each represented by a single _seq_p_ with
three subparsers, `a`, `b`, and `c`. However, if something prevents two
_seq_ps_ from interacting directly, they *will* nest. For instance, `lexeme[a
>> b] >> c` is a _seq_p_ containing two parsers, `lexeme[a >> b]` and `c`.
This is because _lexeme_ takes its given parser and wraps it in a _lex_p_.
This in turn turns off the sequence parser combining logic, since both sides
of the second `operator>>` in `lexeme[a >> b] >> c` are not _seq_ps_.
Sequence parsers have several rules that govern what the overall attribute
type of the parser is, based on the positions and attributes of its subparsers
(see _attr_gen_). Therefore, it's important to know which directives create a
new parser (and what kind), and which ones do not; this is indicated for each
directive below.
[heading Interaction with sequence, alternative, and permutation parsers]

Sequence, alternative, and permutation parsers do not nest in most cases.
(Let's consider just sequence parsers to keep things simple, but most of this
logic applies to alternative parsers as well.) `a >> b >> c` is the same as
`(a >> b) >> c` and `a >> (b >> c)`, and they are each represented by a single
_seq_p_ with three subparsers, `a`, `b`, and `c`. However, if something
prevents two _seq_ps_ from interacting directly, they *will* nest. For
instance, `lexeme[a >> b] >> c` is a _seq_p_ containing two parsers, `lexeme[a
>> b]` and `c`. This is because _lexeme_ takes its given parser and wraps it
in a _lex_p_. This in turn turns off the sequence parser combining logic,
since both sides of the second `operator>>` in `lexeme[a >> b] >> c` are not
_seq_ps_. Sequence parsers have several rules that govern what the overall
attribute type of the parser is, based on the positions and attributes of its
subparsers (see _attr_gen_). Therefore, it's important to know which
directives create a new parser (and what kind), and which ones do not; this is
indicated for each directive below.
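
The flattening rule can be illustrated with a small standalone sketch. Everything here is hypothetical: `leaf`, `seq`, `wrapped`, `lexeme_like`, and `arity` are stand-ins invented for this example and are not _Parser_'s actual types. The sketch only shows the general technique, under the assumption that the combining operator splices an operand's subparsers when that operand is already a sequence, and otherwise treats it as a single element.

```cpp
#include <cassert>
#include <cstddef>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for parser types (not Boost.Parser's classes).
struct leaf {};                                      // Any non-sequence parser.
template<typename... Ps>
struct seq { std::tuple<Ps...> parsers; };           // A sequence of subparsers.
template<typename P>
struct wrapped { P parser; };                        // What a directive produces.

template<typename T> struct is_seq : std::false_type {};
template<typename... Ps> struct is_seq<seq<Ps...>> : std::true_type {};

template<typename T> struct is_parser : std::false_type {};
template<> struct is_parser<leaf> : std::true_type {};
template<typename... Ps> struct is_parser<seq<Ps...>> : std::true_type {};
template<typename P> struct is_parser<wrapped<P>> : std::true_type {};

// A sequence contributes its subparsers; anything else contributes itself.
template<typename P>
auto as_tuple(P const & p)
{
    if constexpr (is_seq<P>::value)
        return p.parsers;
    else
        return std::make_tuple(p);
}

template<typename... Ps>
seq<Ps...> make_seq(std::tuple<Ps...> t) { return {std::move(t)}; }

// Combining splices both operands into one flat sequence.
template<
    typename L,
    typename R,
    typename = std::enable_if_t<is_parser<L>::value && is_parser<R>::value>>
auto operator>>(L const & l, R const & r)
{
    return make_seq(std::tuple_cat(as_tuple(l), as_tuple(r)));
}

// Wrapping hides the inner sequence from the splicing logic.
template<typename P>
wrapped<P> lexeme_like(P p) { return {std::move(p)}; }

// Number of direct subparsers in a sequence.
template<typename... Ps>
constexpr std::size_t arity(seq<Ps...> const &) { return sizeof...(Ps); }
```

With `leaf a, b, c;`, all three groupings of `a >> b >> c` produce a `seq` with three subparsers, while `lexeme_like(a >> b) >> c` produces a `seq` with two, because the wrapper is not itself a sequence.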

[heading The directives]

@@ -1261,10 +1266,9 @@ Creates a _raw_p_.

`_string_view_np_[p]` is very similar to `_raw_np_[p]`, except that it changes
the attribute of `p` to `std::basic_string_view<C>`, where `C` is the
character type of the underlying sequence being parsed. _string_view_
requires that the underlying range being parsed is contiguous. Since this can
only be detected in C++20 and later, _string_view_ is not available in C++17
mode.
character type of the underlying range being parsed. _string_view_ requires
that the underlying range being parsed is contiguous. Since this can only be
detected in C++20 and later, _string_view_ is not available in C++17 mode.

Similar to the re-use scenario for _omit_ above, _string_view_ could be used
to find the *locations* of all non-overlapping matches of `p` in a string.
@@ -1374,9 +1378,9 @@ _transform_ creates a _xfm_p_.
[section Combining Operations]

Certain overloaded operators are defined for all parsers in _Parser_. We've
already seen some of them used in this tutorial, especially `operator>>()` and
`operator|()`, which are used to form sequence parsers and alternative
parsers, respectively.
already seen some of them used in this tutorial, especially `operator>>`,
`operator|`, and `operator||`, which are used to form sequence parsers,
alternative parsers, and permutation parsers, respectively.

[table_combining_operations]

@@ -2702,9 +2706,9 @@ they accept are the same.
[heading _search_]

As shown in _p_api_, the two patterns of parsing in _Parser_ are whole-parse
and prefix-parse. When you want to find something in the middle of the
sequence being parsed, there's no `parse` API for that. You can of course
make a simple parser that skips everything before what you're looking for.
and prefix-parse. When you want to find something in the middle of the range
being parsed, there's no `parse` API for that. You can of course make a
simple parser that skips everything before what you're looking for.

namespace bp = boost::parser;
constexpr auto parser = /* ... */;
@@ -2717,13 +2721,13 @@ write. If you need to parse something from the middle in order to generate
attributes, this is what you should use.

However, it often turns out you only need to find some subrange in the parsed
sequence. In these cases, it would be nice to turn this into a proper
algorithm in the pattern of the ones in `std::ranges`, since that's more
idiomatic. _search_ is that algorithm. It has very similar semantics to
range. In these cases, it would be nice to turn this into a proper algorithm
in the pattern of the ones in `std::ranges`, since that's more idiomatic.
_search_ is that algorithm. It has very similar semantics to
`std::ranges::search`, except that it searches not for a match to an exact
subsequence, but to a match with the given parser. Like
`std::ranges::search()`, it returns a subrange (`boost::parser::subrange` in
C++17, `std::ranges::subrange` in C++20 and later).
subrange, but to a match with the given parser. Like `std::ranges::search()`,
it returns a subrange (`boost::parser::subrange` in C++17,
`std::ranges::subrange` in C++20 and later).

namespace bp = boost::parser;
auto result = bp::search("aaXYZq", bp::lit("XYZ"), bp::ws);
@@ -2880,17 +2884,17 @@ range for `replacement`. What happens in this case is silent transcoding of
`replacement` from UTF-M to UTF-N by the _replace_ range adaptor. This
doesn't require memory allocation; _replace_ just slaps `|
boost::parser::as_utfN` onto `replacement`. However, since _Parser_ treats
`char` sequences as unknown encoding, _replace_ will not transcode from `char`
sequences. So calls like this won't work:
`char` ranges as unknown encoding, _replace_ will not transcode from `char`
ranges. So calls like this won't work:

char const str[] = "some text";
char const replacement_str[] = "some text";
namespace bp = boost::parser;
auto r = str | bp::replace(parser, replacement_str | bp::as_utf8); // Error: ill-formed! Can't mix plain-char inputs and UTF replacements.

This does not work, even though `char` and UTF-8 are the same size. If `r`
and `replacement` are both sequences of `char`, everything will work of
course. It's just mixing `char` and UTF-encoded sequences that does not work.
and `replacement` are both ranges of `char`, everything will work of course.
It's just mixing `char` and UTF-encoded ranges that does not work.

All the details called out in the subsection on _search_ above apply to
_replace_: its parser produces no attributes; it accepts C-style strings for
@@ -3193,7 +3197,7 @@ See _ex_cb_json_ for an extended example of callback parsing.
[heading Error handling]

_Parser_ has good error reporting built into it. Consider what happens when
we fail to parse at an expectation point (created using `operator>()`). If I
we fail to parse at an expectation point (created using `operator>`). If I
feed the parser from the _ex_cb_json_ example a file called sample.json
containing this input (note the unmatched `'['`):

@@ -3623,16 +3627,16 @@ _Parser_ seldom allocates memory. The exceptions to this are:
* If trace is turned on by passing `_trace_::on` to a top-level
parsing function, the names of parsers are allocated.

* When a failed expectation is encountered (using `operator>()`), the name of
* When a failed expectation is encountered (using `operator>`), the name of
the failed parser is placed into a _std_str_, which will usually cause an
allocation.

* _str_'s attribute is a _std_str_, the use of which implies allocation. You
can avoid this allocation by explicitly using a different string type for
the attribute that does not allocate.

* The attribute for `_rpt_np_(p)` in all its forms, including `operator*()`,
`operator+()`, and `operator%()`, is `std::vector<_ATTR_np_(p)>`, the use of
* The attribute for `_rpt_np_(p)` in all its forms, including `operator*`,
`operator+`, and `operator%`, is `std::vector<_ATTR_np_(p)>`, the use of
which implies allocation. You can avoid this allocation by explicitly using
a different sequence container for the attribute that does not allocate.
`boost::container::static_vector` or C++26's `std::inplace_vector` may be
7 changes: 7 additions & 0 deletions include/boost/parser/detail/printing.hpp
@@ -71,6 +71,13 @@ namespace boost { namespace parser { namespace detail {
std::ostream & os,
int components = 0);

template<typename Context, typename ParserTuple>
void print_parser(
Context const & context,
perm_parser<ParserTuple> const & parser,
std::ostream & os,
int components = 0);

template<
typename Context,
typename ParserTuple,
40 changes: 34 additions & 6 deletions include/boost/parser/detail/printing_impl.hpp
@@ -63,6 +63,10 @@ namespace boost { namespace parser { namespace detail {
struct n_aray_parser<or_parser<ParserTuple>> : std::true_type
{};

template<typename ParserTuple>
struct n_aray_parser<perm_parser<ParserTuple>> : std::true_type
{};

template<
typename ParserTuple,
typename BacktrackingTuple,
@@ -165,30 +169,54 @@ namespace boost { namespace parser { namespace detail {
os << ")";
}

template<typename Context, typename ParserTuple>
void print_parser(
template<typename Context, typename Parser>
void print_or_like_parser(
Context const & context,
or_parser<ParserTuple> const & parser,
Parser const & parser,
std::ostream & os,
int components)
int components,
std::string_view or_ellipsis,
std::string_view ws_or)
{
int i = 0;
bool printed_ellipsis = false;
hl::for_each(parser.parsers_, [&](auto const & parser) {
if (components == parser_component_limit) {
if (!printed_ellipsis)
os << " | ...";
os << or_ellipsis;
printed_ellipsis = true;
return;
}
if (i)
os << " | ";
os << ws_or;
detail::print_parser(context, parser, os, components);
++components;
++i;
});
}

template<typename Context, typename ParserTuple>
void print_parser(
Context const & context,
or_parser<ParserTuple> const & parser,
std::ostream & os,
int components)
{
detail::print_or_like_parser(
context, parser, os, components, " | ...", " | ");
}

template<typename Context, typename ParserTuple>
void print_parser(
Context const & context,
perm_parser<ParserTuple> const & parser,
std::ostream & os,
int components)
{
detail::print_or_like_parser(
context, parser, os, components, " || ...", " || ");
}
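
The shared printing logic above can be exercised in isolation with a standalone sketch. This is an approximation under stated assumptions: `print_joined`, `joined`, and `component_limit` are hypothetical names, the limit of 4 is chosen only for the example (the real `parser_component_limit` may differ), and each name counts as exactly one component, whereas the real nested `print_parser` calls may account for components differently.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <string_view>
#include <vector>

// Assumed limit for this sketch only.
constexpr int component_limit = 4;

// Prints names separated by `separator`. Once `components` reaches the
// limit, prints `ellipsis` a single time and skips the remaining names.
inline void print_joined(
    std::vector<std::string> const & names,
    std::ostream & os,
    int components,
    std::string_view ellipsis,
    std::string_view separator)
{
    int i = 0;
    bool printed_ellipsis = false;
    for (auto const & name : names) {
        if (components == component_limit) {
            if (!printed_ellipsis)
                os << ellipsis;
            printed_ellipsis = true;
            continue;
        }
        if (i)
            os << separator;
        os << name;
        ++components;
        ++i;
    }
}

// Convenience wrapper that returns the printed text.
inline std::string joined(
    std::vector<std::string> const & names,
    std::string_view ellipsis,
    std::string_view separator)
{
    std::ostringstream os;
    print_joined(names, os, 0, ellipsis, separator);
    return os.str();
}
```

Passing the separator and ellipsis as parameters is what lets one loop serve both the `|` and `||` printers; only the two string arguments change between callers.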

template<
typename Context,
typename ParserTuple,