Add source names (via new Stream and SourceSpan classes) and .span() combinator #83

Status: Closed (11 commits proposed for merge)
2 changes: 1 addition & 1 deletion .github/workflows/tests.yml
@@ -15,7 +15,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: [3.7, 3.8, 3.9, "3.10", "3.11", "pypy-3.7"]
python-version: ["3.8", "3.9", "3.10", "3.11", "3.12", "3.13", "pypy-3.7"]

env:
PYTHON: ${{ matrix.python-version }}
49 changes: 41 additions & 8 deletions docs/ref/methods_and_combinators.rst
@@ -23,25 +23,25 @@ can be used and manipulated as below.
The following methods are for actually **using** the parsers that you have
created:

.. method:: parse(string_or_list)
.. method:: parse(stream)

Attempts to parse the given string (or list). If the parse is successful
and consumes the entire string, the result is returned - otherwise, a
Attempts to parse the given :class:`Stream` of data. If the parse is successful
and consumes the entire stream, the result is returned - otherwise, a
``ParseError`` is raised.

Instead of passing a string, you can in fact pass a list of tokens. Almost
all the examples assume strings for simplicity. Some of the primitives are
Most commonly, a stream simply wraps a string, but you could use a list of tokens instead.
Almost all the examples assume strings for simplicity. Some of the primitives are
also clearly string specific, and a few of the combinators (such as
:meth:`Parser.concat`) are string specific, but most of the rest of the
library will work with tokens just as well. See :doc:`/howto/lexing` for
more information.

.. method:: parse_partial(string_or_list)
.. method:: parse_partial(stream)

Similar to ``parse``, except that it does not require the entire
string (or list) to be consumed. Returns a tuple of
stream to be consumed. Returns a tuple of
``(result, remainder)``, where ``remainder`` is the part of
the string (or list) that was left over.
the stream that was left over.

The following methods are essentially **combinators** that produce new
parsers from the existing one. They are provided as methods on ``Parser`` for
@@ -401,6 +401,20 @@ can be used and manipulated as below.
</howto/lexing/>` and want subsequent parsing of the token stream to be
able to report original positions in error messages etc.

.. method:: span()

Returns a parser that augments the initial parser's result with a :class:`SourceSpan`
containing information about where that parser started and stopped within the
source data. The new value is a tuple:

.. code:: python

(source_span, original_value)

This enables reporting of custom errors involving source locations, such as when
using parsy as a :doc:`lexer</howto/lexing/>` or when building a syntax tree that will be
further analyzed.

.. _operators:

Parser operators
@@ -594,3 +608,22 @@ Parsy does not try to include every possible combinator - there is no reason why
you cannot create your own for your needs using the built-in combinators and
primitives. If you find something that is very generic and would be very useful
to have as a built-in, please :doc:`submit </contributing>` as a PR!

Auxiliary data structures
=========================

.. class:: Stream

Wraps a string, byte sequence, or list, possibly equipping it with a source.
If the data is loaded from a file or URL, the source should be that file path or URL.
The source name is used in generated parse error messages.

.. method:: __init__(data, [source=None])

Wraps the data into a stream, possibly equipping it with a source.

.. class:: SourceSpan

Identifies a span of material from the data being parsed by its start row and column and its end
row and column. If the data stream was equipped with a source, that value is also available in
this object.
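The documentation above says a `SourceSpan` enables custom error reporting. A minimal sketch of that idea follows, assuming only the three fields described in this PR. The `SourceSpan` class here is a local stand-in rather than an import from parsy, and `describe` is a hypothetical helper, not part of the proposed API. Rows and columns are shown zero-based, which matches how `line_info_at` computes them in this diff.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SourceSpan:
    """Local stand-in mirroring the fields described above (not imported from parsy)."""

    source: str | None
    start: tuple[int, int]  # (row, column) where the span begins
    end: tuple[int, int]  # (row, column) where the span ends


def describe(span: SourceSpan, message: str) -> str:
    """Hypothetical helper: prefix a message with the span's location."""
    location = "{}:{}-{}:{}".format(*span.start, *span.end)
    if span.source is not None:
        # Only mention the source when the stream was equipped with one.
        location = f"{span.source}:{location}"
    return f"{location}: {message}"


span = SourceSpan("query.sql", (0, 7), (0, 12))
print(describe(span, "unknown column"))  # query.sql:0:7-0:12: unknown column
```

A later analysis pass over a syntax tree could keep the span next to each node and call a helper like this when it finds a semantic error.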
6 changes: 4 additions & 2 deletions examples/json.py
@@ -1,4 +1,4 @@
from parsy import forward_declaration, regex, seq, string
from parsy import Stream, forward_declaration, regex, seq, string

# Utilities
whitespace = regex(r"\s*")
@@ -45,7 +45,8 @@
def test():
assert (
json_doc.parse(
r"""
Stream(
r"""
{
"int": 1,
"string": "hello",
@@ -55,6 +56,7 @@ def test():
"other": [true, false, null]
}
"""
)
)
== {
"int": 1,
6 changes: 3 additions & 3 deletions examples/sql_select.py
@@ -13,7 +13,7 @@
from dataclasses import dataclass
from typing import List, Optional, Union

from parsy import from_enum, regex, seq, string
from parsy import Stream, from_enum, regex, seq, string

# -- AST nodes:

@@ -109,7 +109,7 @@ class Select:


def test_select():
assert select.parse("SELECT thing, stuff, 123, 'hello' FROM my_table WHERE id = 1;") == Select(
assert select.parse(Stream("SELECT thing, stuff, 123, 'hello' FROM my_table WHERE id = 1;")) == Select(
columns=[
Field("thing"),
Field("stuff"),
@@ -126,7 +126,7 @@ def test_select():


def test_optional_where():
assert select.parse("SELECT 1 FROM x;") == Select(
assert select.parse(Stream("SELECT 1 FROM x;")) == Select(
columns=[Number(1)],
table=Table("x"),
where=None,
5 changes: 5 additions & 0 deletions setup.py
@@ -1,3 +1,8 @@
import sys

if sys.version_info[1] > 11:
sys.exit(0)

from setuptools import setup

setup()
100 changes: 80 additions & 20 deletions src/parsy/__init__.py
@@ -12,13 +12,48 @@
noop = lambda x: x


def line_info_at(stream, index):
@dataclass
class Stream:
    """Data to parse, possibly equipped with a name for the source it's from,
    e.g. a file path."""

    data: str | bytes | list
    source: str | None = None

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Subscripting bytes with `[index]` instead of `[index:index + 1]`
        # returns an int
        if isinstance(self.data, bytes) and not isinstance(i, slice):
            return self.data[i : i + 1]
        else:
            return self.data[i]


@dataclass
class SourceSpan:
    """Identifies a span of material from the data to parse.

    Attributes:
        source (str | None): the source of the data, e.g. a file path.
        start (tuple[int, int]): the start row and column of the span.
        end (tuple[int, int]): the end row and column of the span.
    """

    source: str | None
    start: tuple[int, int]
    end: tuple[int, int]


def line_info_at(stream: Stream, index):
if index > len(stream):
raise ValueError("invalid index")
line = stream.count("\n", 0, index)
last_nl = stream.rfind("\n", 0, index)
line = stream.data.count("\n", 0, index)
last_nl = stream.data.rfind("\n", 0, index)
col = index - (last_nl + 1)
return (line, col)
return (stream.source, line, col)


class ParseError(RuntimeError):
@@ -29,7 +64,11 @@ def __init__(self, expected, stream, index):

def line_info(self):
try:
return "{}:{}".format(*line_info_at(self.stream, self.index))
source, row, col = line_info_at(self.stream, self.index)
if source is None:
return "{}:{}".format(row, col)
else:
return "{}:{}:{}".format(source, row, col)
except (TypeError, AttributeError): # not a str
return str(self.index)
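To make the new three-element return value of `line_info_at` concrete, here is a small self-contained sketch of the position lookup and the two message formats used by `ParseError.line_info` in this diff. The `Stream` class is a reduced local stand-in (string data only), and `format_position` is a hypothetical name for the formatting step, not a function in the PR.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stream:
    """Reduced local stand-in for the Stream class in this diff (string data only)."""

    data: str
    source: str | None = None


def line_info_at(stream: Stream, index: int):
    """Return (source, row, column) for an index, following the logic in the diff."""
    if index > len(stream.data):
        raise ValueError("invalid index")
    line = stream.data.count("\n", 0, index)  # newlines before index = zero-based row
    last_nl = stream.data.rfind("\n", 0, index)  # -1 when still on the first line
    col = index - (last_nl + 1)
    return (stream.source, line, col)


def format_position(stream: Stream, index: int) -> str:
    """Hypothetical helper: include the source name only when one was given."""
    source, row, col = line_info_at(stream, index)
    if source is None:
        return f"{row}:{col}"
    return f"{source}:{row}:{col}"


print(format_position(Stream("ab\ncd"), 1))  # 0:1
print(format_position(Stream("ab\ncd", "demo.txt"), 4))  # demo.txt:1:1
```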

@@ -83,30 +122,35 @@ class Parser:
of the failure.
"""

def __init__(self, wrapped_fn: Callable[[str | bytes | list, int], Result]):
def __init__(self, wrapped_fn: Callable[[Stream, int], Result]):
"""
Creates a new Parser from a function that takes a stream
and returns a Result.
"""
self.wrapped_fn = wrapped_fn

def __call__(self, stream: str | bytes | list, index: int):
def __call__(self, stream: Stream, index: int):
return self.wrapped_fn(stream, index)

def parse(self, stream: str | bytes | list) -> Any:
def parse(self, stream: Stream | str | bytes | list) -> Any:
"""Parses a string or list of tokens and returns the result or raises a ParseError."""
(result, _) = (self << eof).parse_partial(stream)
return result

def parse_partial(self, stream: str | bytes | list) -> tuple[Any, str | bytes | list]:
def parse_partial(self, stream: Stream | str | bytes | list) -> tuple[Any, Stream]:
"""
Parses the longest possible prefix of a given string.
Returns a tuple of the result and the unparsed remainder,
or raises ParseError
"""
result = self(stream, 0)
result = self(
stream if isinstance(stream, Stream) else Stream(stream),
0,
)

if result.status:
# The type of the returned remaining stream matches the type of the
# input stream.
return (result.value, stream[result.index :])
else:
raise ParseError(result.expected, stream, result.furthest)
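The backward-compatible part of `parse_partial` is the coercion step: bare strings, bytes, or lists are wrapped in a `Stream`, while an existing `Stream` passes through untouched. The sketch below isolates that step. `Stream` is a local stand-in and `as_stream` is a hypothetical helper name; the diff performs the same check inline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stream:
    """Local stand-in for the Stream class in this diff (not imported from parsy)."""

    data: str | bytes | list
    source: str | None = None


def as_stream(value: Stream | str | bytes | list) -> Stream:
    """Hypothetical helper: wrap bare data, pass an existing Stream through unchanged."""
    return value if isinstance(value, Stream) else Stream(value)


wrapped = as_stream("SELECT 1 FROM x;")
named = Stream("SELECT 1 FROM x;", source="query.sql")
print(wrapped.source)  # None
print(as_stream(named) is named)  # True
```

Note that in the diff as shown, the remainder is computed by slicing the original argument. When that argument is a `Stream`, the slice goes through `Stream.__getitem__` and so appears to yield the underlying data rather than a new `Stream`.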
@@ -339,14 +383,35 @@ def mark(self) -> Parser:
((start_row, start_column),
original_value,
(end_row, end_column))

``.span()`` is a more powerful version of this combinator, returning a
SourceSpan.
"""

@generate
def marked():
start = yield line_info
_, *start = yield line_info
body = yield self
end = yield line_info
return (start, body, end)
_, *end = yield line_info
return (tuple(start), body, tuple(end))

return marked

    def span(self) -> Parser:
        """
        Returns a parser that augments the initial parser's result with a
        SourceSpan capturing where that parser started and stopped.
        The new value is a tuple:

            (source_span, original_value)
        """

        @generate
        def marked():
            source, *start = yield line_info
            body = yield self
            _, *end = yield line_info
            return (SourceSpan(source, tuple(start), tuple(end)), body)

        return marked
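The shape of the value produced by `.span()` can be illustrated without the combinator machinery. The sketch below uses only the standard library: it matches a regular expression at a given index and returns `(source_span, original_value)` in the same order as the docstring above. `SourceSpan` is a local stand-in, and `position` and `match_with_span` are hypothetical helpers for illustration; this is not how the PR implements `.span()`.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class SourceSpan:
    """Local stand-in for the SourceSpan class in this diff."""

    source: str | None
    start: tuple[int, int]
    end: tuple[int, int]


def position(data: str, index: int) -> tuple[int, int]:
    """Zero-based (row, column) of index, using the same counting as line_info_at."""
    row = data.count("\n", 0, index)
    col = index - (data.rfind("\n", 0, index) + 1)
    return (row, col)


def match_with_span(pattern: str, data: str, index: int = 0, source: str | None = None):
    """Hypothetical helper: return (SourceSpan, matched_text), or None on no match."""
    match = re.compile(pattern).match(data, index)
    if match is None:
        return None
    span = SourceSpan(source, position(data, index), position(data, match.end()))
    return (span, match.group(0))


data = "let x\n= 42"
print(match_with_span(r"\d+", data, 8, "demo.txt"))
```

Here the match for `42` starts at row 1, column 2 and ends at row 1, column 4, so the returned span is `SourceSpan(source='demo.txt', start=(1, 2), end=(1, 4))`.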

@@ -557,7 +622,7 @@ def regex(exp: str, flags=0, group: int | str | tuple = 0) -> Parser:

@Parser
def regex_parser(stream, index):
match = exp.match(stream, index)
match = exp.match(stream.data, index)
if match:
return Result.success(match.end(), match.group(*group))
else:
@@ -577,12 +642,7 @@ def test_item(func: Callable[..., bool], description: str) -> Parser:
@Parser
def test_item_parser(stream, index):
if index < len(stream):
if isinstance(stream, bytes):
# Subscripting bytes with `[index]` instead of
# `[index:index + 1]` returns an int
item = stream[index : index + 1]
else:
item = stream[index]
item = stream[index]
if func(item):
return Result.success(index + 1, item)
return Result.failure(index, description)
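The simplification in `test_item` works because the bytes special case has moved into `Stream.__getitem__`. The sketch below reproduces that method on a local copy of the class and shows that single-item access gives a same-typed item for strings, bytes, and lists. `item_at` is a hypothetical helper standing in for the bounds check in `test_item_parser`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stream:
    """Local copy of the Stream class from this diff, reproduced for illustration."""

    data: str | bytes | list
    source: str | None = None

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Integer-indexing bytes yields an int, so slice out one byte instead.
        if isinstance(self.data, bytes) and not isinstance(i, slice):
            return self.data[i : i + 1]
        return self.data[i]


def item_at(stream: Stream, index: int):
    """Hypothetical helper: return the item at index, or None past the end."""
    return stream[index] if index < len(stream) else None


print(b"abc"[0])  # 97 (plain bytes indexing gives an int)
print(item_at(Stream(b"abc"), 0))  # b'a'
print(item_at(Stream("abc"), 0))  # a
print(item_at(Stream(["x", "y"]), 1))  # y
print(item_at(Stream("abc"), 3))  # None
```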