-
-
Notifications
You must be signed in to change notification settings - Fork 414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generic tree typing #967
Generic tree typing #967
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice :-)
@@ -109,17 +112,18 @@ def iter_subtrees(self) -> 'Iterator[Tree]': | |||
subtrees = OrderedDict() | |||
for subtree in queue: | |||
subtrees[id(subtree)] = subtree | |||
queue += [c for c in reversed(subtree.children) | |||
# Reason for type ignore https://github.com/python/mypy/issues/10999 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the current type system it isn't possible to convey that "_Leaf_T
can be anything except a Tree
". So this will likely always be a thing mypy
won't be able to figure out on its own. The referenced issue has more context and might provide some insights on how to better handle it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the creative PR!
I have a few questions, since I'm not well versed in Python's type annotations.
I'm also, in general, a little concerned about the added complication. I can't figure out if it's correct just from a cursory glance. But perhaps that would improve as I get used to it.
@@ -591,7 +592,7 @@ def parse_interactive(self, text: Optional[str]=None, start: Optional[str]=None) | |||
""" | |||
return self.parser.parse_interactive(text, start=start) | |||
|
|||
def parse(self, text: str, start: Optional[str]=None, on_error: 'Optional[Callable[[UnexpectedInput], bool]]'=None) -> Tree: | |||
def parse(self, text: str, start: Optional[str]=None, on_error: 'Optional[Callable[[UnexpectedInput], bool]]'=None) -> 'ParseTree': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change this to 'Tree[Token]'
, there is no need for a ParseTree
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about it not being needed in this instance or not being needed in general?
- For this specific use case, yes. That would not require the added import of
ParseTree
when type checking. - In general,
ParseTree
isn't strictly required at all in general. However, I think it is a nice ease of use to a have a specific name for trees returned byparse()
since it is a core type that users of the library will be interacting with a lot.
With this mypy will raise an error is the code attemps to use __mul__ when the left hand side does not produce a Tree
|
||
def __init__(self, *transformers: Transformer[_T]) -> None: | ||
def __init__(self, *transformers: 'Union[Transformer, TransformerChain]') -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the initialize taking a N transformers, it is not possible to use type annotations to ensure that the T -> U -> V
sequence is correct. This could be addressed by changing the class to only ever be aware of 2 Transformer
s. That is the only way it is called in the library, but users could be building chains directly.
Since there is already a lot going on in this PR. I'll hold off from also making any changes to the number of arguments accepted by this function.
Sorry it took me so long to review this. Overall it looks fine. A few notes:
Branch = Union[_Leaf_T, 'Tree[_Leaf_T]'] and use that throughout the code? (preliminary tests show that it should work). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional notes
lark/visitors.py
Outdated
def __mul__(self, other: 'Transformer[_T]') -> 'TransformerChain[_T]': | ||
def __mul__( | ||
self: 'Transformer[Tree[_Leaf_U], _Leaf_T]', | ||
other: 'Union[Transformer[_Return_V, _Leaf_U], TransformerChain[_Return_V, _Leaf_U]]' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like everywhere that TransformerChain is used as a type, so is Transformer. So maybe it makes sense to just use AbsTransformer
, or ITransformer
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true. Transformer
and TransformerChain
have the same public interface. Protocol could be used to define that without requiring a base class. (It would just require typing-extensions for less than Python 3.8, but that's not a big issue.) I can work on that next.
while q: | ||
t = q.pop() | ||
rev_postfix.append(t) | ||
if isinstance(t, Tree): | ||
q += t.children | ||
|
||
# Postfix to tree | ||
stack = [] | ||
stack: List = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this addition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, so this is a fun one:
- There isn't enough type information based on the
stack.append()
calls formypy
to understand what should be allowed to be in the list. So it asks for the initialization of the variable to be annotated. - However, the list can contain anything based on the implementations of the user functions (See comment). So that means that the type is
List[Any]
. - But the generic
Any
is redundant and can be removed.
So we are left we are left with an annotation that seems to be redundant to humans, but is there to explicitly tell mypy
how the list is intended to be used.
Branch must be used as a generic so that mypy understands that _Leaf_T is the same type associated with the Tree
|
||
def __init__(self, data: str, children: 'List[Union[_Leaf_T, Tree[_Leaf_T]]]', meta: Optional[Meta]=None) -> None: | ||
def __init__(self, data: str, children: 'List[Branch[_Leaf_T]]', meta: Optional[Meta]=None) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry if it's obvious, but why do we need to write Branch[_Leaf_T]
and not just Branch
, since it's already defined using _Leaf_T
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a problem, it is a subtle point. At the top level _Leaf_T
stand for some type, but it can still be anything. Once a variable containing a tree is defined as Tree[Foo]
, then that tree's _Leaf_T
is bound to Foo
. Since Branch
is defined at the top level, it can still be any type. Specifically using Branch[_Leaf_T]
within the context of a Tree
, means that the type has to match the bound _Leaf_T
(in our example Foo
).
lark/visitors.py
Outdated
from typing_extensions import Protocol | ||
|
||
_Return_T = TypeVar('_Return_T', covariant=True) | ||
_Return_V = TypeVar('_Return_V', covariant=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quick question: Why is it Leaf_U but Return_V? Do the U and V stand for something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The signature for the multiplication operator has a lot of moving pieces. It starts with a ITransformer[_Leaf_T, Tree[_Leaf_U]]
instance (a transformer that converts a Tree[_Leaf_T
] to Tree[_Leaf_U]
). Then a ITransformer[_Leaf_U, _Return_V]
instance (a transformer that converts a Tree[_Leaf_U
] to _Return_V
). Which results in a ITransformer[_Leaf_T, _Return_V]
(a transformer that converts a Tree[_Leaf_T
] to _Return_V
).
There isn't a technical reason why it needs to be named with _Return_V
instead of _Return_U
. However, I was thinking that using _Return_V
would make it clearer that the final type isn't necessarily related to the intermediary _Leaf_U
. I'm open to switching to _Leaf_U
or putting some of my above description into a comment in the source code if you think the current state should be changed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In #926 we chose not to add typing_extensions
as a dependency. A workaround would be to put the import of Protocol
and the definition of ITransformer
within an if TYPE_CHECKING
.
Ok, I didn't have the context of that previous PR when I implemented this change. The one catch that makes this different is that from typing import TYPE_CHECKING
if TYPE_CHECKING:
from lark.visitors import ITransformer
class Foo:
pass
class Bar:
pass
my_tansformer: 'ITransformer[Foo, Bar]' That being said, This leaves us with two options:
I'm open to either option, but have a slight preference to using |
Is there a way to get the same effect without Anyway, I agree, it doesn't seem worth it to add a dependency just to support ITransformer for 3.6 and 3.7 |
|
Reminder from an earlier conversation:
Since the PR didn't make it into the 1.0 release, that is something that should be kept in mind. If the choice is made to go the 2.0 route, I remember seeing a couple of places where minor API changes could resolve a few of the mypy errors that are currently reported on the library. |
You mean Lark 2.0? That's not going to happen for a while.
Like what? |
I understand the desire to not have a 2.0 so shortly after 1.0 was released. But it is unfortunate that this PR did not get merged in time when breaking changes were less of a concern. I think I should be able to resolve most, maybe all, of the existing smaller mypy issue with smaller API-compatible PRs., But two minor API changes I'd recommend: 1 - Have
2 - Replace
The locations in the code base that use this method use it correctly by first doing a list comprehension with an if clause that ensures the index points to a
A larger API change I'd recommend to be included in a theoretical future 2.0 would be to remove the use of custom double underscore names (ex.
|
What would you name it instead? (the goal is to avoid name collision with any user-defined rule) |
Since "rules whose name begins with an underscore will be inlined into their containing rule", a parse tree won't have any with a single leading underscore. So I actually ran into an issue with a rule name collision with a Python keyword. I'll open a new issue for that to keep this discussion on topic, but the solution I had in mind would also help resolve the issue with rule names colliding with lark "internal" names. |
The main issue with something like (and In contrast, magic_method syntax is the common way to facilitate interaction with the runtime, like eq, name, and iter. Also other libraries, like rich, use this for similar purposes: https://rich.readthedocs.io/en/stable/protocol.html |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good enough for now.
Any improvements to it can be added in separate PRs.
@MegaIng @chanicpanic Anything to add before I merge it? |
Thanks @plannigan ! And sorry for being such a stickler :) Oops: Forgot it was rebased on top of 1.0b . I'll get it into master. Edit2: Done. |
Tree
so that it is generic based on the type of the child leaf.ParseTree
alias so that it is easy to refer to tree produced by the lark parser.The changes don't introduce any new
mypy
errors, but there is one case where more type information demonstrated an existing inconsistency.Addresses #966