Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tree.children annotated with str instead of Token #966

Closed
plannigan opened this issue Aug 19, 2021 · 7 comments
Closed

Tree.children annotated with str instead of Token #966

plannigan opened this issue Aug 19, 2021 · 7 comments

Comments

@plannigan
Copy link
Contributor

Describe the bug

The type annotation for a Tree's children is Union[str, Tree]. However, the actual type is Union[Token, Tree]. While it is true that Token is a str, it causes mypy errors when trying to use Token properties.

One possible explanation is that the library wants to obscure away the fact that the real type is Token and only consider it a str. However, If I attempt to transform a parsed tree with the intent to write out a slightly altered file, Reconstructor fails when it gets a str instead of a Token.

To Reproduce

Example of mypy displaying an error for something that is valid.

from lark import Tree

tree = Tree("my_node", children=[])
for child in tree.children:
    if isinstance(child, Tree):
        print(f"is tree: {child.data}")
    else:
        print(f"is token: {child.type}")
$ mypy typing_example.py 
typing_example.py:8: error: "str" has no attribute "type"
Found 1 error in 1 file (checked 1 source file)

Using lark 0.11.3 & mypy 0.910.

Example of Reconstructor failure when replacing a token with a str. (Only mypy error has to do with a different annotation error for reconstruct() that has already been fixed.)

from lark import Lark
from lark.reconstruct import Reconstructor

parser = Lark("""
start: WORD WORD

%import common.WORD
%import common.WS_INLINE
%ignore WS_INLINE
""")

tree = parser.parse("Hello World")
tree.children[0] = "Good-bye"

print(Reconstructor(parser).reconstruct(tree))
$ python typing_example.py 
Traceback (most recent call last):
  File "/app/typing_example.py", line 15, in <module>
    print(Reconstructor(parser).reconstruct(tree, postproc=None))
  File "/pyenv/versions/my_venv/lib/python3.9/site-packages/lark/reconstruct.py", line 96, in reconstruct
    for item in x:
  File "/pyenv/versions/my_venv/lib/python3.9/site-packages/lark/reconstruct.py", line 79, in _reconstruct
    unreduced_tree = self.match_tree(tree, tree.data)
  File "/pyenv/versions/my_venv/lib/python3.9/site-packages/lark/tree_matcher.py", line 184, in match_tree
    unreduced_tree = parser.parse(ChildrenLexer(tree.children), rulename)
  File "/pyenv/versions/my_venv/lib/python3.9/site-packages/lark/parsers/earley.py", line 298, in parse
    to_scan = self._parse(lexer, columns, to_scan, start_symbol)
  File "/pyenv/versions/my_venv/lib/python3.9/site-packages/lark/parsers/earley.py", line 269, in _parse
    to_scan = scan(i, token, to_scan)
  File "/pyenv/versions/my_venv/lib/python3.9/site-packages/lark/parsers/earley.py", line 233, in scan
    if match(item.expect, token):
  File "/pyenv/versions/my_venv/lib/python3.9/site-packages/lark/tree_matcher.py", line 54, in _match
    assert False, (term, token)
AssertionError: (Terminal('WORD'), 'Good-bye')
@MegaIng
Copy link
Member

MegaIng commented Aug 19, 2021

It should be Union[str, Token, Tree]. This probably just something we missed. You are open to submit a PR. (Also, yes, the reconstructor needs Tokens to know which type a Terminal is)

@plannigan
Copy link
Contributor Author

Sure, I can write that up.

Also, yes, the reconstructor needs Tokens to know which type a Terminal is
It would be nice to be able to enforce that with type annotations, but that becomes a bigger issue of having parallel Tree types with a different set of allowed children.

@erezsh
Copy link
Member

erezsh commented Aug 19, 2021

The basic idea is that Tree is a generic structure that can contain any kind of object. So it's actually wrong in the sense that it should be children: List[Any]

BUT, Trees produced by Lark do follow a Union[Tree, Token] limitation.

We can express that in the return value of parse(), but I don't know if mypy's type resolution is clever enough to use that.

@plannigan
Copy link
Contributor Author

I played around with things and I think I have a solution that works by making Tree generic. I have to do some more testing to see how well it works in practice.

A type alias could then makes it easy to work with trees that have Union[Tree, Token] as children.

One note, with the current situation of the annotations being in stub files, it is a little awkward. User code that wants to provide a type needs to quote the value because the actual class doesn't support square brackets. However, I say the notes on dropping Python 2 support and a draft PR that moves the annotations into the code base. That would make this current issue go away.

@MegaIng
Copy link
Member

MegaIng commented Aug 19, 2021

Make a PR based on top of those changes and mark it as draft. If even we change those PRs even more, the core ideas will stay the same (e.g. no stub files).

@erezsh
Copy link
Member

erezsh commented Mar 16, 2022

Can we say this issue is now solved?

@plannigan
Copy link
Contributor Author

Yup

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants