By default, an LALR parser uses only a single token of lookahead, but it'd be really nice if it could use a custom lookahead in specific cases (I'm used to JavaCC, which handles this with something like LOOKAHEAD(2) at the right place to get around the restriction).
My use case is below. From what I can tell, the ?identifier: NAME (WS NAME|WS NAME_CONT)* rule sees the WS and takes that route, but it can't see that the repeated part is optional and should stop matching there. In JavaCC I'd put a LOOKAHEAD(2) there: it would try to match the whole repetition, and if only the first token (WS) matched but not the second, it would simply stop and that would be OK.
P.S.: although Earley works for this particular construct, it doesn't work for the full grammar I'm working on, so using it isn't really a solution.
Error
```
My name param 1 passed
^
Expected one of:
* NAME_CONT
* _NEWLINE
* NAME
Previous tokens: Token('WS', ' ')
```
Sample code
```python
from lark.indenter import Indenter
from lark import Lark


class PythonIndenter(Indenter):
    NL_type = "_NEWLINE"
    OPEN_PAREN_types = ["LPAR", "LSQB", "LBRACE"]
    CLOSE_PAREN_types = ["RPAR", "RSQB", "RBRACE"]
    INDENT_type = "_INDENT"
    DEDENT_type = "_DEDENT"
    tab_len = 8


lark_spec = Lark(
    r"""
file_input: (_NEWLINE | root_stmt)*
?root_stmt: func_block
func_block: BLOCK WS* "Function" WS* BLOCK WS* _NEWLINE (func_stmt)*
// i.e.: at least 2 spaces so that we have "Function name arguments"
func_stmt: identifier WS WS+ parameters? func_suite
parameters: param ("," WS* param)* ("," WS*)?
param: param_name ["=" WS* param_default]
param_name: identifier
param_default: identifier
func_suite: _NEWLINE (_INDENT stmt+ _DEDENT)?
?identifier: NAME (WS NAME|WS NAME_CONT)*
?stmt: identifier _NEWLINE
NAME: /(?!(OR|AND|IN)\b)\b[^\d\W]\w*/
NAME_CONT: /(?!(OR|AND|IN)\b)\b\w+/
BLOCK: /\*\*\* */
WS: /[ ]/
_NEWLINE: ( /\r?\n[ ]*/ | COMMENT )+
COMMENT: /#[^\n]*/
%declare _INDENT _DEDENT
""",
    parser="lalr",
    lexer="contextual",
    postlex=PythonIndenter(),
    start="file_input",
    keep_all_tokens=True,
    propagate_positions=True,
    debug=True,
)

if __name__ == "__main__":
    lark_spec.parse(
        """
*** Function ***
My name param 1 passed
Pass
""",
    )
```
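One alternative to a longer parser lookahead is to move the decision into the lexer: if a run of two or more spaces is lexed as its own token, a single token of lookahead is enough to tell "identifier continues" (WS) from "identifier ends" (separator). The sketch below assumes a hypothetical SEP terminal and uses plain Python regexes, not Lark; in a Lark grammar the analogous idea might be a prioritized terminal such as SEP.2: /[ ]{2,}/, but whether the contextual lexer resolves SEP vs. WS exactly this way is untested here.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (terminal type, matched text)

# Hypothetical terminals: SEP (2+ spaces) is tried before WS (exactly one space),
# so the lexer, not the parser, decides whether a gap is a separator.
TOKEN_RE = re.compile(
    r"(?P<SEP>[ ]{2,})|(?P<WS>[ ])|(?P<NAME>[^\d\W]\w*)|(?P<NAME_CONT>\w+)"
)


def tokenize(text: str) -> List[Token]:
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse_identifier(tokens: List[Token], i: int) -> Tuple[str, int]:
    """identifier: NAME (WS (NAME | NAME_CONT))*, decided with ONE token of lookahead."""
    if i >= len(tokens) or tokens[i][0] != "NAME":
        raise SyntaxError(f"expected NAME at token {i}")
    words = [tokens[i][1]]
    i += 1
    # Seeing WS is enough to continue; SEP (or end of input) ends the identifier.
    while i < len(tokens) and tokens[i][0] == "WS":
        if i + 1 >= len(tokens) or tokens[i + 1][0] not in ("NAME", "NAME_CONT"):
            raise SyntaxError(f"expected a word after WS at token {i + 1}")
        words.append(tokens[i + 1][1])
        i += 2
    return " ".join(words), i


def parse_func_stmt(tokens: List[Token]) -> Tuple[str, str]:
    """Simplified func_stmt: identifier SEP identifier."""
    name, i = parse_identifier(tokens, 0)
    if i >= len(tokens) or tokens[i][0] != "SEP":
        raise SyntaxError(f"expected SEP at token {i}")
    rest, i = parse_identifier(tokens, i + 1)
    if i != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[i]!r}")
    return name, rest


print(parse_func_stmt(tokenize("My name  param 1 passed")))
```

The trade-off is that the separator now has to be a distinct terminal everywhere it is used, instead of being spelled as WS WS+ in the grammar rules.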
Why are you explicitly matching WS? Since it is ignored anyway, it has no purpose here.
Hmm... I probably don't understand it well enough, then. Why is it ignored? Is there a way to not ignore it? In this particular grammar I'd like to use 2 spaces as a separator. Is that not possible?
For example, the code below would be valid (since an identifier can contain spaces):
Oh lol, I thought you had an %ignore statement in there since you were using the PythonIndenter. That one might break if you aren't ignoring inline WS: