ABNF grammar support #1017
This fix is needed for ABNF grammar support.
Syntax:
    %import module
    %import module (rule1, rule2, ...)
Example:
    %import core-rules                 ; import rules from lark/grammars/core-rules.abnf
    %import core-rules (CRLF, DIGITS)  ; import specified rules (CRLF and DIGITS) only
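To make the two forms of the directive concrete, here is a minimal sketch of how a single `%import` line could be split into a module name and an optional rule list. It is not taken from the PR: `parse_import_directive` and `ImportDirective` are hypothetical names, and the assumption that module names consist of letters, digits and hyphens is mine.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical helper, not part of lark. Assumes module names are made of
# letters, digits and hyphens, and that a trailing ABNF comment starts with ';'.
_IMPORT_RE = re.compile(
    r"""^\s*%import\s+
        (?P<module>[A-Za-z][A-Za-z0-9-]*)     # module name, e.g. core-rules
        (?:\s*\(\s*(?P<rules>[^)]*?)\s*\))?   # optional parenthesized rule list
        \s*(?:;.*)?$                          # optional trailing comment
    """,
    re.VERBOSE,
)


class ImportDirective(NamedTuple):
    module: str
    rules: Optional[Tuple[str, ...]]  # None means "import every rule"


def parse_import_directive(line: str) -> ImportDirective:
    """Split one %import line into its module name and optional rule names."""
    match = _IMPORT_RE.match(line)
    if match is None:
        raise ValueError(f"not an %import directive: {line!r}")
    rules = match.group("rules")
    if rules is None:
        return ImportDirective(match.group("module"), None)
    names = tuple(name.strip() for name in rules.split(",") if name.strip())
    return ImportDirective(match.group("module"), names)
```

Returning `None` for the rule list keeps "import everything" distinct from an explicitly empty list.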
I would personally prefer that if we add this stuff, we do it in a way that allows easy adding of new syntax variations in the future via registering a dialect with the grammar loader. I think we don't need to copy over the I have started an implementation of that, but I think I stopped because I had problems with detecting what is and what is not a terminal name (similar to what you are describing).
I agree, it's much better if non-lark grammars were written as a plugin that's implemented with a .lark file + transformer (e.g. Of course, we can refactor / modify whatever is necessary in load_grammar in order to make that possible. @MegaIng What do you mean by "problems with detecting what is and what is not a terminal name" ? @t-higuchi Good start! Don't worry, we'll help you get this PR in shape.
The rule "uppercase means terminal" might not apply to all dialects, and some dialects might want to use other rules. That would mean that the original Transformer needs to tell the GrammarLoader what is and what is not a Terminal. However, call to
@MegaIng We already replace terminal/rule names with Terminal/NonTerminal objects. We should just do it earlier, so that all isupper() happen inside
@erezsh Should. But I had problems because different places complained about getting Symbol objects instead of strings.
@MegaIng I don't mind doing that refactor myself. I'll try to get to it this week.
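The idea discussed above (let each dialect decide what counts as a terminal, and turn names into symbol objects early) could look roughly like the sketch below. Everything in it is illustrative: `Terminal` and `NonTerminal` are local stand-ins rather than lark's classes, `register_dialect` and `make_symbol` are hypothetical names, and treating a fixed set of core rules as ABNF terminals is an assumption made only for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


# Local stand-ins for lark's symbol classes, so the sketch is self-contained.
@dataclass(frozen=True)
class Terminal:
    name: str


@dataclass(frozen=True)
class NonTerminal:
    name: str


Symbol = Union[Terminal, NonTerminal]

# Hypothetical registry: dialect name -> predicate deciding "is this a terminal?"
_DIALECTS: Dict[str, Callable[[str], bool]] = {}


def register_dialect(name: str, is_terminal: Callable[[str], bool]) -> None:
    _DIALECTS[name] = is_terminal


def make_symbol(dialect: str, name: str) -> Symbol:
    """Classify a name once, up front, so later stages never inspect its case."""
    is_terminal = _DIALECTS[dialect]
    return Terminal(name) if is_terminal(name) else NonTerminal(name)


# Lark's own convention: uppercase names are terminals.
register_dialect("lark", str.isupper)

# ABNF rule names are case-insensitive (RFC 5234), so case cannot be the marker.
# Assumption for this sketch: the dialect supplies an explicit set of terminals.
ABNF_TERMINALS = {"ALPHA", "DIGIT", "CRLF"}
register_dialect("abnf", lambda name: name.upper() in ABNF_TERMINALS)
```

With this shape, the `isupper()` check lives in exactly one place (the `lark` predicate), and the rest of the loader only ever sees `Terminal` or `NonTerminal` objects.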
Okay, I will give the ABNF-to-EBNF converter approach a try.
This PR implements feature request #318
As proposed in #318, the 'syntax' option needs to be set to load an ABNF grammar:
parser = Lark("... grammar ... ", syntax='abnf')
It accepts ABNF grammars as described in RFC 5234 and RFC 7405.
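For readers unfamiliar with the two RFCs, the sketch below shows how their terminal values could be translated into Python regular expressions: RFC 5234 numeric values (`%x41`, `%x30-39`, `%d13.10`) and quoted strings, which are case-insensitive unless given the RFC 7405 `%s` prefix. This is not the PR's converter; `num_val_to_regex` and `char_val_to_regex` are hypothetical names, and the input is assumed to be a single well-formed token.

```python
import re

# Radix letters used by RFC 5234 numeric values: %b (binary), %d (decimal), %x (hex).
_NUM_BASES = {"b": 2, "d": 10, "x": 16}


def _char(code: int) -> str:
    """Regex-escaped form of the character with the given code point."""
    return re.escape(chr(code))


def num_val_to_regex(token: str) -> str:
    """Translate a num-val such as %x41, %x30-39 (range) or %d13.10 (sequence)."""
    base = _NUM_BASES[token[1].lower()]
    body = token[2:]
    if "-" in body:
        low, high = (int(part, base) for part in body.split("-"))
        return f"[{_char(low)}-{_char(high)}]"
    return "".join(_char(int(part, base)) for part in body.split("."))


def char_val_to_regex(token: str) -> str:
    """Translate a quoted string; only the RFC 7405 %s prefix makes it case-sensitive."""
    case_sensitive = token[:2].lower() == "%s"
    text = token[token.index('"') + 1 : -1]
    pattern = re.escape(text)
    return pattern if case_sensitive else f"(?i:{pattern})"
```

For example, `re.fullmatch(char_val_to_regex('"abc"'), "ABC")` succeeds, while the same call with `'%s"abc"'` does not.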
Non-standard extensions to ABNF are kept minimal: only the '%import' directive is implemented.
Please see the example (url_parser_abnf.py) for ABNF-specific decorator usage and post-processing of the parse tree.
Some implementation notes:
terminals and compiled_rules just as Grammar.compile() does for lark's EBNF grammar. This approach makes minimal modifications to the existing code base.