Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add small validator utility for PEG grammars #23519

Merged
merged 1 commit into from
Dec 26, 2020

Conversation

pablogsal
Copy link
Member

@pablogsal pablogsal commented Nov 26, 2020

One common gotcha of PEG grammars is having two alternatives in a rule in which one that appears first is contained in one that appears last. In this scenario, the second one will never match as if there is input text that matches it, it will also match the first and that comes first.

To avoid this problem, this PR create an initial version of a small utility that detects these cases and raises to alert the user.

We don't need to detect all cases, only the ones that is easy enough to implement. The current form is just an initial prototype that does the matching based in the string representation. To make this better we need to improve the matching algorithm into a visitor to allow checking rules that contain options. For example:

some_rule:
    | NAME '+' NAME 'blech'?
    | NAME '+' NAME ';'

One "straightforward enough" way to do this is to generate all possible string representations: with and without optional and doing substring matching, but at this point a visitor that visits both alternatives at the same time and advances accordingly is probably better.

We could also support some rule expansion so we also detect something like this:

some_rule:
    | NAME '+' NAME
    | NAME some_other_rule
some_other_rule:
   '+' NAME ';'

@lysnikolaou
Copy link
Member

This is really great, @pablogsal! Thanks for putting in the time, again! 🚀

There are indeed many things we could do here and many different approaches we could follow. I'd certainly be okay with merging this as is and maybe start building on top of it to support things like rule expansion.

@lysnikolaou
Copy link
Member

Another thought is to link to some explanation of why the alternative will never be reached in the error message, since most people that will fall into these kind of traps won't be too familiar with the PEG formalism.

@pablogsal
Copy link
Member Author

Another thought is to link to some explanation of why the alternative will never be reached in the error message, since most people that will fall into these kind of traps won't be too familiar with the PEG formalism.

I am currently working on a document for the devguide on how to work with PEG grammars 🤫

@pablogsal pablogsal marked this pull request as ready for review December 26, 2020 01:30
@pablogsal
Copy link
Member Author

@lysnikolaou I plan to go ahead with this so we can start building up afterwards if that is ok with you.

@lysnikolaou
Copy link
Member

Of course! Go for it.

@pablogsal pablogsal merged commit 3bcc4ea into python:master Dec 26, 2020
adorilson pushed a commit to adorilson/cpython that referenced this pull request Mar 13, 2021
@pablogsal pablogsal deleted the validator branch May 19, 2021 18:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants