-
Notifications
You must be signed in to change notification settings - Fork 396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rewrote parser in tril #4345
Rewrote parser in tril #4345
Conversation
c9b792d
to
673f85c
Compare
We're going to need some Doxygen documentation here. There is not a single comment in the entire file that was added. We'll also need a more descriptive commit message for this rather large change. Taking a quick glance here it seems the lexer and parser are implemented using a bottom-up approach with an LR(x) parser. What is the value of x? Looking at the code I'm a little surprised we went with an LR parser to parse Tril, given that the grammar for the language is very Lisp like so it is trivially parsable with a recursive descent LL(2) parser which is easy to read (from a code perspective) and to extend to support more features in Tril. Will the current implementation be easy to extend to parse things like node flags, etc.? Having said that the lexer and parser here is quite concise, so kudos on that! Good work! |
@genie-omr build all |
Sorry, I accidentally close this PR while writing this comment.
Completely agree. In particular, we should document the Tril grammar so it's easy to see what the parser is looking for.
Actually, the parser is already in recursive decent form. Parsing is top-down, but the AST is built bottom-up. The distinction is that the decision of which production rule to apply is done in a top-down fashion. That is, we figure out which AST nodes to construct top-down. Once we know which node to construct, instead of constructing it right away, we wait for the children to be constructed. Basically instead of if (token == '(') {
auto node = new Node();
node->children = parseChildren(); we do if (token == '(') {
auto children = parseChildren();
auto node = new Node(Children()); The point in the code where we figure out what node to custruct is the same, it's just where we construct it that's different. That said, I do think the current implementation has room for simplification.
This shouldn't be a problem. As with the old parser, the AST is already expressive enough to be able to represent things such as node flags. What will have to change is the IL generator, which will have to recognize new patterns in the AST. |
Thanks for the explanation @Leonardo2718. This makes sense. What threw me off was that I was expecting to see a more canonical recursive descent implementation with function calls implementing each reduction rule. The current implementation makes this much more concise though. Let's fix the compile errors seen in the build and in meanwhile we can start reviewing the code. |
@genie-omr build all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is my intial review. First, some general comments:
- Please reword the commit and PR titles to be in the imperative mood. Also having a high-level description of what this change does and why would be good to have in the commit message.
- Please add comments for:
- the Tril grammar; this can just be a block comment somewhere near the start of a file or just before the first function of the parser.
- the various functions defined using Doxygen format.
- all occurrences of
tokenIt++
so we know which token is being consumed.
- Flex/Lex and Bison/Yacc should be removed as dependencies from the CI build scrips.
- In other parts of the code, there is no space between identifier and template parameters (i.e.
foo<int>
instead offoo <int>
). Please update the usage of templates to conform to that style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more notes:
- I think it would be really helpful to specify in the comments of the different parse functions what the state of the iterator is after the function returns. For example, the comments for
buildAST()
could say something like "When the function returns, the token iterator points to the token immediately after the ')' of the parsed s-expression." Also, document what exceptions can be thrown would be helpful. - There are still spaces that should be removed between the name of templates and the template arguments.
- There are several try-catch blocks that should be removed because there is no way to recover from the errors they are catching.
79c4baa
to
2f9747e
Compare
@Yuehan-Lin a suggestion which may make it much easier for reviewers to review the code is to add individual commits addressing review comments and post them in the corresponding review thread. If you paste the SHA and say something like "Fixed in 79c4baa" for example, GitHub will hyperlink the commit which addresses the review and it gives reviewers a much smaller scope to check. If commits are being force pushed to the PR branch it can get very difficult for a reviewer to find what exactly changed between the last time they reviewed and the next. Sometimes changes will get lost and new code not properly reviewed. If changes addressing reviews are split up into commits it makes it very easy to follow. We can then squash unneeded commits at the end before a final merge. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few minor comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a great change! Thanks @Yuehan-Lin!
There are still opportunities for further improvement but I think it's best to leave that work for subsequent PRs (I need to open issues for this). As is, the change is already a significant improvement over the existing solution and will enable other work once merged.
@fjeremic I'm planning on merging this by EOD today so I am kindly requesting for your approval before then. 🙂
Signed-off-by: Yuehan-Lin <Yuehan.Lin@ibm.com>
@genie-omr build zos |
@genie-omr build all |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks much better. Thanks @Leonardo2718 for the thorough review! Agreed we can address other improvements in separate PRs as merging this work unblocks other things. Great work everyone!
With the merge of eclipse-omr#4345 we no longer depend on Flex and Bison being installed for our testing. AppVeyor seems to be broken at the moment in that checksums are failing when installing this package. We fix the issue by removing the dependency as it is no longer needed. Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
With the merge of eclipse-omr#4345 we no longer depend on Flex and Bison being installed for our testing. AppVeyor seems to be broken at the moment in that checksums are failing when installing this package. We fix the issue by removing the dependency as it is no longer needed. Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
With the merge of eclipse-omr#4345 we no longer depend on Flex and Bison being installed for our testing. AppVeyor seems to be broken at the moment in that checksums are failing when installing this package. We fix the issue by removing the dependency as it is no longer needed. Signed-off-by: Filip Jeremic <fjeremic@ca.ibm.com>
Used cpp file instead of yacc and lex.
In the
parser.cpp
, first used lexer to scanned and tokenlized the tril input, then parsed tokens with grammar rules to build AST.Fixes: #4489
Signed-off-by: Yuehan-Lin Yuehan.Lin@ibm.com