Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lexer cleanup: Split literal lexing to separate file and simplify Cursor. #82757

Conversation

Julian-Wollersberger
Copy link
Contributor

The functions that lexed string and number literals were moved to a new file literals.rs, to make lib.rs smaller and more organized. I made them freestanding functions instead of methods on Cursor to have better separation of concerns: The Cursor should only be responsible for iterating char-by-char, and not for the lexer logic.

The first three commits only move code around and change the function calls from self.foo() to foo(cursor). The fourth renames the widely-used .first(), .second() and eat_while() methods to .peek(), .peek_second() and .bump_while(). The fifth commit inlines some helper functions. They were so small that it was harder to keep them in my head than to understand what the code does when inlined.

Julian Wollersberger added 3 commits February 13, 2021 19:33
Also make them freestanding functions instead of methods.
This has better separation of concern between the lexing and the Cursor's iterator-like functionality.
@rust-highfive
Copy link
Collaborator

r? @varkor

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 4, 2021
@rust-log-analyzer

This comment has been minimized.

Julian Wollersberger added 3 commits March 4, 2021 12:22
…mber of things I need to keep in my head.

And fix imports in tests.rs.
This one wasn't shown with `cargo check`.
@petrochenkov
Copy link
Contributor

cc @matklad

use crate::literals::{
double_quoted_string, lifetime_or_char, number, raw_double_quoted_string, single_quoted_string,
LiteralKind,
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: could you avoid multi-line imports?

@petrochenkov
Copy link
Contributor

petrochenkov commented Mar 6, 2021

  • No opinion on the move from Cursor methods to free functions, I don't really understand what is a Cursor's job and what is not. I'm interested in @matklad's opinion on this.
  • Same about renaming first/second to peek/peek_second.

Otherwise LGTM.

@petrochenkov petrochenkov removed their assignment Mar 6, 2021
@matklad
Copy link
Member

matklad commented Mar 6, 2021

No strong feelings here!

I personally agree that using funcitons for "grammar rules" and methods for inspecting tokens is cleaner, but this is also somewhat unusual. If we do this, someone might "cleanup" them back to methods in a couple of years. See also this and this.

No opinion on helper's renaming.

Keeping all rules in a single file was intentional: ~1k lines is not that big, especially with method bodies folded in the editor, and grammars are inherently not compositional (two different rules might intersect with each other).

Helpers for grammar categories, like raw_ident or whitespace were intentional, to have "one grammar production = one function" correspondence

first_token/advance_token split was intentional, to make it easier to see the public API of the crate at a glance, and to have the signature advance_token match more closely to signatures of other "production" functions (should've named it token though).

But, as I've said, I have no strong opinion here!

@petrochenkov
Copy link
Contributor

If we do this, someone might "cleanup" them back

Yeah, my impression about this PR is that is certainly a refactoring, but it's not obvious that it's a cleanup rather than changing the code to match a personal style of the author.

@varkor
Copy link
Member

varkor commented Mar 7, 2021

I feel I have less of a stake in the organisation of these files, and so if there isn't a strong consensus, it would be worth letting someone who is affected more directly by these changes make the final call. Perhaps if a refactoring isn't clearly an improvement, it's better not to make those changes, since at the very least it causes additional churn.

r? @matklad

@rust-highfive rust-highfive assigned matklad and unassigned varkor Mar 7, 2021
@matklad
Copy link
Member

matklad commented Mar 8, 2021

It looks like there's a rough consensus that the benefits here do not out-weigh churn, so I am going to close this. I'd also like to add that the high order bit about code-cleanness here is that lexer doesn't depend on any other parts of the compiler and has a straightforward data-based interface. So it really doesn't matter much how is it structured internally.

Thanks for the pull request regardless @Julian-Wollersberger!

@matklad matklad closed this Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants