
Consider parametrizing the lexer's step function to handle maximal munch / glue / jointness dynamically #2

@fmease

Description

The motivation is to avoid having to split tokens on demand in the parser (e.g., `&&` lexed for logical AND but needed as two `&`s for a double borrow, or hopefully even a `0.0` float literal needed as two field indices in `f.0.0` repeated tuple field access). That of course implies not lexing upfront into a `Vec`; instead, the parser needs to lazily advance the lexer. Of course, this would render lookahead and backtracking a lot more painful and tricky, as we'd need to manually buffer tokens (and it's unclear to me right now whether that very buffering requires ad hoc splitting anyway). This is an experiment.
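To make the idea concrete, here is a minimal sketch of what a parametrized `step` function could look like. All names (`Lexer`, `Munch`, `Token`) are hypothetical and not rustc's actual API; the point is just that the parser, driving the lexer lazily, picks the munch behaviour per call instead of splitting glued tokens after the fact:

```rust
// Hypothetical sketch: the parser asks for `&&` as one glued token
// (maximal munch) or as two `&` tokens, depending on context.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Munch {
    Maximal, // glue runs of punctuation: `&&` is one token
    Minimal, // emit single chars: `&&` is two `&` tokens
}

#[derive(Debug, Clone, PartialEq)]
enum Token {
    AmpAmp,
    Amp,
    Other(char),
}

struct Lexer<'src> {
    chars: std::iter::Peekable<std::str::Chars<'src>>,
}

impl<'src> Lexer<'src> {
    fn new(src: &'src str) -> Self {
        Self { chars: src.chars().peekable() }
    }

    // The parser calls this lazily, e.g. with `Minimal` where it
    // expects a double borrow and `Maximal` everywhere else.
    fn step(&mut self, munch: Munch) -> Option<Token> {
        let c = self.chars.next()?;
        match c {
            '&' => {
                if munch == Munch::Maximal && self.chars.peek() == Some(&'&') {
                    self.chars.next();
                    Some(Token::AmpAmp)
                } else {
                    Some(Token::Amp)
                }
            }
            other => Some(Token::Other(other)),
        }
    }
}
```

With this shape, `Lexer::new("&&")` yields `AmpAmp` under `Munch::Maximal` but `Amp`, `Amp` under `Munch::Minimal`, so no token ever needs to be split retroactively.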

Side note: We do want to follow rustc's model very closely (maximal munch "by default"), since the goal is 100% parity, unlike e.g. syn, whose parser I believe consumes unglued punctuation and likely checks jointness only where required (meaning it never has to split anything). I tried to follow that model in the beginning but gave up because the parser turned really hairy with all the token.touches(other_token) checks. Note: a long-standing goal (~10 years old, idk) for rustc's parser is to not operate on glued tokens. Until then, however, we're stuck with that model.
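For reference, the syn-style alternative mentioned above can be sketched with a span-based jointness predicate. `Token` and `touches` are made-up names for this sketch, not syn's real types: lex only single punctuation characters, record byte spans, and let the parser ask whether two adjacent tokens are joint where the grammar cares:

```rust
// Illustrative sketch of an unglued-punctuation model: jointness is a
// property of spans, checked on demand instead of baked into tokens.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Token {
    start: usize, // byte offset of the first char
    end: usize,   // byte offset one past the last char
}

impl Token {
    // Two tokens are "joint" if nothing (whitespace, comments)
    // separates them in the source text.
    fn touches(&self, next: &Token) -> bool {
        self.end == next.start
    }
}
```

So the two `&`s in `&&` (spans `0..1` and `1..2`) touch, while those in `& &` (spans `0..1` and `2..3`) do not; the cost is exactly the proliferation of `touches` checks described above.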
