Make Rule::re_str public again. #477

@pmfirestone

In commit c1992b, @ltratt depubbed a number of functions and fields. The commit message includes the following justification:

grmtools intentionally exposes some details that no-one currently uses, but they might one day

I think that day has arrived. I am implementing an incremental parsing algorithm, from this paper, page 34. This is part of my project to reimplement the existing Python version of the Syncode algorithm in Rust. @shubhamugare is the boss of the project, but I'm working on the Rust version.

In order to integrate cleanly with the rest of the implementation so far, I need to be able to convert from a Lexeme struct back to a regular expression (this happens in lines 17, 18, and 21 of Algorithm 4 of the linked paper). I then do some crunching on these regexes by turning them into DFAs and advancing them one state at a time, which requires the regex representations of the lexemes in the input (cf. page 10: the algorithm simply assumes that this transformation is trivially possible, and in the current Python implementation, it is).

I worry that this might be an XY problem: perhaps I could manually track which regexes go with which TIdxs. However, it seems to me that making the re_str field of the Rule struct pub again (instead of pub(super) as it is now) would solve the problem much more gracefully than introducing such logic into my program. I will already have to use LexerDef::set_rule_ids to synchronize the rules' ids between the parser and the lexer, but this still doesn't give me direct access to the underlying regexes. I also don't believe this can be gotten out of the cfgrammar crate, but maybe an accessor function could be added to the YaccGrammar struct to match token_epp, token_name, and token_precedence.
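For concreteness, the manual-tracking workaround might look roughly like the sketch below. It uses only the standard library and no grmtools API: `TokenRegexTable`, `insert`, and `re_str` are hypothetical names made up for illustration, and `u32` is an assumed stand-in for whatever storage type the lexer and parser share for token ids.

```rust
use std::collections::HashMap;

/// Hypothetical side table mapping a token id to the source text of its regex.
/// `u32` is an assumption standing in for the real token-id storage type.
#[derive(Debug, Default)]
pub struct TokenRegexTable {
    by_tok_id: HashMap<u32, String>,
}

impl TokenRegexTable {
    /// Create an empty table.
    pub fn new() -> Self {
        Self::default()
    }

    /// Record the regex for a token id.
    /// Returns the regex previously stored under that id, if any.
    pub fn insert(&mut self, tok_id: u32, re_str: &str) -> Option<String> {
        self.by_tok_id.insert(tok_id, re_str.to_owned())
    }

    /// Look up the regex source for a token id, e.g. the id carried by a lexeme.
    pub fn re_str(&self, tok_id: u32) -> Option<&str> {
        self.by_tok_id.get(&tok_id).map(String::as_str)
    }
}
```

The table would have to be filled in by hand from the same lexer definition and kept in step with the ids assigned to the rules, which is exactly the duplicated bookkeeping that a public re_str would avoid.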

This is a somewhat unusual use case, I admit, but I believe that the particular application we are developing justifies considering the change. On the other hand, do you have any alternative suggestions for getting the regex back from a TIdx? It's very possible I've overlooked something! Thanks for the hard work and have a great day 😃.
