clang-format 20 breaks these lines across multiple lines, which we don't want.
| rbs_position_t current; /* The current position: just before the current_character */ | ||
| rbs_position_t start; /* The start position of the current token */ | ||
| unsigned int current_code_point; /* Current character code point */ |
The lexer data structure now stores the code point of the next character, so peeking at the next character is much faster than reading it from the buffer again.
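A minimal sketch of the caching idea, assuming a simplified ASCII-only lexer. `sketch_lexer_t` and the `sketch_*` functions are hypothetical names for illustration, not the real `rbs_lexer_t` API; only the `current_code_point` field name comes from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified lexer (ASCII-only input); not the real rbs_lexer_t. */
typedef struct {
    const char *input;               /* NUL-terminated source text */
    size_t byte_pos;                 /* Offset of the next unread character */
    unsigned int current_code_point; /* Cached code point at byte_pos (0 at end of input) */
} sketch_lexer_t;

/* Read the character at byte_pos once and cache it in the lexer. */
static void sketch_fill_cache(sketch_lexer_t *lexer) {
    lexer->current_code_point = (unsigned char) lexer->input[lexer->byte_pos];
}

static void sketch_init(sketch_lexer_t *lexer, const char *input) {
    assert(input != NULL);
    lexer->input = input;
    lexer->byte_pos = 0;
    sketch_fill_cache(lexer);
}

/* Peeking is a plain field read: no buffer access and no decoding. */
static unsigned int sketch_peek(const sketch_lexer_t *lexer) {
    return lexer->current_code_point;
}

/* Advancing consumes the cached character and caches the one after it. */
static unsigned int sketch_advance(sketch_lexer_t *lexer) {
    unsigned int consumed = lexer->current_code_point;
    if (consumed != 0) {
        lexer->byte_pos += 1;
        sketch_fill_cache(lexer);
    }
    return consumed;
}
```

The decoding cost is paid once per character in `sketch_advance`, however many times the lexer peeks before consuming it.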
| return lexer->current_code_point; | ||
| } | ||
| bool rbs_next_char(rbs_lexer_t *lexer, unsigned int *codepoint, size_t *byte_len) { |
This function stores the next code point in the buffer, and its byte length, in the output parameters.
| const char *start = lexer->string.start + lexer->current.byte_pos; | ||
| // Fast path for ASCII (single-byte) characters | ||
| if ((unsigned int) *start < 128) { |
We assume that the character encoding of RBS files is ASCII-compatible, like Ruby source files.
An ASCII character is therefore always a single-byte character.
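A rough sketch of an ASCII fast path in front of a multi-byte decoder. `sketch_next_char` is a hypothetical stand-in that takes a raw buffer instead of a lexer, and the slow path assumes UTF-8 with only basic validation; the real `rbs_next_char` may decode differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not the real rbs_next_char. Decodes one character
 * from buf[0..len), assuming UTF-8. Returns false at end of input or on a
 * truncated or malformed sequence. Overlong encodings are not rejected. */
static bool sketch_next_char(const char *buf, size_t len, unsigned int *codepoint, size_t *byte_len) {
    assert(buf != NULL && codepoint != NULL && byte_len != NULL);
    if (len == 0) return false;

    unsigned char lead = (unsigned char) buf[0];

    /* Fast path: a byte below 128 is a complete ASCII character. */
    if (lead < 128) {
        *codepoint = lead;
        *byte_len = 1;
        return true;
    }

    /* Slow path: derive the sequence length from the lead byte. */
    size_t n;
    unsigned int cp;
    if ((lead & 0xE0) == 0xC0) {
        n = 2;
        cp = lead & 0x1F;
    } else if ((lead & 0xF0) == 0xE0) {
        n = 3;
        cp = lead & 0x0F;
    } else if ((lead & 0xF8) == 0xF0) {
        n = 4;
        cp = lead & 0x07;
    } else {
        return false; /* Stray continuation byte or invalid lead byte */
    }

    if (len < n) return false; /* Truncated sequence */

    /* Each continuation byte contributes its low six bits. */
    for (size_t i = 1; i < n; i++) {
        unsigned char cont = (unsigned char) buf[i];
        if ((cont & 0xC0) != 0x80) return false;
        cp = (cp << 6) | (cont & 0x3F);
    }

    *codepoint = cp;
    *byte_len = n;
    return true;
}
```

Because the check on the lead byte comes first, ASCII-only input never reaches the multi-byte branch.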
| unsigned int c = rbs_utf8_string_to_codepoint(str); | ||
| lexer->last_char = c; | ||
| return c; | ||
| *codepoint = 12523; // Dummy data for "ル" from "ルビー" (Ruby) in Unicode |
Another hack to support encodings other than UTF-8.
The lexer doesn't know the exact Unicode code point of the next character in other encodings, so it returns a fixed dummy code point instead. The lexer still consumes the character, and because the dummy code point has no special meaning to the lexer, this works fine.
We may want to return an uppercase character instead to support multi-byte class/constant names.
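One way to picture the fallback, as a sketch only: ASCII bytes keep their real value, and any other character is reported as the dummy code point from the diff. `sketch_code_point_or_placeholder` is a hypothetical helper, and it assumes the caller already knows the character's byte length from the encoding.

```c
#include <assert.h>
#include <stddef.h>

/* 12523 is U+30EB, the dummy code point used in the diff. */
#define SKETCH_PLACEHOLDER_CODE_POINT 12523u

/* Hypothetical helper: returns the code point the lexer should see for a
 * character that occupies byte_len bytes in an ASCII-compatible encoding
 * whose multi-byte characters cannot be decoded to Unicode here. */
static unsigned int sketch_code_point_or_placeholder(const char *start, size_t byte_len) {
    assert(start != NULL && byte_len > 0);

    /* ASCII characters mean the same thing in every ASCII-compatible encoding. */
    if (byte_len == 1 && (unsigned char) start[0] < 128) {
        return (unsigned char) start[0];
    }

    /* Anything else is reported as the placeholder. The lexer still skips
     * byte_len bytes, so positions stay correct. */
    return SKETCH_PLACEHOLDER_CODE_POINT;
}
```

Swapping the placeholder for an uppercase letter's code point would be a one-line change to the macro in this sketch.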
Extracted from #2652
This PR improves the lexer's data structure in RBS.
It improves parsing performance from ~14 i/s to ~16 i/s, measured by `bin/benchmark-parse.rb`.
Details
I also have the `gem_rbs_collection` repository checked out, to load the `activerecord` RBS files.
Baseline
Fix lexer