
Commit c60fe15

[Parser][NFC] Improve performance of idchar lexing (#6515)
The parsing of idchars was hot enough to show up while profiling the parsing of a very large module. Optimize it to speed up the overall parse by about 16% in a very unscientific measurement.
1 parent 4a907b0 commit c60fe15

1 file changed (+18 -30 lines)


src/parser/lexer.cpp

Lines changed: 18 additions & 30 deletions
```diff
@@ -753,37 +753,25 @@ std::optional<LexResult> idchar(std::string_view in) {
     return {};
   }
   uint8_t c = ctx.peek();
-  if (('0' <= c && c <= '9') || ('A' <= c && c <= 'Z') ||
-      ('a' <= c && c <= 'z')) {
-    ctx.take(1);
-  } else {
-    switch (c) {
-      case '!':
-      case '#':
-      case '$':
-      case '%':
-      case '&':
-      case '\'':
-      case '*':
-      case '+':
-      case '-':
-      case '.':
-      case '/':
-      case ':':
-      case '<':
-      case '=':
-      case '>':
-      case '?':
-      case '@':
-      case '\\':
-      case '^':
-      case '_':
-      case '`':
-      case '|':
-      case '~':
-        ctx.take(1);
-    }
+  // All the allowed characters lie in the range '!' to '~', and within that
+  // range the vast majority of characters are allowed, so it is significantly
+  // faster to check for the disallowed characters instead.
+  if (c < '!' || c > '~') {
+    return ctx.lexed();
+  }
+  switch (c) {
+    case '"':
+    case '(':
+    case ')':
+    case ',':
+    case ';':
+    case '[':
+    case ']':
+    case '{':
+    case '}':
+      return ctx.lexed();
   }
+  ctx.take(1);
   return ctx.lexed();
 }
```
