@overlookmotel (Member) commented Jul 25, 2025

Small optimization to the lexer. A hashbang can only appear at the very start of a file, so only check for a hashbang when getting the first token. This streamlines the byte handler for `#`, because a `#` anywhere else can only be the start of a private identifier.

Note: `self.token.set_is_on_new_line(true);` in `read_hashbang_comment` is not required, because it's always `true` already.
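
To illustrate the idea, here is a minimal, self-contained sketch. It is not oxc's actual implementation: the `Kind` enum, the `Lexer` struct, and the bodies of `first_token()` and `next_token()` are hypothetical simplifications. The only point it demonstrates is that the `#!` check lives in the first-token path, so the `#` branch of the regular path never has to consider a hashbang.

```rust
/// Hypothetical token kinds for this sketch (not oxc's real `Kind` enum).
#[derive(Debug, PartialEq)]
enum Kind {
    Hashbang,
    PrivateIdentifier,
    Other,
    Eof,
}

/// Hypothetical byte-oriented lexer, reduced to what the sketch needs.
struct Lexer<'a> {
    src: &'a [u8],
    pos: usize,
}

impl<'a> Lexer<'a> {
    fn new(src: &'a str) -> Self {
        Self { src: src.as_bytes(), pos: 0 }
    }

    /// Called exactly once, for the first token. This is the only place
    /// that looks for `#!`.
    fn first_token(&mut self) -> Kind {
        if self.src.starts_with(b"#!") {
            // Consume up to (but not including) the line terminator.
            // Simplification: only `\n` is treated as a line terminator here.
            while self.pos < self.src.len() && self.src[self.pos] != b'\n' {
                self.pos += 1;
            }
            return Kind::Hashbang;
        }
        self.next_token()
    }

    /// Called for every later token. The `#` branch has no hashbang check.
    fn next_token(&mut self) -> Kind {
        // Skip ASCII whitespace between tokens.
        while self.src.get(self.pos).is_some_and(|b| b.is_ascii_whitespace()) {
            self.pos += 1;
        }
        let Some(&byte) = self.src.get(self.pos) else {
            return Kind::Eof;
        };
        self.pos += 1;
        match byte {
            b'#' => {
                // Anywhere past the start of the file, `#` can only begin a
                // private identifier, so consume identifier bytes directly.
                // Simplification: ASCII identifier characters only.
                while self.src.get(self.pos).is_some_and(|b| {
                    b.is_ascii_alphanumeric() || *b == b'_' || *b == b'$'
                }) {
                    self.pos += 1;
                }
                Kind::PrivateIdentifier
            }
            // Every other byte is lumped into one kind in this sketch.
            _ => Kind::Other,
        }
    }
}
```

A caller would use `first_token()` once and `next_token()` for everything after it. With the input `"#!/usr/bin/env node\n#x"`, the first call yields `Kind::Hashbang` and the next yields `Kind::PrivateIdentifier`; a `#!` that is not at offset 0 is never treated as a hashbang.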

@github-actions github-actions bot added A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance labels Jul 25, 2025
@overlookmotel overlookmotel marked this pull request as ready for review July 25, 2025 18:14
codspeed-hq bot commented Jul 25, 2025

CodSpeed Instrumentation Performance Report

Merging #12521 will improve performance by 8.47%

Comparing 07-25-perf_lexer_only_check_for_hashbang_at_start_of_file (47a565f) with main (c72f49e)

Summary

⚡ 4 improvements
✅ 30 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| `lexer[RadixUIAdoptionSection.jsx]` | 20.8 µs | 20 µs | +3.91% |
| `lexer[binder.ts]` | 930.2 µs | 870.5 µs | +6.85% |
| `lexer[cal.com.tsx]` | 5.8 ms | 5.3 ms | +8.47% |
| `lexer[react.development.js]` | 384 µs | 357.1 µs | +7.54% |
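
For readers checking the numbers: the Change column appears consistent with `BASE / HEAD - 1` rather than `(BASE - HEAD) / BASE`. That formula is an inference from the rows above, not something the report states, and `change_percent` below is a hypothetical helper written only to show the arithmetic.

```rust
/// Hypothetical helper: percentage change computed as BASE / HEAD - 1.
/// Assumption: this matches how the report derives its Change column.
fn change_percent(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}
```

Applied to the `binder.ts` row (930.2 µs and 870.5 µs), this lands within rounding of the reported +6.85%. Rows whose displayed times have only two significant digits, such as `cal.com.tsx`, will not reproduce exactly, because the report presumably computes from unrounded measurements.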

@overlookmotel overlookmotel force-pushed the 07-25-perf_lexer_only_check_for_hashbang_at_start_of_file branch from 875bccc to 41cb852 Compare July 25, 2025 18:30
@camc314 (Contributor) left a comment

nice 💪

@overlookmotel (Member, Author) commented

Hmm. I'm not sure that benchmark is right. First version made no difference at all. I may have made a mistake.

@overlookmotel overlookmotel marked this pull request as draft July 25, 2025 18:52
@Boshen Boshen force-pushed the 07-25-perf_lexer_only_check_for_hashbang_at_start_of_file branch from 41cb852 to c181f5f Compare August 10, 2025 07:29
@Boshen Boshen marked this pull request as ready for review August 10, 2025 07:29
Copilot AI review requested due to automatic review settings August 10, 2025 07:29
Copilot AI (Contributor) left a comment

Pull Request Overview

This PR optimizes the lexer by restricting hashbang comment detection to only the first token of a file. Since hashbang comments can only appear at the very start of a file, this eliminates unnecessary checks for every # character encountered during lexing.

  • Adds a dedicated first_token() method that specifically checks for hashbang comments
  • Simplifies the # byte handler to only handle private identifiers
  • Updates parser and benchmark code to use the new first_token() method

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| `tasks/benchmark/benches/lexer.rs` | Updates benchmark to use `first_token()` method and adds EOF check |
| `crates/oxc_parser/src/lib.rs` | Replaces `bump_any()` with `first_token()` call in parser initialization |
| `crates/oxc_parser/src/lexer/mod.rs` | Adds new `first_token()` method and inlines `read_next_token()` |
| `crates/oxc_parser/src/lexer/comment.rs` | Makes `read_hashbang_comment()` unsafe and removes unnecessary line setting |
| `crates/oxc_parser/src/lexer/byte_handlers.rs` | Simplifies `#` handler to only process private identifiers |

@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Aug 10, 2025
@Boshen (Member) commented Aug 10, 2025

Merge activity

@graphite-app graphite-app bot force-pushed the 07-25-perf_lexer_only_check_for_hashbang_at_start_of_file branch from c181f5f to 47a565f Compare August 10, 2025 07:36
@graphite-app graphite-app bot merged commit 47a565f into main Aug 10, 2025
31 checks passed
@graphite-app graphite-app bot deleted the 07-25-perf_lexer_only_check_for_hashbang_at_start_of_file branch August 10, 2025 07:42
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Aug 10, 2025
taearls pushed a commit to taearls/oxc that referenced this pull request Aug 12, 2025