perf(codegen): faster splitting comments into lines #13190

overlookmotel · 2025-08-18T14:07:49Z

Follow-on after #13169.

Implement the first optimization mentioned in #13169 (comment): iterate over the string byte-by-byte rather than char-by-char.

It's amazing how bad Rust is at string operations. I tried it without unsafe code at first, but Rust inserts a check for whether a slice falls on a UTF-8 char boundary on every single slicing operation, even though it's obvious from the context that these checks can never fail. That made the assembly 4x longer, which is no good as this is meant to be a tight loop.
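To illustrate the kind of check being described, here is a minimal sketch. It is not the code from this PR; both function names are hypothetical. The first version slices with the safe indexing operators, which validate bounds and UTF-8 char boundaries at runtime unless the optimizer can prove them redundant. The second uses `str::get_unchecked` and relies on the caller's reasoning instead.

```rust
/// Hypothetical helper, not taken from the PR: split `text` at the first `\n`.
/// Returns the part before the newline and the part after it.
fn split_at_newline_checked(text: &str) -> Option<(&str, &str)> {
    let index = text.as_bytes().iter().position(|&b| b == b'\n')?;
    // Safe slicing: each range is checked against the string's length and
    // against UTF-8 char boundaries, and panics if a check fails.
    Some((&text[..index], &text[index + 1..]))
}

/// Same result as above, but without the runtime checks on the slices.
fn split_at_newline_unchecked(text: &str) -> Option<(&str, &str)> {
    let index = text.as_bytes().iter().position(|&b| b == b'\n')?;
    // SAFETY: `index` is the position of an ASCII byte (`\n`) in valid UTF-8,
    // so `index` and `index + 1` are both in bounds and on char boundaries.
    unsafe { Some((text.get_unchecked(..index), text.get_unchecked(index + 1..))) }
}
```

Both functions return the same result for any input. The difference is only in how much checking code the compiler has to emit around the slices.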

overlookmotel · 2025-08-18T14:08:06Z

refactor(codegen): reduce repeated code #13191
perf(codegen): faster splitting comments into lines #13190 👈 (View in Graphite)
main

crates/oxc_codegen/src/comment.rs

codspeed-hq · 2025-08-18T14:13:30Z

CodSpeed Instrumentation Performance Report

Merging #13190 will not alter performance

Comparing 08-18-perf_codegen_faster_splitting_comments_into_lines (e3bfff1) with main (ada4e84)¹

Summary

✅ 34 untouched benchmarks

¹ No successful run was found on main (e3bfff1) during the generation of this report, so ada4e84 was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Copilot

Pull Request Overview

This PR optimizes the performance of splitting comments into lines by iterating over UTF-8 bytes instead of Unicode characters. The changes implement a byte-based approach to identify line terminators while properly handling CRLF sequences and Unicode line separators (LS and PS).

Replaced character-by-character iteration with byte-by-byte processing for better performance
Added support for Unicode line separators (LS and PS) in addition to CR/LF
Removed the position field from the iterator struct in favor of modifying the text slice directly
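The approach summarized above can be sketched roughly as follows. This is an illustrative reconstruction, not the code merged in this PR: `LineSplitter` and its field are hypothetical names, and the sketch uses safe slicing for readability where the PR description says unsafe code was used. It scans the UTF-8 bytes, treats `\r\n` as one terminator, recognizes LS (U+2028) and PS (U+2029) by their three-byte encodings, and advances by shrinking the remaining text slice instead of tracking a position.

```rust
/// UTF-8 encodings of LS (U+2028) and PS (U+2029).
const LS: &[u8] = &[0xE2, 0x80, 0xA8];
const PS: &[u8] = &[0xE2, 0x80, 0xA9];

/// Hypothetical iterator, not the PR's actual code: yields the lines of a
/// string, splitting on `\r`, `\n`, `\r\n`, LS and PS.
struct LineSplitter<'a> {
    /// Remaining unconsumed text; `None` once the final line has been yielded.
    rest: Option<&'a str>,
}

impl<'a> LineSplitter<'a> {
    fn new(text: &'a str) -> Self {
        Self { rest: Some(text) }
    }
}

impl<'a> Iterator for LineSplitter<'a> {
    type Item = &'a str;

    fn next(&mut self) -> Option<&'a str> {
        let text = self.rest?;
        let bytes = text.as_bytes();
        for (i, &b) in bytes.iter().enumerate() {
            // Byte length of the line terminator starting at `i`, or 0 if none.
            let terminator_len = match b {
                b'\n' => 1,
                b'\r' if bytes.get(i + 1) == Some(&b'\n') => 2,
                b'\r' => 1,
                0xE2 if bytes[i..].starts_with(LS) || bytes[i..].starts_with(PS) => 3,
                _ => 0,
            };
            if terminator_len > 0 {
                // Shrink the remaining slice rather than tracking a position.
                self.rest = Some(&text[i + terminator_len..]);
                return Some(&text[..i]);
            }
        }
        // No terminator left: the remainder is the final line.
        self.rest = None;
        Some(text)
    }
}
```

In this sketch a trailing terminator produces a final empty line, and an empty input produces a single empty line. Whether the real implementation makes the same choice is not stated in this thread.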

graphite-app · 2025-08-19T00:32:27Z

Merge activity

Aug 19, 12:32 AM UTC: Boshen added this pull request to the Graphite merge queue.
Aug 19, 12:38 AM UTC: Merged by the Graphite merge queue.

hyrious · 2025-08-20T04:19:41Z

@overlookmotel does https://doc.rust-lang.org/std/primitive.str.html#method.lines help?

overlookmotel · 2025-08-20T18:41:14Z

Unfortunately not. From the docs:

> Note that any carriage return (\r) not immediately followed by a line feed (\n) does not split a line. These carriage returns are thereby included in the produced lines.

We need to split on \r, \n, \r\n, and also the irregular Unicode line breaks <LS> (U+2028) and <PS> (U+2029).
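A small demonstration of the mismatch, using only the standard library. The wrapper function `std_lines` is a hypothetical name added for illustration.

```rust
/// Hypothetical wrapper for illustration: collect the result of `str::lines`.
fn std_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}

/// Show how `str::lines` treats each kind of line break.
fn demo() {
    // `\n` and `\r\n` split a line, but the lone `\r` stays inside "a\rb".
    assert_eq!(std_lines("a\rb\nc\r\nd"), vec!["a\rb", "c", "d"]);
    // LS (U+2028) and PS (U+2029) are not treated as line endings at all.
    assert_eq!(std_lines("a\u{2028}b\u{2029}c"), vec!["a\u{2028}b\u{2029}c"]);
}
```

So `str::lines` covers only two of the five terminators listed above.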

github-actions bot added the A-codegen (Area - Code Generation) and C-performance (Category - Solution not expected to change functional behavior, only performance) labels Aug 18, 2025

overlookmotel marked this pull request as ready for review August 18, 2025 14:10

Copilot AI review requested due to automatic review settings August 18, 2025 14:11

graphite-app bot reviewed Aug 18, 2025

overlookmotel force-pushed the 08-18-perf_codegen_faster_splitting_comments_into_lines branch from 6e4329a to e1bf875 Compare August 18, 2025 14:17

overlookmotel marked this pull request as draft August 18, 2025 14:19

Copilot AI reviewed Aug 18, 2025

overlookmotel mentioned this pull request Aug 18, 2025

refactor(codegen): reduce repeated code #13191

Merged

overlookmotel force-pushed the 08-18-perf_codegen_faster_splitting_comments_into_lines branch from e1bf875 to bffe03c Compare August 18, 2025 14:49

overlookmotel marked this pull request as ready for review August 18, 2025 14:53

overlookmotel mentioned this pull request Aug 18, 2025

codegen: further improvement in multiline comments handling #13188

Open

graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Aug 19, 2025

graphite-app bot force-pushed the 08-18-perf_codegen_faster_splitting_comments_into_lines branch from bffe03c to e3bfff1 Compare August 19, 2025 00:33

graphite-app bot merged commit e3bfff1 into main Aug 19, 2025
24 checks passed

graphite-app bot deleted the 08-18-perf_codegen_faster_splitting_comments_into_lines branch August 19, 2025 00:38

graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Aug 19, 2025

oxc-bot mentioned this pull request Aug 20, 2025

release(crates): v0.82.3 #13230

Merged

Copilot AI mentioned this pull request Sep 8, 2025

codegen: Optimize multiline comments handling with SIMD processing #13593

Closed
