-
-
Notifications
You must be signed in to change notification settings - Fork 722
perf(codegen): faster splitting comments into lines #13190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
perf(codegen): faster splitting comments into lines #13190
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. This stack of pull requests is managed by Graphite. Learn more about stacking. |
CodSpeed Instrumentation Performance ReportMerging #13190 will not alter performanceComparing Summary
Footnotes |
6e4329a to
e1bf875
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR optimizes the performance of splitting comments into lines by iterating over UTF-8 bytes instead of Unicode characters. The changes implement a byte-based approach to identify line terminators while properly handling CRLF sequences and Unicode line separators (LS and PS).
- Replaced character-by-character iteration with byte-by-byte processing for better performance
- Added support for Unicode line separators (LS and PS) in addition to CR/LF
- Removed the position field from the iterator struct in favor of modifying the text slice directly
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
e1bf875 to
bffe03c
Compare
Merge activity
|
Follow-on after #13169. Implement the first optimization mentioned in #13169 (comment). Iterate over string byte-by-byte rather than char-by-char. It's amazing how bad Rust is at string operations. I tried it without unsafe code at first, but Rust inserts checks for whether a slice falls on a UTF-8 char boundary on every single operation, even though it's obvious from the context that these checks can never fail. It made the assembly x4 longer, which is no good as this is meant to be a tight loop.
bffe03c to
e3bfff1
Compare
|
Unfortunately not. From the docs:
We need to split on |

Follow-on after #13169.
Implement the first optimization mentioned in #13169 (comment). Iterate over string byte-by-byte rather than char-by-char.
It's amazing how bad Rust is at string operations. I tried it without unsafe code at first, but Rust inserts checks for whether a slice falls on a UTF-8 char boundary on every single operation, even though it's obvious from the context that these checks can never fail. It made the assembly x4 longer, which is no good as this is meant to be a tight loop.