-
-
Notifications
You must be signed in to change notification settings - Fork 721
fix(codegen): correct CRLF handling in comment processing
#13169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@Dunqing 👋 This repository doesn't have Copilot instructions. With Copilot instructions, I can understand the repository better, work faster and produce higher quality PRs. I can generate a .github/copilot-instructions.md file for you automatically. Click here to open a pre-filled issue and assign it to me. I'll write the instructions, and then tag you for review. |
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. |
|
The fix isn't ideal, and it appears to impact performance. Could you try another approach without extra allocations? Also, I prefer to change all leading single block line comments from https://github.com/oxc-project/oxc/blob/306b023302760c19dca500a40e97e246faee7d2e/crates/oxc_isolated_declarations/tests/fixtures |
I've implemented a better approach that eliminates allocations. Instead of preprocessing comment text, I created a custom |
CodSpeed Instrumentation Performance ReportMerging #13169 will not alter performanceComparing Summary
Footnotes |
|
@copilot Please my request changes |
|
@copilot |
Fixed all clippy issues and updated PR title to follow conventional commit format: "fix(codegen): implement allocation-free CRLF handling in comment processing". All tests are passing and the implementation is ready for merge. Commit: cee6da6 All requested changes completed - clippy issues fixed and PR title updated to proper convention. Ready for merge. Commit: cee6da6 |
CRLF handling in comment processing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes incorrect Windows line ending (\r\n) handling in comment processing within the oxc-transform isolated declaration feature. The issue was causing double newlines in generated output when processing comments containing CRLF sequences.
Key changes include:
- Implemented a custom
LineTerminatorSplitteriterator to handle CRLF sequences as single line terminators - Converted single-line block comments to multiline format in test fixtures
- Updated test snapshots to reflect corrected comment formatting
Reviewed Changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
crates/oxc_codegen/src/comment.rs |
Added custom iterator to properly handle CRLF line endings in comment processing |
crates/oxc_isolated_declarations/tests/fixtures/*.ts |
Converted single-line block comments to multiline format for better test coverage |
crates/oxc_isolated_declarations/tests/snapshots/*.snap |
Updated test snapshots to reflect improved comment formatting |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
Just to point out that this could be considerably faster.
Check each byte for
Then compiler will vectorize inner loop over each batch of 16 bytes checking for
oxc/crates/oxc_codegen/src/lib.rs Lines 241 to 352 in 3548cf4
AI might be able to handle (1), but I'm not sure if it'd be capable of (2). |
Merge activity
|
The oxc-transform's isolated declaration feature was incorrectly handling Windows line endings (`\r\n`) in comments, generating double newlines in the output. Consider:
```typescript
import { isolatedDeclaration } from 'oxc-transform'
console.log([
isolatedDeclaration('a.ts',
// Simulates windows file line-ending.
[
'/**',
' * Comment.',
' */',
'export const a: number = 1;'
].join('\r\n')
).code
])
// Before: [ '/**\n\n* Comment.\n\n*/\nexport declare const a: number;\n' ]
// After: [ '/**\n* Comment.\n*/\nexport declare const a: number;\n' ]
```
## Root Cause
The issue was in the comment processing logic in `oxc_codegen/src/comment.rs`. The `split(is_line_terminator)` function treats both `\r` and `\n` as separate line terminators, causing `\r\n` sequences to split incorrectly with empty strings between them, leading to double newlines in the output.
## Solution
Implemented an allocation-free approach using a custom `LineTerminatorSplitter` iterator that:
1. Handles CRLF sequences correctly by treating `\r\n` as a single line terminator
2. Eliminates performance overhead from string allocations during comment processing
3. Processes line terminators on-demand without creating temporary strings
This approach provides better performance compared to the previous normalization function while correctly handling Windows line endings.
## Additional Changes
- Converted all single-line block comments (`/** comment */`) to multiline format in test fixtures to better verify the CRLF fix
- Updated corresponding test snapshots to reflect the improved comment formatting
## Testing
- Added comprehensive tests covering Windows line endings, mixed line endings, and standalone CR handling
- All existing tests continue to pass (codegen, isolated declarations, transform)
- Verified the exact reproduction case from the issue works correctly
- Confirmed zero allocation overhead in comment processing
Fixes #13059.
<!-- START COPILOT CODING AGENT TIPS -->
---
💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click [here](https://survey.alchemer.com/s3/8343779/Copilot-Coding-agent) to start the survey.
12da4fe to
8f533aa
Compare
Follow-on after #13169. Implement the first optimization mentioned in #13169 (comment). Iterate over string byte-by-byte rather than char-by-char. It's amazing how bad Rust is at string operations. I tried it without unsafe code at first, but Rust inserts checks for whether a slice falls on a UTF-8 char boundary on every single operation, even though it's obvious from the context that these checks can never fail. It made the assembly x4 longer, which is no good as this is meant to be a tight loop.
The oxc-transform's isolated declaration feature was incorrectly handling Windows line endings (
\r\n) in comments, generating double newlines in the output. Consider:Root Cause
The issue was in the comment processing logic in
oxc_codegen/src/comment.rs. Thesplit(is_line_terminator)function treats both\rand\nas separate line terminators, causing\r\nsequences to split incorrectly with empty strings between them, leading to double newlines in the output.Solution
Implemented an allocation-free approach using a custom
LineTerminatorSplitteriterator that:\r\nas a single line terminatorThis approach provides better performance compared to the previous normalization function while correctly handling Windows line endings.
Additional Changes
/** comment */) to multiline format in test fixtures to better verify the CRLF fixTesting
Fixes #13059.
💬 Share your feedback on Copilot coding agent for the chance to win a $200 gift card! Click here to start the survey.