-
-
Notifications
You must be signed in to change notification settings - Fork 730
fix(minifier): fix string concatenation for lone surrogates #13229
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
How to use the Graphite Merge QueueAdd either label to this PR to merge it via the merge queue:
You must have a Graphite account in order to use the merge queue. Sign up using this link. An organization admin has enabled the Graphite Merge Queue in this repository. Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue. |
|
This is a lot of code to support lone surrogates, I'll take a deeper look later. |
CodSpeed Instrumentation Performance ReportMerging #13229 will not alter performanceComparing Summary
Footnotes |
@Boshen Thanks! That would be great and I'd love to fix all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Fixes string concatenation optimization to properly handle lone surrogates in Unicode string literals. The PR modifies the ToJsString trait to return a tuple containing both the string value and a boolean flag indicating the presence of lone surrogates, preventing unsafe optimizations that could produce incorrect output.
Key changes:
- Modified
ToJsStringtrait to return(Cow<'a, str>, bool)instead of justCow<'a, str> - Added checks throughout the codebase to avoid optimizations when lone surrogates are present
- Added test cases to verify correct handling of lone surrogate concatenation
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| crates/oxc_ecmascript/src/to_string.rs | Core trait modification to track lone surrogates in string conversion |
| crates/oxc_minifier/src/peephole/fold_constants.rs | Updated string concatenation logic to handle lone surrogates |
| crates/oxc_minifier/src/peephole/substitute_alternate_syntax.rs | Added lone surrogate checks in template literal substitution |
| crates/oxc_minifier/tests/peephole/obscure_edge_cases.rs | Added test cases for lone surrogate string concatenation |
| crates/oxc_minifier/tests/peephole/mod.rs | Added new test helper function for output verification |
| crates/oxc_ecmascript/src/constant_evaluation/value.rs | Updated ConstantValue::String to include lone surrogate flag |
| crates/oxc_ecmascript/src/constant_evaluation/mod.rs | Updated constant evaluation to propagate lone surrogate information |
| crates/oxc_ecmascript/src/constant_evaluation/call_expr.rs | Added lone surrogate handling in string method optimizations |
| crates/oxc_minifier/src/ctx.rs | Updated expression creation to handle lone surrogates |
| crates/oxc_minifier/src/peephole/inline.rs | Updated inline optimization check for string length |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| strings | ||
| .map(|v| v.iter().map(AsRef::as_ref).collect::<Vec<_>>().join(separator.unwrap_or(","))) | ||
| // If any element contains a lone surrogate, we cannot join them as strings. | ||
| if strings.iter().any(|s| s.iter().any(|s| s.1)) { |
Copilot
AI
Aug 20, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The nested iterator chain is hard to read. Consider flattening this logic or adding a comment to explain what s.1 represents (the lone_surrogates flag).
| if strings.iter().any(|s| s.iter().any(|s| s.1)) { | |
| // Here, s.1 is the lone_surrogates flag returned by to_js_string. | |
| if strings.iter().flatten().any(|(_, lone_surrogates)| *lone_surrogates) { |
| if let Some((_, lone_surrogates)) = search_value | ||
| && lone_surrogates | ||
| { | ||
| return None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pattern matching syntax here is incorrect. Rust doesn't support combining a pattern match with a boolean condition in this way. This should be rewritten as:
if let Some((_, lone_surrogates)) = search_value {
if lone_surrogates {
return None;
}
}This properly separates the pattern matching from the conditional check on the extracted value.
| if let Some((_, lone_surrogates)) = search_value | |
| && lone_surrogates | |
| { | |
| return None; | |
| if let Some((_, lone_surrogates)) = search_value | |
| { | |
| if lone_surrogates | |
| { | |
| return None; | |
| } | |
| } |
Spotted by Diamond
Is this helpful? React 👍 or 👎 to let us know.
| } | ||
|
|
||
| #[track_caller] | ||
| fn test_output(source_text: &str, expected: &str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a standalone test runner.
The rationale behind this is that the original test runner would generate actual: "\ud83d\udd25" and expected: "🔥" if I write
test("return '\\uD83D' + '\\uDD25'", "return '\\ud83d\\udd25'")
Although the execution result in JavaScript runtime is the same, it's not the same as it compares with each other in Rust. The expected half will be interpreted as a surrogate pair in Rust and the actual half will be an escaped surrogate pair.
|
FYI: we've replaced our implementation here in swc with See: |
Fixes a simple use case in optimizing lone surrogates. Also initialized and pinpointed some places that need a fix in future PRs.
With input:
Previously outputs:
Now outputs:
Ported some optimization fixes I made in swc-project/swc#10987 and I'm really appreciate the job you all have done. Without the previous investigation made by @overlookmotel, @Boshen and Oxc team, this will not happen.