@anonrig
Member

@anonrig anonrig commented Dec 11, 2025

Fixes #5387


The WPT tests failed because we didn't handle UTF-16 surrogate pairs that are split across separate chunks, as the spec requires. Added a pending state that buffers a trailing high surrogate until the next chunk arrives, then either pairs it with a matching low surrogate or emits a replacement character (U+FFFD). Used simdutf wherever possible to keep the code readable and maintainable for people who are not experts in this area.
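
For readers unfamiliar with the problem, here is a minimal sketch of the pending-state idea described above. It is not the code in this PR: it does not use simdutf, it works one code unit at a time for clarity, and the names `ChunkedUtf8Encoder`, `encode`, and `flush` are hypothetical. Chunks are modeled as lists of UTF-16 code units.

```python
from typing import List, Optional

REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_high_surrogate(unit: int) -> bool:
    return 0xD800 <= unit <= 0xDBFF


def is_low_surrogate(unit: int) -> bool:
    return 0xDC00 <= unit <= 0xDFFF


class ChunkedUtf8Encoder:
    """Hypothetical sketch: encodes chunks of UTF-16 code units to UTF-8."""

    def __init__(self) -> None:
        # High surrogate seen last, still waiting for its low surrogate.
        self.pending: Optional[int] = None

    def encode(self, units: List[int]) -> bytes:
        scalars: List[int] = []
        for unit in units:
            if self.pending is not None:
                high, self.pending = self.pending, None
                if is_low_surrogate(unit):
                    # Valid pair, possibly split across two chunks.
                    scalars.append(
                        0x10000 + ((high - 0xD800) << 10) + (unit - 0xDC00)
                    )
                    continue
                # Unpaired high surrogate: replace it, then handle `unit`.
                scalars.append(REPLACEMENT)
            if is_high_surrogate(unit):
                self.pending = unit  # buffer until the next code unit arrives
            elif is_low_surrogate(unit):
                scalars.append(REPLACEMENT)  # low surrogate with no high
            else:
                scalars.append(unit)
        return "".join(map(chr, scalars)).encode("utf-8")

    def flush(self) -> bytes:
        # End of stream: a buffered high surrogate can never be paired.
        if self.pending is None:
            return b""
        self.pending = None
        return chr(REPLACEMENT).encode("utf-8")
```

In this sketch, feeding `[0x48, 0xD83D]` and then `[0xDE00]` yields `H` from the first call and the four-byte UTF-8 encoding of U+1F600 from the second. An empty chunk leaves the buffered surrogate in place.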

Comments are generated by AI

@anonrig anonrig requested review from a team as code owners December 11, 2025 22:40
@anonrig anonrig force-pushed the yagiz/streams-utf8-wpt branch 2 times, most recently from f80b87b to f2f10c4 Compare December 11, 2025 22:54
@anonrig anonrig force-pushed the yagiz/streams-utf8-wpt branch 3 times, most recently from 508506e to 99423c5 Compare December 12, 2025 22:58
Contributor

@erikcorry erikcorry left a comment

Suggestion to simplify this in #5702

@anonrig anonrig requested review from erikcorry and jasnell December 16, 2025 12:33
Contributor

@erikcorry erikcorry left a comment

LGTM, FWIW since it now has my code in it too :-)

@anonrig anonrig force-pushed the yagiz/streams-utf8-wpt branch from 59cc2d9 to d67061b Compare December 16, 2025 16:47
@anonrig
Member Author

anonrig commented Dec 16, 2025

Rebased and force pushed.

Development

Successfully merging this pull request may close these issues.

Investigate issues with chunks when UTF-8 encoding

3 participants