-
Notifications
You must be signed in to change notification settings - Fork 31k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
src: optimize buffer transcode for matching encodings #54507
base: main
Are you sure you want to change the base?
Conversation
Benchmark ci: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1614/ Results:
|
2c6ebcb
to
9af9206
Compare
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Breaking change:
$ node -p "buffer.transcode(Buffer.from([0x80]), 'utf8', 'utf8')"
<Buffer ef bf bd>
$ ./node -p "buffer.transcode(Buffer.from([0x80]), 'utf8', 'utf8')"
<Buffer 80>
if this is intentional, it needs to come with a doc update and tests. That being said, it's probably the most consistent/least unexpected to keep the existing behavior – in that case, this optimization can only be applied if the input is known to be valid.
@addaleax So, if the input is invalid, it replaces invalid characters with replacement characters, even if from and to encodings are same. Is there an encoding type (or plural) that doesn't conform to this expectation? |
@anonrig I'd be surprised if there were (you may just want to check manually to be sure), but obviously the replacement character itself differs between encodings. But yeah, the point is that |
@anonrig We shall add transcoding with replacement to simdutf in the future. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As pointed out already, a check is missing.
If the buffer encodings match, we can just perform a copy of the buffer