Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

util: add fast path for Latin1 decoding #55275

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

mertcanaltin
Copy link
Member

@mertcanaltin mertcanaltin commented Oct 5, 2024

I added a fast path for Latin1 (windows-1252) decoding to improve performance. This change avoids using the slower ICU-based decoding for Latin1 and instead utilizes a direct approach, similar to the fast path implemented for UTF-8.
nodejs/performance#178

@nodejs-github-bot
Copy link
Collaborator

Review requested:

  • @nodejs/security-wg
  • @nodejs/v8-update

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. needs-ci PRs that need a full CI run. v8 engine Issues and PRs related to the V8 dependency. labels Oct 5, 2024
@mertcanaltin mertcanaltin requested review from anonrig and joyeecheung and removed request for anonrig October 5, 2024 08:45
Copy link

codecov bot commented Oct 5, 2024

Codecov Report

Attention: Patch coverage is 21.05263% with 30 lines in your changes missing coverage. Please review.

Project coverage is 88.39%. Comparing base (bbdfeeb) to head (77bc5ee).
Report is 222 commits behind head on main.

Files with missing lines Patch % Lines
src/encoding_binding.cc 10.34% 26 Missing ⚠️
lib/internal/encoding.js 55.55% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #55275      +/-   ##
==========================================
- Coverage   88.41%   88.39%   -0.03%     
==========================================
  Files         652      652              
  Lines      186612   186799     +187     
  Branches    36062    36120      +58     
==========================================
+ Hits       165001   165117     +116     
- Misses      14885    14944      +59     
- Partials     6726     6738      +12     
Files with missing lines Coverage Δ
src/encoding_binding.h 100.00% <ø> (ø)
lib/internal/encoding.js 98.87% <55.55%> (-0.64%) ⬇️
src/encoding_binding.cc 71.42% <10.34%> (-12.82%) ⬇️

... and 48 files with indirect coverage changes

@anonrig
Copy link
Member

anonrig commented Oct 5, 2024

Can you update benchmarks as well?

lib/internal/encoding.js Outdated Show resolved Hide resolved
lib/internal/encoding.js Show resolved Hide resolved
lib/internal/encoding.js Outdated Show resolved Hide resolved
lib/internal/encoding.js Outdated Show resolved Hide resolved
lib/internal/encoding.js Show resolved Hide resolved
lib/internal/encoding.js Outdated Show resolved Hide resolved
@mertcanaltin
Copy link
Member Author

mertcanaltin commented Oct 5, 2024

Can you update benchmarks as well?

I wonder if this is the right place.
benchmark/util/text-decoder.js

@lemire
Copy link
Member

lemire commented Oct 5, 2024

Would be nice to have before/after benchmark numbers, even if it is just on someone's laptop.

@mertcanaltin
Copy link
Member Author

I think I shared my comparison with you in a wrong way, I'm sorry, there is a lot of data, I wonder if you have a suggestion on this

src/encoding_binding.cc Outdated Show resolved Hide resolved
mertcanaltin and others added 2 commits October 6, 2024 07:53
Co-authored-by: Yagiz Nizipli <yagiz@nizipli.com>
@lemire
Copy link
Member

lemire commented Oct 6, 2024

@mertcanaltin

Try something like this...

$ npm install -g node-benchmark-compare

$ node benchmark/compare.js --new mynewnode --old myoldnode --runs 1000 --filter text-decoder  util >  text-decoder.csv

$ node-benchmark-compare text-decoder.csv 

Replace mynewnode and myoldnode by the actual node binaries.

@anonrig
Copy link
Member

anonrig commented Oct 7, 2024

You need to update the benchmark file to actually test windows-1252 rather than latin1. It appears that the benchmark passes latin1 (Which is wrong)

src/encoding_binding.cc Outdated Show resolved Hide resolved
Co-authored-by: Yagiz Nizipli <yagiz@nizipli.com>
src/encoding_binding.cc Outdated Show resolved Hide resolved
@mertcanaltin
Copy link
Member Author

so here we are, we've started a benchmark, @RafaelGSS thanks
https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1644/

@mertcanaltin mertcanaltin added the request-ci Add this label to start a Jenkins CI on a PR. label Nov 5, 2024
@github-actions github-actions bot added request-ci-failed An error occurred while starting CI via request-ci label, and manual interventon is needed. and removed request-ci Add this label to start a Jenkins CI on a PR. labels Nov 5, 2024
Copy link
Contributor

github-actions bot commented Nov 5, 2024

Failed to start CI
   ⚠  Something was pushed to the Pull Request branch since the last approving review.
   ✘  Refusing to run CI on potentially unsafe PR
https://github.com/nodejs/node/actions/runs/11675818532

@lemire
Copy link
Member

lemire commented Nov 5, 2024

@anonrig @mertcanaltin Where is this issue at? It has been opened for a long time.

Copy link
Member

@anonrig anonrig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to run the CI

@anonrig anonrig added request-ci Add this label to start a Jenkins CI on a PR. and removed request-ci-failed An error occurred while starting CI via request-ci label, and manual interventon is needed. labels Nov 5, 2024
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Nov 5, 2024
@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

@nodejs-github-bot
Copy link
Collaborator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. encoding Issues and PRs related to the TextEncoder and TextDecoder APIs. needs-ci PRs that need a full CI run. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants