Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs: remove chunk by chunk file read #44276

Closed
wants to merge 1 commit into from

Conversation

anonrig
Copy link
Member

@anonrig anonrig commented Aug 18, 2022

I'm still investigating why the performance is down by 20% for certain cases, but 68% performance looks like a good start.

I left the chunk-by-chunk implementation to support edge cases where the file size is unknown before reading it, where fstat returns 0.

fs/readfile-partitioned.js concurrent=1 len=1024 dur=1                        0.48 %       ±2.40% ±3.20% ±4.17%
fs/readfile-partitioned.js concurrent=1 len=16777216 dur=1           ***     19.44 %       ±4.43% ±5.91% ±7.74%
fs/readfile-partitioned.js concurrent=10 len=1024 dur=1                       1.24 %       ±2.71% ±3.60% ±4.69%
fs/readfile-partitioned.js concurrent=10 len=16777216 dur=1          ***    -68.32 %       ±1.90% ±2.55% ±3.37%
fs/readfile-promises.js concurrent=1 len=1024 duration=1                      2.00 %       ±2.88% ±3.84% ±5.00%
fs/readfile-promises.js concurrent=1 len=16777216 duration=1         ***     23.65 %       ±2.00% ±2.68% ±3.52%
fs/readfile-promises.js concurrent=10 len=1024 duration=1                    -0.51 %       ±0.55% ±0.74% ±0.96%
fs/readfile-promises.js concurrent=10 len=16777216 duration=1        ***    -16.45 %       ±3.41% ±4.53% ±5.90%
fs/readfile.js concurrent=1 len=1024 duration=1                              -2.52 %       ±3.28% ±4.37% ±5.69%
fs/readfile.js concurrent=1 len=16777216 duration=1                  ***     21.26 %       ±1.81% ±2.41% ±3.14%
fs/readfile.js concurrent=10 len=1024 duration=1                             -0.56 %       ±1.26% ±1.68% ±2.21%
fs/readfile.js concurrent=10 len=16777216 duration=1                 ***    -19.66 %       ±2.37% ±3.18% ±4.17%

@nodejs-github-bot nodejs-github-bot added fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run. labels Aug 18, 2022
@anonrig
Copy link
Member Author

anonrig commented Aug 18, 2022

@mcollina @RafaelGSS

@mcollina
Copy link
Member

It seems this is not improving things in all cases, so maybe there is more work to do on this one.

@mcollina mcollina requested a review from bnoordhuis August 18, 2022 14:05
Copy link
Member

@BridgeAR BridgeAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The upper bound is there for partitioning the reads. For further information check #25741 and #17054.
We should not remove that.

Please also check the comment right above the upper bound: https://github.com/nodejs/node/pull/44276/files#diff-5cd422a9a64bc2c1275d70ba5dcafbc5ea55274432fffc5e4159126007dc4894L137-L138

@anonrig
Copy link
Member Author

anonrig commented Aug 18, 2022

Thanks @BridgeAR for the links. It was really helpful towards the decisions that lead to this discussion.

So, removing chunks blocks the IO. Not-removing chunks increase execution speed.

Since we don't want to do that, but we also have to reduce the number of calls between JS and C++, there are not many options here. If we're going to leave the chunks with 512kb, the only optimization we can do at this point is to move open, fstat, close functionality inside fs/read c++ implementation to reduce the communication. Even after that, the number of calls between JS and C++ will be directly correlated by file_size / 512.

I'll be happy to dive into the correct path here if anyone would be kind to help, but afaik if we want to improve the performance, we need to pick our battles.

Here's my review of the current flow for fs.readFile: #44239

@mcollina
Copy link
Member

I think the use case to optimize is reading 'utf8' text. This is exceptionally slow and uses 3x the amount of memory of the file size.

Since we don't want to do that, but we also have to reduce the number of calls between JS and C++, there are not many options here. If we're going to leave the chunks with 512kb, the only optimization we can do at this point is to move open, fstat, close functionality inside fs/read c++ implementation to reduce the communication. Even after that, the number of calls between JS and C++ will be directly correlated by file_size / 512.

I think you can move the accumulation of bytes in C++ and avoid calling JS at all.

@anonrig
Copy link
Member Author

anonrig commented Aug 18, 2022

Thanks @mcollina. I created another pull request to update the benchmarks to include encoding. #44278 I'm going to close this pull request for now.

@anonrig anonrig closed this Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
fs Issues and PRs related to the fs subsystem / file system. needs-ci PRs that need a full CI run.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants