Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.readFileSync returns incorrect buffer when target file is very small. #35351

Closed
jaburns opened this issue Sep 25, 2020 · 5 comments
Closed
Labels
buffer Issues and PRs related to the buffer subsystem.

Comments

@jaburns
Copy link

jaburns commented Sep 25, 2020

  • Version: 12.18.4
  • Platform: Linux 5.4.0-48-generic io.js on The Changelog! #52~18.04.1-Ubuntu SMP Thu Sep 10 12:50:22 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Subsystem: fs

What steps will reproduce the bug?

Create a small binary file, and then attempt to read it using fs.readFileSync. Here's a couple quick shell lines to reproduce the issue:

echo '01020304010203040102030401020304' | xxd -r -p > file.bin
echo "console.log(require('fs').readFileSync('file.bin').buffer)" | node

The expected output is of course something like 01 02 03 04 01 02 03 04 ... etc. But instead you see a buffer that's 8192 bytes long and contains some other junk like loaded source file contents. The binary file appears to be loaded somewhere in that 8k block, but it's not at the beginning, and it's not just that file.

How often does it reproduce? Is there a required condition?

Reproduces every time I try. It only happens with files smaller than a few kB. I haven't narrowed it down to the exact threshold size.

What is the expected behavior?

The file is loaded correctly and this is printed:

<Buffer 01 02 03 04 01 02 03 04 01 02 03 04 01 02 03 04>

What do you see instead?

Random junk around that file. i.e.

ArrayBuffer {
  [Uint8Contents]: <2f 7c 49 75 0f 7f 00 00 2f 12 20 04 00 00 00 00 01 02 03 04 01 02 03 04 01 02 03 04 01 02 03 04 01 00 00 00 00 00 00 00 81 6b 09 3d d3 32 00 00 00 00 00 00 00 00 00 00 d1 e2 2c 0a d3 32 00 00 21 e3 2c 0a d3 32 00 00 88 d0 1b 04 00 00 00 00 01 00 00 00 00 00 00 00 83 03 4b 34 5a 13 00 00 00 00 00 00 ... 8092 more bytes>,
  byteLength: 8192
}

Additional information

This is not an issue with readFile only readFileSync. With the same test file readFile seems to work fine.

@codebytere
Copy link
Member

codebytere commented Sep 25, 2020

I see this on latest master as well:

node on git:master ❯ echo "console.log(require('fs').readFileSync('file.bin').buffer)" | out/Release/node
ArrayBuffer {
  [Uint8Contents]: <2f 00 00 00 00 00 00 00 2f 00 00 00 00 00 00 00 01 02 03 04 01 02 03 04 01 02 03 04 01 02 03 04 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ... 8092 more bytes>,
  byteLength: 8192
}

However, are you sure that you're calling what you expect? Given that readFileSync itself returns a buffer, running:

echo '01020304010203040102030401020304' | xxd -r -p > file.bin
echo "console.log(require('fs').readFileSync('file.bin'))" | node

correctly produces: <Buffer 01 02 03 04 01 02 03 04 01 02 03 04 01 02 03 04>.

@BridgeAR
Copy link
Member

This is likely working as intended. It's using the buffer pool to improve the overall performance. It's not really intended to access the underlying buffer. We could use alloc but it'll have a performance impact.

@watilde watilde added the fs Issues and PRs related to the fs subsystem / file system. label Sep 26, 2020
@jaburns
Copy link
Author

jaburns commented Sep 26, 2020

Thanks for the context, that makes sense. Let me clarify my use case a bit to shine some light on why I thought this might be a bug.

I didn't write out the expected value correctly, I do want an ArrayBuffer, not a Buffer. The reason I'm grabbing the underlying buffer is because I have some code that's shared between client and server that consumes ArrayBuffers, and on the server I want to get the file contents in to an ArrayBuffer to pass to this code.

So maybe it's working as expected, which is fine, but it's still a bit surprising. Given that there is a use case for wanting to get the data as an ArrayBuffer, it seems like just getting .buffer should be what you want. Maybe a note in the documentation about this behavior would be enough?

@bnoordhuis
Copy link
Member

The notes in buffer.md on the buf.buffer and buf.byteOffset properties do mention this caveat and how to handle it so I'm inclined to say this is both working as expected and as documented. The short version:

b = fs.readFileSync(filename)
b = b.buffer.slice(b.byteOffset, b.byteOffset + b.byteLength)

I'll close this out but please open a pull request if you have suggestions for improvements.

@bnoordhuis bnoordhuis added buffer Issues and PRs related to the buffer subsystem. and removed fs Issues and PRs related to the fs subsystem / file system. labels Sep 27, 2020
@jaburns
Copy link
Author

jaburns commented Sep 27, 2020

Sounds good, thanks! I guess I should have checked the documentation more carefully :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
buffer Issues and PRs related to the buffer subsystem.
Projects
None yet
Development

No branches or pull requests

6 participants
@bnoordhuis @jaburns @watilde @codebytere @BridgeAR and others