Skip to content

Handle failure of lseek in fflush #23624

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 8, 2025
Merged

Conversation

sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Feb 8, 2025

Unlike on linux (it seems) the emscripten lseek can fail for various reasons. For example, we do not support lseek on TTY or pipe file descriptors.

Musl, it seems, just assumes that the seek succeeds and adjusts its internal buffers accordingly. This PR update fflush such that it can handle the failure of lseek.

The alternative to this approach would be to add seek support to file our stdin implementation but that looks like a bigger change. Since it seems reasonable for stdout and stdin to not be seekable updating musl to handle this case seems like the right thing to do for now. If we find a other code in the wild that depends on stdin supporting lseek we can reconsider.

The following (14) test expectation files were updated by
running the tests with --rebaseline:

other/codesize/test_codesize_cxx_ctors1.size: 129141 => 129147 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_ctors2.size: 128553 => 128559 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_except.size: 170809 => 170815 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_except_wasm.size: 144517 => 144523 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_except_wasm_legacy.size: 142092 => 142098 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_lto.size: 122093 => 122099 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_mangle.size: 232619 => 232625 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_noexcept.size: 131704 => 131710 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_wasmfs.size: 169111 => 169117 [+6 bytes / +0.00%]
other/codesize/test_codesize_files_wasmfs.size: 50049 => 50055 [+6 bytes / +0.01%]
other/codesize/test_codesize_hello_O0.size: 15101 => 15105 [+4 bytes / +0.03%]
other/codesize/test_codesize_minimal_O0.size: 1156 => 1160 [+4 bytes / +0.35%]
other/test_unoptimized_code_size.wasm.size: 15101 => 15105 [+4 bytes / +0.03%]
other/test_unoptimized_code_size_strict.wasm.size: 15101 => 15105 [+4 bytes / +0.03%]
                                                                         
Average change: +0.03% (+0.00% - +0.35%)                                 

Fixes: #21791

@sbc100 sbc100 requested a review from kripken February 8, 2025 01:11
@sbc100 sbc100 force-pushed the fix_fflush branch 3 times, most recently from 1f6a65b to ac5130f Compare February 8, 2025 01:14
@sbc100 sbc100 requested a review from dschuff February 8, 2025 01:14
@sbc100 sbc100 force-pushed the fix_fflush branch 2 times, most recently from 8c9d747 to 3192254 Compare February 8, 2025 02:27
@hoodmane
Copy link
Collaborator

hoodmane commented Feb 8, 2025

What happens in musl when seeking stdout? I would think the right thing would be returning ENOSYS or ENOTSUP or something. Does it return success and do nothing?

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 8, 2025

What happens in musl when seeking stdout? I would think the right thing would be returning ENOSYS or ENOTSUP or something. Does it return success and do nothing?

The code here is only effect files that are being read-from not written-to. In linux it turns out that you can seek stdin backward just fine. I was actually surprised to discover this. I ran this test of alpine linux (which uses musl) and the stdin file handle is successfully "seeked" backwards by fflush.

Unlike on linux (it seems) the emscripten lseek can fail for various
reasons.  For example, we do not support lseek on TTY or pipe file
descriptors.

Musl, it seems, just assumes that the seek succeeds and adjusts its
internal buffers accordingly.

The following (14) test expectation files were updated by
running the tests with `--rebaseline`:

```
other/codesize/test_codesize_cxx_ctors1.size: 129141 => 129147 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_ctors2.size: 128553 => 128559 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_except.size: 170809 => 170815 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_except_wasm.size: 144517 => 144523 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_except_wasm_legacy.size: 142092 => 142098 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_lto.size: 122093 => 122099 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_mangle.size: 232619 => 232625 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_noexcept.size: 131704 => 131710 [+6 bytes / +0.00%]
other/codesize/test_codesize_cxx_wasmfs.size: 169111 => 169117 [+6 bytes / +0.00%]
other/codesize/test_codesize_files_wasmfs.size: 50049 => 50055 [+6 bytes / +0.01%]
other/codesize/test_codesize_hello_O0.size: 15101 => 15105 [+4 bytes / +0.03%]
other/codesize/test_codesize_minimal_O0.size: 1156 => 1160 [+4 bytes / +0.35%]
other/test_unoptimized_code_size.wasm.size: 15101 => 15105 [+4 bytes / +0.03%]
other/test_unoptimized_code_size_strict.wasm.size: 15101 => 15105 [+4 bytes / +0.03%]

Average change: +0.03% (+0.00% - +0.35%)
```
@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 8, 2025

See #23627 for more the universal_newlines vs text thing. I'll probably let that change land first.

@hoodmane
Copy link
Collaborator

hoodmane commented Feb 8, 2025

the stdin file handle is successfully "seeked" backwards by fflush.

Should we consider copying that behavior for the stdin device instead of making this change? Since otherwise there is an observable difference in the behavior when seeking stdin.

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 8, 2025

the stdin file handle is successfully "seeked" backwards by fflush.

Should we consider copying that behavior for the stdin device instead of making this change? Since otherwise there is an observable difference in the behavior when seeking stdin.

Can you explain what you mean exactly?

Are you suggesting that we update emscripten to allow stdin to support seeking? I think that would be much bigger change and without much benefit (as I mentioned in #23624 (comment)). I also think its fairly reasonable that stdin and stdout don't support seeking, no?

Here is the code for stdin handling BTW:

$FS_stdin_getChar_buffer: [],
// getChar has 3 particular return values:
// a.) the next character represented as an integer
// b.) undefined to signal that no data is currently available
// c.) null to signal an EOF
$FS_stdin_getChar__deps: [
'$FS_stdin_getChar_buffer',
'$intArrayFromString',
],
$FS_stdin_getChar: () => {
if (!FS_stdin_getChar_buffer.length) {
var result = null;
#if ENVIRONMENT_MAY_BE_NODE
if (ENVIRONMENT_IS_NODE) {
// we will read data by chunks of BUFSIZE
var BUFSIZE = 256;
var buf = Buffer.alloc(BUFSIZE);
var bytesRead = 0;
// For some reason we must suppress a closure warning here, even though
// fd definitely exists on process.stdin, and is even the proper way to
// get the fd of stdin,
// https://github.com/nodejs/help/issues/2136#issuecomment-523649904
// This started to happen after moving this logic out of library_tty.js,
// so it is related to the surrounding code in some unclear manner.
/** @suppress {missingProperties} */
var fd = process.stdin.fd;
try {
bytesRead = fs.readSync(fd, buf, 0, BUFSIZE);
} catch(e) {
// Cross-platform differences: on Windows, reading EOF throws an
// exception, but on other OSes, reading EOF returns 0. Uniformize
// behavior by treating the EOF exception to return 0.
if (e.toString().includes('EOF')) bytesRead = 0;
else throw e;
}
if (bytesRead > 0) {
result = buf.slice(0, bytesRead).toString('utf-8');
}
} else
#endif
#if ENVIRONMENT_MAY_BE_WEB
if (typeof window != 'undefined' &&
typeof window.prompt == 'function') {
// Browser.
result = window.prompt('Input: '); // returns null on cancel
if (result !== null) {
result += '\n';
}
} else
#endif
#if ENVIRONMENT_MAY_BE_SHELL
if (typeof readline == 'function') {
// Command line.
result = readline();
if (result) {
result += '\n';
}
} else
#endif
{}
if (!result) {
return null;
}
FS_stdin_getChar_buffer = intArrayFromString(result, true);
}
return FS_stdin_getChar_buffer.shift();
},
. I did take a look to see if we could enable seeking without too much trouble, but it looked like it would be relatively complex change.. without much benefit that I can see.

@hoodmane
Copy link
Collaborator

hoodmane commented Feb 8, 2025

I also think its fairly reasonable that stdin and stdout don't support seeking, no?

I agree that they shouldn't. But I think if linux does something unreasonable we should try to do the same unreasonable thing whenever possible.

@hoodmane
Copy link
Collaborator

hoodmane commented Feb 8, 2025

But yeah if it's more effort than it's worth that is fair.

@hoodmane
Copy link
Collaborator

hoodmane commented Feb 8, 2025

I think this would be enough to make stdin seekable in the JS fs:

--- a/src/lib/libfs.js
+++ b/src/lib/libfs.js
@@ -1493,7 +1493,7 @@ FS.staticInit();
       // have been overwritten we create a unique device for
       // them instead.
       if (input) {
-        FS.createDevice('/dev', 'stdin', input);
+        FS.createDevice('/dev', 'stdin', input, null, {seekable: true});
       } else {
         FS.symlink('/dev/tty', '/dev/stdin');
       }
@@ -1659,7 +1659,7 @@ FS.staticInit();
         FS.chmod(node, mode);
       }
     },
-    createDevice(parent, name, input, output) {
+    createDevice(parent, name, input, output, opts) {
       var path = PATH.join2(typeof parent == 'string' ? parent : FS.getPath(parent), name);
       var mode = FS_getMode(!!input, !!output);
       FS.createDevice.major ??= 64;
@@ -1668,7 +1668,7 @@ FS.staticInit();
       // the old behavior.
       FS.registerDevice(dev, {
         open(stream) {
-          stream.seekable = false;
+          stream.seekable = opts.seekable ?? false;
         },
         close(stream) {
           // flush any pending line data

@hoodmane
Copy link
Collaborator

hoodmane commented Feb 8, 2025

But maybe that will cause a hard crash when you try to seek it, I didn't try it =)

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 8, 2025

I also think its fairly reasonable that stdin and stdout don't support seeking, no?

I agree that they shouldn't. But I think if linux does something unreasonable we should try to do the same unreasonable thing whenever possible.

I don't think emscripten should go out of its way to emulate every linux corner case like this. At least not until we have evidence that real world programs depend on them. Our libc for example, even though its based on musl, lacks most linux-specific stuff. We can make exceptions if we find a lot of real world code the needs it but in general we don't support linux-specific stuff.

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 8, 2025

I think this would be enough to make stdin seekable in the JS fs:

Its more complex than that I fear. See

read(stream, buffer, offset, length, pos /* ignored */) {
if (!stream.tty || !stream.tty.ops.get_char) {
throw new FS.ErrnoError({{{ cDefs.ENXIO }}});
}
var bytesRead = 0;
for (var i = 0; i < length; i++) {
var result;
try {
result = stream.tty.ops.get_char(stream.tty);
} catch (e) {
throw new FS.ErrnoError({{{ cDefs.EIO }}});
}
if (result === undefined && bytesRead === 0) {
throw new FS.ErrnoError({{{ cDefs.EAGAIN }}});
}
if (result === null || result === undefined) break;
bytesRead++;
buffer[offset+i] = result;
}
if (bytesRead) {
stream.node.atime = Date.now();
}
return bytesRead;
},
. We would need to implement some kind of extra buffering layer, or change the way get_char works here.

@hoodmane
Copy link
Collaborator

hoodmane commented Feb 8, 2025

Okay I think I finally understand. You're saying that stdin is saving recent input so that we can seek back a little bit and replay the recent bytes? That indeed sounds too complicated to bother with.

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 8, 2025

Okay I think I finally understand. You're saying that stdin is saving recent input so that we can seek back a little bit and replay the recent bytes? That indeed sounds too complicated to bother with.

The linux stdin fd certainly seems to be doing that yes. At least in my tests on alpine linux. I'm not sure why.

Copy link
Collaborator

@hoodmane hoodmane left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @sbc100. I think this makes sense to me now. It might be helpful if you could add a summary of your answers to my questions to the comment.

@sbc100 sbc100 merged commit babbcc8 into emscripten-core:main Feb 8, 2025
29 checks passed
@sbc100 sbc100 deleted the fix_fflush branch February 8, 2025 18:22
@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 8, 2025

I updated the comment. Thanks for the review @hoodmane, and thanks for all your recent contributions.

@sbc100
Copy link
Collaborator Author

sbc100 commented Feb 10, 2025

Actually I turns out that calling fflush on stdin is undefined behaviour: https://en.cppreference.com/w/c/io/fflush: "For input streams (and for update streams on which the last operation was input), the behavior is undefined.".

Should we revert this?

@kripken
Copy link
Member

kripken commented Feb 10, 2025

I would lean towards reverting. This adds code size and increases the diff vs upstream musl, and in return for something very minor.

sbc100 added a commit that referenced this pull request Feb 10, 2025
sbc100 added a commit that referenced this pull request Feb 10, 2025
Reverts #23624

This turned out to be a fix for something that is undefined behaviour.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

When I use fflush(stdin), the output is wrong. Is it a bug?
3 participants