Skip to content

task.c: fflush before fork to avoid duplicated output buffers (fix: #67) #68

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

oreo639
Copy link

@oreo639 oreo639 commented Mar 24, 2025

Not entirely sure how to make a test for this with the python bindings.

When a process is forked, all of the memory is duplicated in the new process, this also includes the FILE* buffers, which can result in those buffers being flushed multiple times.

To prevent this, output buffers can be flushed before forking.

One potential issue with this approach though, fflush(NULL) can interfere with calls to ungetc(), in which case fflush(NULL) will result in that being undone. (although on linux, FreeBSD, Windows, and probably others fflush(NULL) is defined as only applying to output streams, so it wouldn't affect ungetc() on stdin or streams opened in read mode)

@rrthomas
Copy link
Owner

Thanks very much for this!

What's the reason for flushing all streams here, and not just the ones we write to? If we can just flush those (in fact, just the one output stream), then we should be safe, since we never ungetc in recode? The child process is only in the context of librecode itself (it's the parent that deals with enca), so we shouldn't need to worry about what enca, or any other caller, does.

@oreo639
Copy link
Author

oreo639 commented Mar 24, 2025

The child process is only in the context of librecode itself (it's the parent that deals with enca), so we shouldn't need to worry about what enca, or any other caller, does.

My concern was that process exit also flushes the streams.

e.g. if you do a printf before calling recode and that printf is buffered, then the output will be duplicated when both processes exit:

#include <stdio.h>
#include <unistd.h>

int main() {
	printf("Hello world");
	fork();
	return 0;
}

(Outputs Hello worldHello world)

If you do:

#include <stdio.h>
#include <unistd.h>

int main() {
	FILE *fp = fopen("hello.txt", "w");
	fprintf(fp, "Hello world\n");
	fork();
	return 0;
}

The output is also duplicated even though fclose isn't explicitly called.

If that is something consumers just need to be wary of, then I can make it only flush the output streams passed into and created by recode. (and maybe stdout?)

@rrthomas
Copy link
Owner

Thanks for the explanation. So it seems that the options are:

  • Say "this API may fork, so flush any streams you pass to it before calling it"
  • Say "this API may fork, and if it does it will flush all output streams"
  • Don't fork.

I have to say, I'm tempted to take the last option. In theory, of course, forking can improve performance, but this is quite an annoying potential gotcha.

I looked into closing all file handles we don't care about, but I don't think that will work either, as we can only identify and close them at the file descriptor level, which will presumably just confuse libc and cause a crash. To do this cleanly librecode would have to work with file descriptors rather than streams, and then it wouldn't have the buffering problem in the first place.

@rrthomas
Copy link
Owner

Remove the pipe code path in preference to attempting to fix this issue, which as outlined above has subtle and annoying implications for the caller that are basically impossible to fix.

@rrthomas rrthomas closed this Mar 25, 2025
@oreo639 oreo639 deleted the fork branch March 25, 2025 21:20
@oreo639
Copy link
Author

oreo639 commented Mar 25, 2025

Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants