Skip to content

xopen accepts filehandles #152

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 33 commits into from
Mar 19, 2024
Merged

xopen accepts filehandles #152

merged 33 commits into from
Mar 19, 2024

Conversation

rhpvorderman
Copy link
Collaborator

@rhpvorderman rhpvorderman commented Mar 11, 2024

Supersedes #150, ping @geertvandeweyer

This change was quite hard. I ended up trying multiple paths before I finally was able to tackle the problem.

The difficulty is to ensure that all files are properly closed when using a context manager. Hence it is not possible to simply create a binary stream and pass that to all subfunctions. Since python's builtin gzip, bzip2 and xz modules properly handle file-like objects and paths in their API, the best solution turned out to be to mimick these APIs internally for _PipedCompressionProgram, _open_reproducible_gzip and _check_format_from_content.

I started with extending the tests to include BytesIO objects as inputs as well.

Then I managed to make PipedCompressionProgram also accept file-like objects for reading. The reason this does not work naively is that subprocess.Popen needs a filedescriptor to read from on its stdin. Hence it does not work with streams that only live in Python memory. To alleviate this, a secondary thread is opened that reads the filestream and writes it to the stdin of the process. The disadvantage of this system is that it creates overhead, the advantage of this is that the Python process is actually reading the file, rather than leaving the opening of it to the external program.

I created a helper function to convert filepaths and filelike objects to binary IO objects that can be used internally.

@rhpvorderman rhpvorderman marked this pull request as ready for review March 11, 2024 13:52
@rhpvorderman
Copy link
Collaborator Author

@geertvandeweyer It is finally done!
@marcelm Due to the enormous effort, the commits do not neatly follow how commits should ideally be done. Solving one issue created ResourceWarnings elsewhere and so forth. So I just committed when I got a step further on the ladder, in order to be able to deconstruct changes step by step. This actually saved me quite a few times. To remedy this, I suggest that this whole PR is squashed. Despite having spent 2+ days on tackling this the overall change in lines of code is quite small. I also had to fix python-isal and python-zlib-ng to enable this PR.

@geertvandeweyer
Copy link
Contributor

my apologies for the rabbit hole :-)

and many thanks for this !

@rhpvorderman
Copy link
Collaborator Author

@marcelm Friendly reminder ping

@rhpvorderman
Copy link
Collaborator Author

@marcelm All done. Thank you for the review!

@marcelm marcelm merged commit 62353e7 into main Mar 19, 2024
@marcelm marcelm deleted the acceptfilehandles branch March 19, 2024 15:23
@marcelm marcelm mentioned this pull request Jun 10, 2025
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants