xopen accepts filehandles #152
@geertvandeweyer It is finally done!
My apologies for the rabbit hole :-) and many thanks for this!
@marcelm Friendly reminder ping
Co-authored-by: Marcel Martin <marcel.martin@scilifelab.se>
d5c0fcb to da5f9ea
@marcelm All done. Thank you for the review!
Supersedes #150, ping @geertvandeweyer
This change was quite hard. I ended up trying multiple paths before I finally was able to tackle the problem.
The difficulty is to ensure that all files are properly closed when using a context manager. Hence it is not possible to simply create a binary stream and pass that to all subfunctions. Since Python's builtin `gzip`, `bz2` and `lzma` modules properly handle both file-like objects and paths in their API, the best solution turned out to be to mimic these APIs internally for `_PipedCompressionProgram`, `_open_reproducible_gzip` and `_check_format_from_content`.
I started by extending the tests to include BytesIO objects as inputs as well.
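
For reference, here is a minimal sketch of the standard-library behaviour being mimicked, using only `gzip.open` and an `io.BytesIO` input like the ones added to the tests. It is not code from this PR; the variable names and the payload are made up for illustration.

```python
import gzip
import io
import os
import tempfile

payload = b"hello xopen\n"

# Path input: gzip.open opens the underlying file and closes it again
# when the context manager exits.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "example.gz")
    with gzip.open(path, "wb") as f:
        f.write(payload)
    with gzip.open(path, "rb") as f:
        from_path = f.read()

# File-like input: the same function accepts an in-memory binary stream.
buffer = io.BytesIO(gzip.compress(payload))
with gzip.open(buffer, "rb") as f:
    from_buffer = f.read()

# The stream passed in by the caller is left open; closing it remains
# the caller's responsibility.
caller_stream_still_open = not buffer.closed
```

The point of mimicking this API is the ownership rule: a file opened from a path is closed by whoever opened it, while a file object handed in by the caller is not.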
Then I managed to make PipedCompressionProgram also accept file-like objects for reading. This does not work naively because subprocess.Popen needs a file descriptor to read from on its stdin, so it does not work with streams that only live in Python memory. To work around this, a secondary thread is started that reads the file stream and writes it to the stdin of the process. The disadvantage of this approach is that it creates overhead; the advantage is that the Python process itself reads the file, rather than leaving the opening of it to the external program.
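
The feeder-thread idea can be sketched roughly as follows. This is an illustration under assumptions, not the implementation in this PR: `feed_stdin`, `run_filter` and `DECOMPRESS_COMMAND` are hypothetical names, and a small Python child process stands in for an external decompression program so that the example runs anywhere.

```python
import gzip
import io
import shutil
import subprocess
import sys
import threading

# Stand-in for an external decompressor: a child Python process that
# gunzips stdin to stdout. (Assumption made for portability only.)
DECOMPRESS_COMMAND = [
    sys.executable,
    "-c",
    "import gzip, sys; "
    "sys.stdout.buffer.write(gzip.decompress(sys.stdin.buffer.read()))",
]


def feed_stdin(source, stdin, chunk_size=64 * 1024):
    """Copy an in-memory stream to the child's stdin, then signal EOF."""
    try:
        with stdin:  # closing stdin tells the child that input has ended
            shutil.copyfileobj(source, stdin, chunk_size)
    except BrokenPipeError:
        pass  # the child exited before consuming all of its input


def run_filter(source, command):
    """Pipe a file-like object through an external command."""
    process = subprocess.Popen(
        command, stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # Writing happens in a second thread so that reading stdout in this
    # thread cannot deadlock once the pipe buffers fill up.
    feeder = threading.Thread(
        target=feed_stdin, args=(source, process.stdin), daemon=True
    )
    feeder.start()
    try:
        output = process.stdout.read()
    finally:
        feeder.join()
        process.stdout.close()
        returncode = process.wait()
    if returncode != 0:
        raise subprocess.CalledProcessError(returncode, command)
    return output


data = b"ACGT\n" * 100_000
result = run_filter(io.BytesIO(gzip.compress(data)), DECOMPRESS_COMMAND)
```

The extra copy through the Python process is where the overhead mentioned above comes from.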
I created a helper function to convert file paths and file-like objects to binary IO objects that can be used internally.
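
A helper of this kind might look roughly like the sketch below. The name `as_binary_io` and the returned `(fileobj, owns_file)` pair are assumptions made for illustration and may differ from the helper in this PR; the ownership flag is one possible way to make sure that only files opened by the helper get closed by it.

```python
import io
import os
import tempfile
from typing import BinaryIO, Tuple


def as_binary_io(file_or_path, mode: str = "rb") -> Tuple[BinaryIO, bool]:
    """Return (fileobj, owns_file) for a path or a binary file-like object.

    owns_file is True only when this function opened the file itself, so
    the caller knows whether it is responsible for closing it.
    """
    if "b" not in mode:
        raise ValueError("mode must be a binary mode")
    if isinstance(file_or_path, (str, bytes, os.PathLike)):
        return open(file_or_path, mode), True
    if hasattr(file_or_path, "read") or hasattr(file_or_path, "write"):
        return file_or_path, False
    raise TypeError(f"unsupported type: {type(file_or_path).__name__}")


# Path input: the helper opens the file, so the caller must close it.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.bin")
    with open(path, "wb") as f:
        f.write(b"from path")
    fileobj, owns_file = as_binary_io(path)
    try:
        path_content = fileobj.read()
    finally:
        if owns_file:
            fileobj.close()
    path_result = (path_content, owns_file, fileobj.closed)

# File-like input: the same object is passed through and left open.
stream = io.BytesIO(b"from stream")
fileobj, owns_file = as_binary_io(stream)
stream_result = (fileobj is stream, owns_file, stream.closed)
```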