Closed
Bug report
The change to add buffering to GzipFile.write (gh-89550, #101251) broke the GzipFile.flush method. The flush method previously called self.compress.flush, but now it only flushes the IO objects and not the compressor (as a side effect, the zlib_mode argument is now ignored, although in my case I only use the default). Flushing the compressor is necessary to create synchronization points that can be used to decompress part of the stream.
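To illustrate what a synchronization point provides, here is a minimal sketch that uses zlib directly rather than GzipFile, so it does not depend on the GzipFile.flush behavior described in this report. The variable names are illustrative only. It assumes a gzip-framed stream (wbits of 16 + zlib.MAX_WBITS) and shows that after a Z_SYNC_FLUSH, everything written so far can be decompressed without waiting for the rest of the stream.

```python
import zlib

# 16 + MAX_WBITS selects gzip framing for both sides of the stream.
compressor = zlib.compressobj(wbits=16 + zlib.MAX_WBITS)
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)

# compress() alone may keep the input buffered inside the compressor.
# Z_SYNC_FLUSH forces all pending output out and ends on a byte boundary,
# which creates a synchronization point in the stream.
chunk1 = compressor.compress(b"Hello World")
chunk1 += compressor.flush(zlib.Z_SYNC_FLUSH)

# The first chunk decodes on its own, before the stream is finished.
first = decompressor.decompress(chunk1)

# flush() with no argument uses Z_FINISH and terminates the stream.
chunk2 = compressor.compress(b"Goodbye World") + compressor.flush()
second = decompressor.decompress(chunk2)

assert first == b"Hello World"
assert second == b"Goodbye World"
```

This is the per-chunk behavior that GzipFile.flush is expected to provide when it passes its zlib_mode argument through to the compressor.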
Here is a test script, reduced from Tornado's use of GzipFile (see tornadoweb/tornado#3278 for the way this manifests in Tornado's test suite):
import io
import gzip
import zlib
# Write two chunks to the same compressed stream. In real usage
# I send these chunks as two separate network messages, but in this
# test I just save them to two local variables.
data = io.BytesIO()
gzip_file = gzip.GzipFile(fileobj=data, mode="wb")
gzip_file.write(b"Hello World")
gzip_file.flush()
message1 = data.getvalue()
data.truncate(0)
data.seek(0)
gzip_file.write(b"Goodbye World")
gzip_file.close()
message2 = data.getvalue()
# Decode the two messages. Each one should decode separately,
# but in Python 3.12b2 the compressor is not flushed with
# Z_SYNC_FLUSH, so the first message produces no output on its own
# and the contents of both messages are emitted only when the second
# message is added to the decompressor's input.
#
# This results in the error
# AssertionError: [b'', b'Hello WorldGoodbye World']
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
messages = [decompressor.decompress(message1), decompressor.decompress(message2)]
assert messages == [b"Hello World", b"Goodbye World"], messages
Your environment
- CPython versions tested on: The bug is present in 3.12b2; the above script passes on 3.11 and earlier
- Operating system and architecture: macOS and Linux