Decompressing concatenated gzip #7157

Open
1 task done
davenza opened this issue Jan 3, 2023 · 3 comments

davenza commented Jan 3, 2023

Describe the bug

Recently, I have been consuming a proprietary REST API that returns gzip-encoded text data.

I noticed that the response was different from the one returned by the requests library or Postman. After a lot of digging, I discovered that the REST API is sending concatenated gzip data.

It seems that decompressing concatenated gzip data requires special treatment of zlib.decompressobj.unused_data (see, for example, this answer on Stack Overflow).
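
To illustrate the problem outside of aiohttp (a minimal standalone sketch; the example payload is made up), a single zlib.decompressobj stops at the end of the first gzip member and leaves the rest of the stream in unused_data, while gzip.decompress handles every member:

import gzip
import zlib

# Two gzip members back to back -- a valid concatenated gzip stream.
data = gzip.compress(b"first part, ") + gzip.compress(b"second part")

d = zlib.decompressobj(zlib.MAX_WBITS | 16)
print(d.decompress(data))     # b'first part, '  (only the first member)
print(bool(d.unused_data))    # True -- the second member is still sitting here
print(gzip.decompress(data))  # b'first part, second part'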

I have confirmed that urllib3 (which requests uses) contains a similar implementation, where the unused data is checked and fed back in for further decompression.

I think this is where aiohttp decompresses the gzip data; unused_data is not handled in any way there. I tried changing that line of aiohttp code to something like this:

# Decompress the first gzip member.
ret = self.decompressor.decompress(chunk)
# Anything left in unused_data once a member ends is the start of the next
# concatenated member, so restart with a fresh decompressor and keep going.
while self.decompressor.unused_data:
    chunk = self.decompressor.unused_data
    self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    ret += self.decompressor.decompress(chunk)

chunk = ret

and with that change I was able to reproduce the response returned by requests.

I have labeled this issue as a bug, since I believe the desired behavior is the same as in requests. If this treatment of gzip data was intentional, is there a simple way to process concatenated gzip data? Currently, aiohttp only returns a fragment of the decompressed response from await response.text().
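
In the meantime, I am working around it on the client side by disabling aiohttp's automatic decompression and decompressing the body myself with gzip, which does handle multiple members. This is only a sketch (the fetch_text helper is made up for illustration, and it assumes the response really is gzip-encoded UTF-8 text):

import gzip
import aiohttp

async def fetch_text(url: str) -> str:
    # auto_decompress=False makes aiohttp return the raw, still-compressed body.
    async with aiohttp.ClientSession(auto_decompress=False) as session:
        async with session.get(url) as response:
            raw = await response.read()
    # gzip.decompress processes every concatenated member, not just the first.
    return gzip.decompress(raw).decode("utf-8")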

To Reproduce

Sorry, I cannot offer a way to reproduce this because I am consuming a proprietary REST API.

Expected behavior

All concatenated gzip members should be decompressed and their output concatenated, as requests does here.

Logs/tracebacks

No logs.

Python Version

Python 3.8.10

aiohttp Version

aiohttp 3.8.3

multidict Version

multidict 6.0.4

yarl Version

yarl 1.8.2

OS

Windows 10

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
davenza added the bug label Jan 3, 2023
Dreamsorcerer (Member) commented

If you can create a test and include this fix in a PR, then we can look at merging it.
Unless there's some drawback I'm missing, I doubt it was intentional.

Dreamsorcerer (Member) commented

We need an example of concatenated gzip data to reproduce this...
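
For instance, something along these lines as a test fixture would probably be enough (rough sketch only; the handler is made up, and I haven't checked how Response interacts with a manually set Content-Encoding header):

import gzip
from aiohttp import web

async def handler(request: web.Request) -> web.Response:
    # Two independent gzip members concatenated into one body (valid per RFC 1952).
    body = gzip.compress(b"Hello, ") + gzip.compress(b"world!")
    return web.Response(body=body, headers={"Content-Encoding": "gzip"})

app = web.Application()
app.router.add_get("/", handler)
# The client should see "Hello, world!" once concatenated members are handled;
# currently only the first member would come back.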

Dreamsorcerer (Member) commented

@steverep Not sure if this is something that might interest you, given your previous work on compressors.

steverep self-assigned this Sep 1, 2024