Decompressing concatenated gzip #7157

Open
1 task done
davenza opened this issue Jan 3, 2023 · 3 comments

davenza commented Jan 3, 2023

Describe the bug

Recently, I have been consuming a proprietary REST API that returns gzip-encoded text data.

I noticed that the response was different from the one returned by the requests library or Postman. After a lot of digging, I discovered that the REST API is sending concatenated gzip data.

It seems that decompressing concatenated gzip data requires special treatment of zlib.decompressobj.unused_data (see, for example, this answer on Stack Overflow).
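
To illustrate the problem outside of aiohttp (a minimal standalone sketch; the example payload is made up), a single zlib.decompressobj stops at the end of the first gzip member and leaves the rest of the stream in unused_data, while gzip.decompress handles every member:

import gzip
import zlib

# Two gzip members back to back -- a valid concatenated gzip stream.
data = gzip.compress(b"first part, ") + gzip.compress(b"second part")

d = zlib.decompressobj(zlib.MAX_WBITS | 16)
print(d.decompress(data))     # b'first part, '  (only the first member)
print(bool(d.unused_data))    # True -- the second member is still sitting here
print(gzip.decompress(data))  # b'first part, second part'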

I have confirmed that urllib3 (which requests uses) contains a similar implementation, where the unused data is checked and fed back in for further decompression.

I think this is where aiohttp decompresses the gzip data; unused_data is not handled in any way there. I tried changing that line of aiohttp code to something like this:

# Decompress the first gzip member.
ret = self.decompressor.decompress(chunk)
# Anything left in unused_data once a member ends is the start of the next
# concatenated member, so restart with a fresh decompressor and keep going.
while self.decompressor.unused_data:
    chunk = self.decompressor.unused_data
    self.decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    ret += self.decompressor.decompress(chunk)

chunk = ret

and with that change I was able to reproduce the response returned by requests.

I have labeled this issue as a bug, since I believe the desired behavior is the same as in requests. If this treatment of gzip data was intentional, is there a simple way to process concatenated gzip data? Currently, aiohttp only returns a fragment of the decompressed response from await response.text().
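
In the meantime, I am working around it on the client side by disabling aiohttp's automatic decompression and decompressing the body myself with gzip, which does handle multiple members. This is only a sketch (the fetch_text helper is made up for illustration, and it assumes the response really is gzip-encoded UTF-8 text):

import gzip
import aiohttp

async def fetch_text(url: str) -> str:
    # auto_decompress=False makes aiohttp return the raw, still-compressed body.
    async with aiohttp.ClientSession(auto_decompress=False) as session:
        async with session.get(url) as response:
            raw = await response.read()
    # gzip.decompress processes every concatenated member, not just the first.
    return gzip.decompress(raw).decode("utf-8")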

To Reproduce

Sorry, I cannot offer a way to reproduce this because I am consuming a proprietary REST API.

Expected behavior

All concatenated gzip members should be decompressed and their output concatenated, as requests does here.

Logs/tracebacks

No logs.

Python Version

Python 3.8.10

aiohttp Version

aiohttp 3.8.3

multidict Version

multidict 6.0.4

yarl Version

yarl 1.8.2

OS

Windows 10

Related component

Client

Additional context

No response

Code of Conduct

  • I agree to follow the aio-libs Code of Conduct
davenza added the bug label Jan 3, 2023
Dreamsorcerer (Member) commented

If you can create a test and include this fix in a PR, then we can look at merging it.
Unless there's some drawback I'm missing, I doubt it was intentional.

Dreamsorcerer (Member) commented

We need an example of concatenated gzip data to reproduce this...
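
For instance, something along these lines as a test fixture would probably be enough (rough sketch only; the handler is made up, and I haven't checked how Response interacts with a manually set Content-Encoding header):

import gzip
from aiohttp import web

async def handler(request: web.Request) -> web.Response:
    # Two independent gzip members concatenated into one body (valid per RFC 1952).
    body = gzip.compress(b"Hello, ") + gzip.compress(b"world!")
    return web.Response(body=body, headers={"Content-Encoding": "gzip"})

app = web.Application()
app.router.add_get("/", handler)
# The client should see "Hello, world!" once concatenated members are handled;
# currently only the first member would come back.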

Dreamsorcerer (Member) commented

@steverep Not sure if this is something that might interest you, given your previous work on compressors.

steverep self-assigned this Sep 1, 2024