Rewrite gzip._GzipReader in C for more performance and less overhead #110283
Labels: performance, stdlib, type-feature
Feature or enhancement
Proposal:
gzip._GzipReader is used as the raw IO object for an io.BufferedReader in GzipFile. By rewriting it in C the following things can be achieved:
Problems with this approach:
The performance gains can be quite substantial: see this PR for python-isal: pycompression/python-isal#151 . EDIT: I also made the _GzipReader accept buffer protocol objects and use the object's internal buffer as the read buffer: pycompression/python-isal#152 . As a result, the overhead of reading a bytes-like object is now minimal, since no io.BytesIO object is required.
The code from python-isal can be copied almost verbatim into CPython, as they share the same license. (I gave python-isal the same license to make code exchange easy.) The code can be battle-tested a bit in python-isal before it is released into CPython.
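For context, here is a minimal sketch of how in-memory gzip data is typically read with the standard library today. It uses only public `gzip` and `io` calls; the payload is made up for illustration. It does not show the python-isal API, only the `io.BytesIO` wrapping step that the change above avoids.

```python
import gzip
import io

# Made-up sample data for illustration.
payload = b"spam and eggs\n" * 1000
compressed = gzip.compress(payload)

# GzipFile expects a file-like object, so in-memory bytes are
# wrapped in io.BytesIO before they can be read as a stream.
with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as f:
    roundtrip = f.read()

# One-shot decompression of a bytes object, for comparison.
one_shot = gzip.decompress(compressed)

print(len(compressed), len(roundtrip), len(one_shot))
```

The `io.BytesIO` wrapper adds an extra Python-level object and an extra copy between the caller's buffer and the decompressor; reading straight from the buffer protocol removes that layer.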
It will also make it easier to implement the gzip header reading implemented in zlib as mentioned in #103477 .
Code for checking the header CRC is already implemented, which would make Python's gzip reader more spec-compliant. #89672
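To illustrate what a header CRC check involves, here is a rough pure-Python sketch based on RFC 1952, where the optional CRC16 field holds the two least significant bytes of the CRC32 over all preceding header bytes. The function names `header_crc_ok` and `with_header_crc` are hypothetical and are not taken from python-isal or CPython; the actual C implementation may be structured differently.

```python
import gzip
import struct
import zlib

# Flag bits of the FLG byte, as defined in RFC 1952.
FHCRC, FEXTRA, FNAME, FCOMMENT = 0x02, 0x04, 0x08, 0x10


def header_crc_ok(data: bytes) -> bool:
    """Return True if the first member's header CRC16 matches or is absent."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    flags = data[3]
    pos = 10  # fixed part: magic, method, flags, mtime, xfl, os
    if flags & FEXTRA:
        (xlen,) = struct.unpack_from("<H", data, pos)
        pos += 2 + xlen
    if flags & FNAME:
        pos = data.index(b"\x00", pos) + 1  # zero-terminated file name
    if flags & FCOMMENT:
        pos = data.index(b"\x00", pos) + 1  # zero-terminated comment
    if not flags & FHCRC:
        return True  # no header CRC stored, nothing to verify
    (stored,) = struct.unpack_from("<H", data, pos)
    return stored == (zlib.crc32(data[:pos]) & 0xFFFF)


def with_header_crc(data: bytes) -> bytes:
    """Hypothetical demo helper: add a CRC16 to a stream whose FLG byte is 0."""
    if data[3] != 0:
        raise ValueError("demo helper only handles headers without optional fields")
    header = data[:3] + bytes([FHCRC]) + data[4:10]
    crc16 = zlib.crc32(header) & 0xFFFF
    return header + struct.pack("<H", crc16) + data[10:]


plain = gzip.compress(b"hello")
checked = with_header_crc(plain)
# Flip one bit in the stored CRC16 so the check must fail.
corrupted = checked[:10] + bytes([checked[10] ^ 0x01]) + checked[11:]

for name, blob in [("plain", plain), ("checked", checked), ("corrupted", corrupted)]:
    print(name, header_crc_ok(blob))
```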
Ping @gpshead, is this something that could be integrated into CPython, or would this create too many backwards-compatibility headaches?
Has this already been discussed elsewhere?
This is a minor feature, which does not need previous discussion elsewhere
Links to previous discussion of this feature:
No response