-
-
Notifications
You must be signed in to change notification settings - Fork 31.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-25711: Rewrite zipimport in pure Python. #6809
Conversation
fcf8f37
to
fb93cdd
Compare
Lib/zipimport.py
Outdated
|
||
path = _get_module_path(self, fullname) | ||
if mi: | ||
fullpath = path + path_sep + '__init__.py' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bootstrap_external._path_join()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of using bootstrap_external._path_join()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Readability. When I read path + path_sep + '__init__.py'
I have to mentally piece together that this is constructing a file path, while _path_join(path, '__init__.py
)` gives me that context implicitly.
Lib/zipimport.py
Outdated
if mi: | ||
fullpath = path + path_sep + '__init__.py' | ||
else: | ||
fullpath = path + '.py' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might as well use an f-string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the benefit of using an f-string instead of +
for concatenating two strings?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find it more readable.
Lib/zipimport.py
Outdated
# add __path__ to the module *before* the code gets | ||
# executed | ||
path = _get_module_path(self, fullname) | ||
fullpath = f'{self.archive}{path_sep}{path}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_bootstrap_external._path_join()
Lib/zipimport.py
Outdated
|
||
# implementation | ||
|
||
def _unpack_uint32(data): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_bootstrap_external._r_long()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What to do with _unpack_uint16
? _unpack_uint32
looks to me more descriptive than _r_long
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can keep _unpack_uint16
, I just don't think we need to have two implementations to do the exact same thing of reading in the bytes of a little-endian 4 bytes int. If you don't like the name then feel free to rename it in importlib.
Lib/zipimport.py
Outdated
name = name.decode('latin1').translate(cp437_table) | ||
|
||
name = name.replace('/', path_sep) | ||
path = f'{archive}{path_sep}{name}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bootstrap_external._path_join()
Lib/zipimport.py
Outdated
|
||
def _unpack_uint16(data): | ||
"""Convert 2 bytes in little-endian to an integer.""" | ||
assert len(data) == 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Wouldn't be better if we raised ValueError
here and in _unpack_uint32()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In all cases the length of the data is checked before calling _unpack_uint*()
. The assertion condition is always true.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments just based on reading the code. I'm going to test this branch with importlib_resources (standalone) next.
Lib/zipimport.py
Outdated
@@ -0,0 +1,640 @@ | |||
'''zipimport provides support for importing Python modules from Zip archives. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Style nit: this should be a """
string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Lib/zipimport.py
Outdated
try: | ||
raw_data = fp.read(data_size) | ||
except OSError: | ||
raise OSError("zipimport: can't read data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just here to change the error message? I guess that's for compatibility with the message raised by the C implementation? I'm not sure how useful that is, and it does kind of hide the original exception (which might be useful for debugging). My inclination is to preserve the original OSError
rather than produce one with a different message, but if you feel otherwise, why not use a raise from
here too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the try block.
Lib/zipimport.py
Outdated
try: | ||
decompress = _get_decompress_func() | ||
except: | ||
raise ZipImportError("can't decompress data; zlib not available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except Exception
and raise from
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added except Exception
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The branch does work with importlib_resources AFAICT.
When you're done making the requested changes, leave the comment: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your review @warsaw. I have addressed most of your comments. But AFAIK from error
is not needed in the Python code. Exceptions is chained by default. You need to add from None
for suppressing chaining.
Lib/zipimport.py
Outdated
'PQRSTUVWXYZ[\\]^_' | ||
'`abcdefghijklmno' | ||
'pqrstuvwxyz{|}~\x7f' | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just for readability. The above part is ASCII, and 16 chars per row, the below part is non-ASCII, and 8 chars per row. What comment should I add?
Lib/zipimport.py
Outdated
from zlib import decompress | ||
except: | ||
_bootstrap._verbose_message('zipimport: zlib UNAVAILABLE') | ||
raise ZipImportError("can't decompress data; zlib not available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The C implementation catches all exceptions. I agree, that this is evil and will change this.
Lib/zipimport.py
Outdated
try: | ||
raw_data = fp.read(data_size) | ||
except OSError: | ||
raise OSError("zipimport: can't read data") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed the try block.
Lib/zipimport.py
Outdated
try: | ||
decompress = _get_decompress_func() | ||
except: | ||
raise ZipImportError("can't decompress data; zlib not available") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added except Exception
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to make behavior changing clean up and remove unneeded tests in separate PR.
try: | ||
toc_entry = self._files[key] | ||
except KeyError: | ||
raise OSError(0, '', key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For raising a proper FileNotFoundError
you need to specify the corresponding errno.
import errno
raise FileNotFoundError(errno.ENOENT, 'No such file or directory', key)
(and maybe using pathname
is more proper than key
).
I prefer to defer this change to later. It is nontrivial and may need additional discussion and tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're down to one issue/question for this branch, a merge conflict, and deferring cleanups to a follow on branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this - I think it will lead to great improvements in zipimport in the future. My only suggestion would be to open a bpo issue for the subsequent clean up branch, since there are things that we know can be improved on the next pass. I think this is a good change to land.
https://bugs.python.org/issue25711