Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-25711: Rewrite zipimport in pure Python. #6809

Merged
merged 17 commits into from
Sep 18, 2018

Conversation

serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented May 14, 2018

@serhiy-storchaka serhiy-storchaka requested a review from a team as a code owner July 8, 2018 17:21
Lib/zipimport.py Outdated

path = _get_module_path(self, fullname)
if mi:
fullpath = path + path_sep + '__init__.py'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bootstrap_external._path_join()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of using bootstrap_external._path_join()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Readability. When I read path + path_sep + '__init__.py' I have to mentally piece together that this is constructing a file path, while _path_join(path, '__init__.py)` gives me that context implicitly.

Lib/zipimport.py Outdated
if mi:
fullpath = path + path_sep + '__init__.py'
else:
fullpath = path + '.py'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might as well use an f-string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the benefit of using an f-string instead of + for concatenating two strings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it more readable.

Lib/zipimport.py Outdated
# add __path__ to the module *before* the code gets
# executed
path = _get_module_path(self, fullname)
fullpath = f'{self.archive}{path_sep}{path}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_bootstrap_external._path_join()

Lib/zipimport.py Outdated

# implementation

def _unpack_uint32(data):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_bootstrap_external._r_long()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What to do with _unpack_uint16? _unpack_uint32 looks to me more descriptive than _r_long.

Copy link
Member

@brettcannon brettcannon Aug 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can keep _unpack_uint16, I just don't think we need to have two implementations to do the exact same thing of reading in the bytes of a little-endian 4 bytes int. If you don't like the name then feel free to rename it in importlib.

Lib/zipimport.py Outdated
name = name.decode('latin1').translate(cp437_table)

name = name.replace('/', path_sep)
path = f'{archive}{path_sep}{name}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bootstrap_external._path_join()

@brettcannon brettcannon requested a review from warsaw August 17, 2018 18:03
Lib/zipimport.py Outdated

def _unpack_uint16(data):
"""Convert 2 bytes in little-endian to an integer."""
assert len(data) == 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Wouldn't be better if we raised ValueError here and in _unpack_uint32()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In all cases the length of the data is checked before calling _unpack_uint*(). The assertion condition is always true.

Copy link
Member

@warsaw warsaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments just based on reading the code. I'm going to test this branch with importlib_resources (standalone) next.

Lib/zipimport.py Outdated
@@ -0,0 +1,640 @@
'''zipimport provides support for importing Python modules from Zip archives.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style nit: this should be a """ string.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Lib/zipimport.py Outdated
try:
raw_data = fp.read(data_size)
except OSError:
raise OSError("zipimport: can't read data")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just here to change the error message? I guess that's for compatibility with the message raised by the C implementation? I'm not sure how useful that is, and it does kind of hide the original exception (which might be useful for debugging). My inclination is to preserve the original OSError rather than produce one with a different message, but if you feel otherwise, why not use a raise from here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the try block.

Lib/zipimport.py Outdated
try:
decompress = _get_decompress_func()
except:
raise ZipImportError("can't decompress data; zlib not available")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

except Exception and raise from

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added except Exception.

Copy link
Member

@warsaw warsaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The branch does work with importlib_resources AFAICT.

@bedevere-bot
Copy link

When you're done making the requested changes, leave the comment: I have made the requested changes; please review again.

@python python deleted a comment from warsaw Sep 11, 2018
Copy link
Member Author

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your review @warsaw. I have addressed most of your comments. But AFAIK from error is not needed in the Python code. Exceptions is chained by default. You need to add from None for suppressing chaining.

Lib/zipimport.py Outdated
'PQRSTUVWXYZ[\\]^_'
'`abcdefghijklmno'
'pqrstuvwxyz{|}~\x7f'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just for readability. The above part is ASCII, and 16 chars per row, the below part is non-ASCII, and 8 chars per row. What comment should I add?

Lib/zipimport.py Outdated
from zlib import decompress
except:
_bootstrap._verbose_message('zipimport: zlib UNAVAILABLE')
raise ZipImportError("can't decompress data; zlib not available")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The C implementation catches all exceptions. I agree, that this is evil and will change this.

Lib/zipimport.py Outdated
try:
raw_data = fp.read(data_size)
except OSError:
raise OSError("zipimport: can't read data")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the try block.

Lib/zipimport.py Outdated
try:
decompress = _get_decompress_func()
except:
raise ZipImportError("can't decompress data; zlib not available")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added except Exception.

Copy link
Member Author

@serhiy-storchaka serhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to make behavior changing clean up and remove unneeded tests in separate PR.

try:
toc_entry = self._files[key]
except KeyError:
raise OSError(0, '', key)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For raising a proper FileNotFoundError you need to specify the corresponding errno.

import errno
raise FileNotFoundError(errno.ENOENT, 'No such file or directory', key)

(and maybe using pathname is more proper than key).

I prefer to defer this change to later. It is nontrivial and may need additional discussion and tests.

Copy link
Member

@warsaw warsaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're down to one issue/question for this branch, a merge conflict, and deferring cleanups to a follow on branch.

Copy link
Member

@warsaw warsaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this - I think it will lead to great improvements in zipimport in the future. My only suggestion would be to open a bpo issue for the subsequent clean up branch, since there are things that we know can be improved on the next pass. I think this is a good change to land.

@serhiy-storchaka
Copy link
Member Author

serhiy-storchaka commented Sep 18, 2018

Thank you for your review @warsaw. I have created two following PRs (#9404 and #9406) and one issue (bpo-34726). Will open more. I still don't know what to do with exceptions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DO-NOT-MERGE type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants