Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zipfile.ZipInfo.from_file assumes that the archive name is a native filesystem path #94533

Open
gerph opened this issue Jul 3, 2022 · 0 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error

Comments

@gerph
Copy link

gerph commented Jul 3, 2022

Bug report

The zipfile.ZipInfo.from_file method takes a native filename, and an optional name to be placed in the archive. However, it uses a converted form of the native filename to the ZipInfo() constructor without turning it into the correct form for an archive name.

The code in question is:

cpython/Lib/zipfile.py

Lines 526 to 541 in 7db1d2e

if isinstance(filename, os.PathLike):
filename = os.fspath(filename)
st = os.stat(filename)
isdir = stat.S_ISDIR(st.st_mode)
mtime = time.localtime(st.st_mtime)
date_time = mtime[0:6]
if not strict_timestamps and date_time[0] < 1980:
date_time = (1980, 1, 1, 0, 0, 0)
elif not strict_timestamps and date_time[0] > 2107:
date_time = (2107, 12, 31, 23, 59, 59)
# Create ZipInfo instance to store file information
if arcname is None:
arcname = filename
arcname = os.path.normpath(os.path.splitdrive(arcname)[1])
while arcname[0] in (os.sep, os.altsep):
arcname = arcname[1:]

I believe that code is assuming that:

  1. ZipInfo() will know that a native filename can be converted to an archive name (this is only true on POSIX and Windows systems, see ZipInfo filename is mangled when os.sep is not '/' #94529), and so is a bad assumption
  2. A native filename is equivalent to a name that can be put into an archive. This is clearly not true on all systems.
  3. The archive name supplied will conform to the filesystem normalisation rules for the system it is running on. Again, this will not be true on filesystems that do not honour the same normalisation rules.

I believe the code should explicitly convert the filename to a form which is suitable for use as an archive name, rather than assuming that the name given is suitable.

A real world example on a system which uses a different scheme for filename separators is RISC OS, which has an os.sep of .. In such a case you might make a call to place a file in the archive with something like:

zi = ZipInfo.from_file("Application.Directory.Filename/txt")

The expectation would be that this would create a ZipInfo object with an arcname of Application/Directory/Filename.txt. However, this will actually use the arcname Application.Directory.Filename/txt - which isn't at all what was intended or useful.

I believe the code would be better handled as something like this...

        # Create ZipInfo instance to store file information
        if arcname is None:
            # The filename is still in native filename format, so we must convert to
            # the form used by the archive.
            arcname = os.path.normpath(os.path.splitdrive(filename)[1])

            # Normalise the separator to os.sep
            if os.altsep:
                arcname = arcname.replace(os.altsep, os.sep)

            # Ensure that the directory and extension separators are in arcname format
            if os.sep != '/' or os.extsep != '.':
                parts = arcname.split(os.sep)
                if os.extsep != '.':
                    parts = [part.replace(os.extsep, '.') for part in parts]
                arcname = '/'.join(parts)

        while arcname and arcname[0] == '/':
            arcname = arcname[1:]

This would...

  • Only perform the normalisation through the filesystem functions when a filename is given. That is, if you supply an arcname that's what you get, unaffected by the native platform's filesystem semantics.
  • Converts the filesystem semantics (from os.sep, os.altsep and os.extsep) to arcname semantics. We normalise from altsep to sep to simplify things, and then construct a name which uses the correct arcname convention from the known parts of the name.
  • Only performs the conversion if they would not be identity operations, so we're efficient on systems where the name is a pass through.
  • Retain the stripping of leading / characters in the archive, so that we don't create them in an odd place (although it's arguable this should be in the ZipInfo constructor to prevent overwriting system files, etc).
  • Ensure that ZipInfo is always called with a name which is an arcname, as that's what it expects anyhow (it has a little allowance for names in native form but this only ever worked on Windows and should not be relied on because on other systems it can never work, see ZipInfo filename is mangled when os.sep is not '/' #94529)
  • Retain as much of the existing behaviour as was actually reliable and deterministic as possible.

Your environment

  • CPython versions tested on: Python 3.9, Python 3.10
  • Operating system and architecture: OSX, RISC OS
@gerph gerph added the type-bug An unexpected behavior, bug, or error label Jul 3, 2022
@iritkatriel iritkatriel added stdlib Python modules in the Lib dir and removed stdlib Python modules in the Lib dir labels Nov 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error
Projects
Status: No status
Development

No branches or pull requests

2 participants