ZipInfo filename is mangled when os.sep is not '/' #94529

Open
gerph opened this issue Jul 3, 2022 · 0 comments
Labels
stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)


gerph commented Jul 3, 2022

Bug report

The ZipInfo object within ZipFile performs an explicit translation of the filename.

cpython/Lib/zipfile.py

Lines 378 to 382 in 7db1d2e

# This is used to ensure paths in generated ZIP files always use
# forward slashes as the directory separator, as required by the
# ZIP format specification.
if os.sep != "/" and os.sep in filename:
    filename = filename.replace(os.sep, "/")

I believe this is intended to make the class easy to use on Windows, where you might pass an explicit pathname when creating the ZipInfo object. On Windows the filesystem separator is commonly \ (although / is supported in many cases), so this forces the filename attribute to contain the name in Unix form.
This logic is applied whether the ZipInfo object is created manually (usually to add a new file) or the filename has been taken from an archive that is being extracted.

However, the logic is broken on systems where os.sep is anything other than \ or /. On systems where os.sep is '.', creating an archive that contains a file with a '.' extension mangles the filename stored in the archive. On such a system, extracting an archive also mangles the filename.

To demonstrate this, a short interactive session is enough:

>>> import zipfile
>>> import os
>>> os.sep = '.'
>>> zipfile.ZipInfo('hello.txt')
<ZipInfo filename='hello/txt' file_size=0>
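
The session above leaves os.sep modified for the rest of the interpreter's life. A minimal sketch of the same demonstration with the change scoped to a single call follows; the helper name zipinfo_name_with_sep is hypothetical, and whether the result is mangled depends on the Python version in use.

```python
import os
import zipfile
from unittest import mock


def zipinfo_name_with_sep(filename: str, sep: str) -> str:
    """Return ZipInfo(filename).filename while os.sep is temporarily `sep`.

    Hypothetical helper for illustration only; it simulates a platform
    with a different separator without leaving os.sep modified.
    """
    with mock.patch.object(os, "sep", sep):
        return zipfile.ZipInfo(filename).filename


if __name__ == "__main__":
    # On interpreters affected by this issue the name comes back with the
    # '.' replaced by '/'; os.sep itself is restored afterwards.
    print(zipinfo_name_with_sep("hello.txt", "."))
    print(repr(os.sep))
```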

In the real world, this makes the module unusable on RISC OS, where the filesystem separator in os.sep is '.'. In the current Python 3 on RISC OS, the zipfile module will always mangle filenames that have standard extensions.

I believe that the intention of the object is that:

  • the filename passed to the initialiser, and the filename attribute, are Unicode strings (this has been enforced since Python 3 by the explicit decodes in the archive member reading code).
  • the filename attribute holds the name as it would be stored in the archive, using / as the directory separator (the documentation states that filename should be the full name of the archive member).
  • the initialiser may be given a native path name on Unix and Windows systems, as a convenience (existing software will have relied on the referenced code).

As such, I believe the referenced code is broken. To retain the above assumptions while allowing zip archives to be handled on systems where os.sep is neither / nor \, the code should instead read:

        if os.sep == "\\" and os.sep in filename:
            filename = filename.replace(os.sep, "/")

This removes the overzealous replacement of os.sep in the creation of the ZipInfo object.
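
The difference between the two conditions can be sketched as standalone functions that take the separator as a parameter, so no platform needs to be simulated. The names normalize_current and normalize_proposed are hypothetical and exist only for this comparison; they mirror the two conditions quoted in this report, not the full ZipInfo initialiser.

```python
def normalize_current(filename: str, sep: str) -> str:
    """Mirror of the existing check: translate any separator that is not '/'."""
    if sep != "/" and sep in filename:
        filename = filename.replace(sep, "/")
    return filename


def normalize_proposed(filename: str, sep: str) -> str:
    """Mirror of the proposed check: translate only the Windows separator."""
    if sep == "\\" and sep in filename:
        filename = filename.replace(sep, "/")
    return filename


if __name__ == "__main__":
    # Compare both checks for Unix-like, Windows-like and RISC OS-like separators.
    for sep in ("/", "\\", "."):
        for name in ("dir\\hello.txt", "dir/hello.txt"):
            print(repr(sep), name,
                  normalize_current(name, sep),
                  normalize_proposed(name, sep))
```

With sep set to '.', only the existing check rewrites hello.txt; with sep set to \, both checks behave identically, so the Windows convenience is preserved.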

Further problems exist with the from_file method which I shall raise separately.

There are some issues which might be related to this (but this change does not preclude them): #90139 and #92184.

Your environment

  • CPython versions tested on: Python 3.9, 3.10
  • Operating system and architecture: On OS X, simulating the problem seen on RISC OS.
@gerph gerph added the type-bug An unexpected behavior, bug, or error label Jul 3, 2022
@gerph gerph changed the title ZipInfo filename is mangled when os.sep is not '\' ZipInfo filename is mangled when os.sep is not '.' Jul 3, 2022
@gerph gerph changed the title ZipInfo filename is mangled when os.sep is not '.' ZipInfo filename is mangled when os.sep is not '/' Jul 3, 2022
@iritkatriel iritkatriel added the stdlib Python modules in the Lib dir label Nov 24, 2023