Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initialising Archive with a pathlib.Path fails. #3126

Closed
GPRicci opened this issue Feb 2, 2024 · 2 comments
Closed

Initialising Archive with a pathlib.Path fails. #3126

GPRicci opened this issue Feb 2, 2024 · 2 comments

Comments

@GPRicci
Copy link

GPRicci commented Feb 2, 2024

Creating an instance of an Archive with a pathlib.Path fails with a ValueError "bad archive content".
Debugging the code, I suspect the following lines are responsible:

PyMuPDF/src/__init__.py

Lines 1749 to 1752 in 4d604d6

if hasattr(content, "name"):
content = content.name
elif isinstance(content, pathlib.Path):
content = str(content)

Since a Path object has an attribute name (and given that's checked first), the content variable receives the final path component.
Therefore, when the code checks if content is either a directory or file (lines 1754 and 1762), it will only be true if the final path element is either a subdirectory or file in the directory where the __init__.py file resides.

I tried inverting the conditions of the if and elif statements on the cited snippet, i.e check if content is an instance of Path before checking it has the attribute name, and it worked. However, given my limited knowledge of the library, I'm not sure if this would break in other scenarios.

@julian-smith-artifex-com
Copy link
Collaborator

Thanks for the report.

I agree that we are incorrectly using the leafname, which means that we will usually not find the intended file or directory. We'll have an internal discussion about how best to make things work, and hopefully release a fix maybe early next week.

julian-smith-artifex-com added a commit to ArtifexSoftware/PyMuPDF-julian that referenced this issue Feb 6, 2024
src/__init__.py:
    Simplified the logic so we systematically switch on the type of
    `content`. This also avoids behaviour depending on whether str(content)
    (when content is not itself a string) happens to match an existing file or
    directory.

    Also removed the local variables and pass params explicitly to local fn
    `make_subarch()`.

    Also use `assert` if we are passed incorrect types, instead of explicitly
    raising an exception.

tests/test_general.py:
    Added test_archive_3126() for pymupdf#3126.
julian-smith-artifex-com added a commit to ArtifexSoftware/PyMuPDF-julian that referenced this issue Feb 10, 2024
src/__init__.py:
    Simplified the logic so we systematically switch on the type of
    `content`. This also avoids behaviour depending on whether str(content)
    (when content is not itself a string) happens to match an existing file or
    directory.

    Also removed the local variables and pass params explicitly to local fn
    `make_subarch()`.

    Also use `assert` if we are passed incorrect types, instead of explicitly
    raising an exception.

tests/test_general.py:
    Added test_archive_3126() for pymupdf#3126.
julian-smith-artifex-com added a commit that referenced this issue Feb 13, 2024
src/__init__.py:
    Simplified the logic so we systematically switch on the type of
    `content`. This also avoids behaviour depending on whether str(content)
    (when content is not itself a string) happens to match an existing file or
    directory.

    Also removed the local variables and pass params explicitly to local fn
    `make_subarch()`.

    Also use `assert` if we are passed incorrect types, instead of explicitly
    raising an exception.

tests/test_general.py:
    Added test_archive_3126() for #3126.
julian-smith-artifex-com added a commit that referenced this issue Feb 14, 2024
src/__init__.py:
    Simplified the logic so we systematically switch on the type of
    `content`. This also avoids behaviour depending on whether str(content)
    (when content is not itself a string) happens to match an existing file or
    directory.

    Also removed the local variables and pass params explicitly to local fn
    `make_subarch()`.

    Also use `assert` if we are passed incorrect types, instead of explicitly
    raising an exception.

tests/test_general.py:
    Added test_archive_3126() for #3126.
@julian-smith-artifex-com
Copy link
Collaborator

Fixed in 1.23.23.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants