glob('**') returns all files and directories #102

Open
jaraco opened this issue Jul 12, 2023 · 8 comments

Comments

@jaraco
Owner

jaraco commented Jul 12, 2023

A further bug is that glob('**') in zipp returns all files and directories, while pathlib.Path.glob('**') returns only directories.

Originally posted by @nh2 in #98 (comment)
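
The reported behavior can be checked against a small in-memory archive. The sketch below uses the standard library's zipfile.Path as a stand-in for zipp.Path, on the assumption that the two behave alike. It also assumes the running Python ships zipfile.Path.glob, which older versions lack, hence the guard. The archive layout and the helper name zip_glob_all are made up for illustration. No expected output is shown, since the result depends on the version in use.

```python
import io
import zipfile


def zip_glob_all(pattern="**"):
    """Return archive-relative names matched by ``pattern``, or None
    when this Python's zipfile.Path has no glob() method."""
    buf = io.BytesIO()
    # Hypothetical layout: one top-level file, one nested file,
    # and one explicit directory entry.
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("toplevel.txt", "")
        zf.writestr("dir/indir.txt", "")
        zf.writestr("dir/subdir/", "")

    # Assumption: glob() may be missing on older interpreters.
    if not hasattr(getattr(zipfile, "Path", None), "glob"):
        return None

    with zipfile.ZipFile(buf) as zf:
        root = zipfile.Path(zf)
        # ``at`` is the entry's path inside the archive.
        return sorted(entry.at for entry in root.glob(pattern))


print(zip_glob_all())
```

Any names in the printed list that do not end in a slash are files, which is what the report is about.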

@jaraco
Owner Author

jaraco commented Jul 12, 2023

That's interesting, because glob.glob('**') returns files and directories, and the docs for pathlib.glob are unclear on the meaning of **. Probably zipp.Path should align with pathlib conventions, but only where they're by design and not themselves an undefined

@jaraco jaraco changed the title from "glob('**') returns all files and directoreis" to "glob('**') returns all files and directories" Jul 12, 2023
@nh2
Contributor

nh2 commented Jul 13, 2023

> the docs for pathlib.glob are unclear on the meaning of **

You are right, and it gets even more confusing:

  • glob.glob('**') returns dirs and files, non-recursively, not including .
  • glob.glob('**', recursive=True) returns dirs and files, recursively, not including .
  • list(pathlib.Path('.').glob('**')) returns only dirs, recursively, and also includes .

I think there should be a Python upstream issue to document all of that, including the differences between glob and pathlib.

Then zipp could implement what's documented for pathlib.

Repro

Python 3.10.11

mkdir -p dir/subdir ; touch toplevel.txt dir/indir.txt

creates:

toplevel.txt
dir/
  indir.txt
  subdir/

>>> import glob, pathlib

>>> glob.glob('**')
['dir', 'toplevel.txt']

>>> glob.glob('**', recursive=True)
['dir', 'dir/subdir', 'dir/indir.txt', 'toplevel.txt']

>>> list(pathlib.Path('.').glob('**'))
[PosixPath('.'), PosixPath('dir'), PosixPath('dir/subdir')]
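
For anyone wanting to re-run the comparison on another Python version, the repro above can be scripted roughly as follows. This is only a sketch: the function name compare_globs is hypothetical, the tree is built in a temporary directory rather than the current one, and results are sorted, so ordering will differ from the session above. The pathlib entry is expected to vary between Python versions.

```python
import glob
import os
import pathlib
import tempfile


def compare_globs():
    """Build the repro tree in a temp dir and run the three '**' calls."""
    results = {}
    old_cwd = os.getcwd()
    with tempfile.TemporaryDirectory() as tmp:
        os.chdir(tmp)
        try:
            # Same layout as:
            #   mkdir -p dir/subdir ; touch toplevel.txt dir/indir.txt
            pathlib.Path("dir/subdir").mkdir(parents=True)
            pathlib.Path("toplevel.txt").touch()
            pathlib.Path("dir/indir.txt").touch()

            def norm(paths):
                # Sort and use forward slashes so results compare easily.
                return sorted(str(p).replace(os.sep, "/") for p in paths)

            results["glob"] = norm(glob.glob("**"))
            results["glob_recursive"] = norm(glob.glob("**", recursive=True))
            results["pathlib"] = norm(pathlib.Path(".").glob("**"))
        finally:
            # Leave the temp dir before it is cleaned up.
            os.chdir(old_cwd)
    return results


for name, found in compare_globs().items():
    print(name, found)
```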

@jaraco
Owner Author

jaraco commented Mar 13, 2024

@nh2, since a lot of work has been done upstream to clarify the behavior, intent, and exceptions, would you be willing to review the state of the art for CPython (pathlib) and make a recommendation for what behaviors would be best for zipp.Path/zipfile.Path?

@nh2
Contributor

nh2 commented Mar 13, 2024

I would like to, but I expect that this month I'll be completely out of time.

@jaraco
Owner Author

jaraco commented Mar 13, 2024

Sounds good. I'll assign it to you but also flag it up for others to consider. Take your time and come back to it when you're up for it. If someone else wants to help, please comment here first.

@jaraco jaraco added the help wanted Extra attention is needed label Mar 13, 2024
@barneygale

pathlib.Path.glob('**') returns both files and directories as of Python 3.13, so I think this bug is invalid.
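
Code that relied on the older directories-only result can filter explicitly, so it does not depend on the Python version. The following is a minimal sketch; the helper name recursive_dirs and the sample tree are made up for illustration, and excluding the root itself is a choice of this example, not something pathlib prescribes.

```python
import pathlib
import tempfile


def recursive_dirs(root):
    """Directories below ``root`` (root itself excluded), whatever
    Path.glob('**') happens to yield on the running Python."""
    root = pathlib.Path(root)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.glob("**")
        if p != root and p.is_dir()
    )


with tempfile.TemporaryDirectory() as tmp:
    base = pathlib.Path(tmp)
    (base / "dir" / "subdir").mkdir(parents=True)
    (base / "toplevel.txt").touch()
    (base / "dir" / "indir.txt").touch()
    dirs_only = recursive_dirs(base)

print(dirs_only)  # ['dir', 'dir/subdir']
```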

@jaraco
Owner Author

jaraco commented Nov 22, 2024

The current implementation brings its own glob implementation. Barney, do you have any interest in unifying the implementations?
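
To illustrate why a standalone glob over archive names can differ from pathlib, here is a deliberately simplified sketch of a regex-based matcher over a flat list of names. It is not zipp's actual code; translate, glob_names, and the name list are hypothetical. It only shows that when '**' becomes "match anything", files and directories are both selected unless the matcher filters them.

```python
import re


def translate(pattern):
    """Hypothetical translation of a glob pattern to a regex:
    '**' matches anything, '*' and '?' stop at '/'."""
    mapping = {"**": ".*", "*": "[^/]*", "?": "[^/]"}
    # The capture group keeps the wildcards in the split result.
    parts = re.split(r"(\*\*|\*|\?)", pattern)
    return "".join(mapping.get(part, re.escape(part)) for part in parts)


def glob_names(names, pattern):
    """Return the names that match ``pattern`` in full."""
    matches = re.compile(translate(pattern)).fullmatch
    return [name for name in names if matches(name)]


# Made-up archive listing; directory entries end in '/'.
names = ["toplevel.txt", "dir/", "dir/indir.txt", "dir/subdir/"]

print(glob_names(names, "**"))        # every entry, files included
print(glob_names(names, "*.txt"))     # ['toplevel.txt']
print(glob_names(names, "**/*.txt"))  # ['dir/indir.txt']
```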

@barneygale

I'm very interested in that :-) In fact, I've been working on a little proof-of-concept in the background, which is why I've logged a few zipp issues lately.
