Skip to content

Follow symlinks recursively #56

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

kaitj
Copy link
Contributor

@kaitj kaitj commented Jun 11, 2025

Resolves #55

The path.glob that was used didn't follow directories that were symlinked. In py3.13, recurse_symlinks argument was introduced, but this is still an issue in py3.11 and py3.12. This introduces a method that performs similarly to py3.13 to recurse symlinks as well via os.walk. From what I was able to figure out, this was only needed when indexing a subject directory. I also added a (I admit messy) pytest fixture for testing indexing with symlinks.

I wonder if there is a better fix / way to do this. Turns out yes...the changes stem from trying to unify pathlib.path.glob with glob.glob (see python/cpython#117589). For now, this is falling back on glob.glob with a recursive pattern to support <py3.13.

@nx10 - mind taking a look? My brain is doing circles trying to sort out this symlinking stuff.

kaitj added 2 commits June 10, 2025 14:11
The current implementation did not follow symlink directories, only files. The built-in flag
'recurse_symlinks' was not introduced until py3.13. This commit adds a method for safely recursing directories for local paths
(assumes we don't have to worry about symlinks for cloudpaths)

Wil need to add some proper tests, but interactive testing suggests this fix works. Eventually, this method can be dropped in favor of
path.rglob(*, recurse_symlinks=True) when only py3.13+ is supported.
@kaitj kaitj requested a review from nx10 June 11, 2025 15:17
@kaitj kaitj self-assigned this Jun 11, 2025
@kaitj kaitj changed the title Recursive globbing Follow symlinks recursively Jun 11, 2025
Copy link

codecov bot commented Jun 11, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.15%. Comparing base (79c824e) to head (1f77700).

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #56      +/-   ##
==========================================
+ Coverage   95.07%   95.15%   +0.07%     
==========================================
  Files           7        7              
  Lines         487      495       +8     
==========================================
+ Hits          463      471       +8     
  Misses         24       24              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kaitj kaitj force-pushed the maint/rglob branch 4 times, most recently from 3380af7 to f411c9c Compare June 11, 2025 20:09
@pytest.fixture
def symlink_dataset(tmp_path: Path, sub: str = "sub-001", ses: str = "ses-001") -> Path:
data = tmp_path / "data"
data.mkdir()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be cleaned up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you say "cleaned up" do you mean the code for the fixture or like the directory itself?

@@ -400,7 +402,14 @@ def _index_bids_subject_dir(
_, subject = path.name.split("-", maxsplit=1)

records = []
for p in path.rglob("sub-*"):
if isinstance(path, CloudPath):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or python >3.13 right?

Copy link
Contributor Author

@kaitj kaitj Jun 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is added in 1f77700.

I did need to have separate rglob calls for ClouthPath and py3.13+ - the arguments were a little bit different.

- Use built-in method for ClouthPath's since `glob.glob` does not properly recurse through
sub-directories. If it is a local path, just fall back to `glob.glob`. Alternatively, could
just check for Python version and only run this if <py3.13.
- The built-in rglob methods have slightly different parameters so need different calls for them.
@kaitj kaitj requested a review from nx10 July 22, 2025 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Symlink directory following
2 participants