Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid symlink infinite loops #44

Closed
wants to merge 1 commit into from

Conversation

ssbarnea
Copy link

@ssbarnea ssbarnea marked this pull request as ready for review June 23, 2021 11:48
@FFY00 FFY00 requested a review from jaraco June 23, 2021 18:09
Copy link
Member

@FFY00 FFY00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will probably have a easier time getting in with tests 😉

Comment on lines +251 to +264
dirs = set()
files = set()
for dirpath, dirnames, filenames in os.walk(path, followlinks=True):
st = os.stat(dirpath)
scandirs = []
for dirname in dirnames:
st = os.stat(os.path.join(dirpath, dirname))
dirkey = st.st_dev, st.st_ino
if dirkey not in dirs:
dirs.add(dirkey)
scandirs.append(dirname)
dirnames[:] = scandirs
for f in filenames:
files.add(os.path.join(dirpath, f))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although it's a common pattern, I'm very much a detractor to the init/loop/append pattern (mostly because of how it mutates a local variable, making state analysis much more complicated).

The proposed approach entwines all of the logic of path traversal, link resolution, and deduplication into several loops.

Additionally, this change removes the check for os.path.isfile. I'm not sure that check is important, but it seems to be important enough that it was included previously.

I'd much rather see a solution that separates the concern of walking a tree while avoiding loops and assembling a list of files in that walk. Since distutils is only needed for Python 3.6+, perhaps a pathlib-based solution could help.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got an alternative solution I'll present.

@jaraco
Copy link
Member

jaraco commented Jul 4, 2021

I'm unsure if the path walker should suppress duplicates due to loops or if it should fail in that situation. Why is it better to ignore the loops rather than report an error that there exists a circular loop?

I wonder too why distutils is special here. How is it that walk isn't solving this issue for other use-cases?

@jaraco jaraco self-assigned this Jul 4, 2021
@ssbarnea
Copy link
Author

ssbarnea commented Jul 4, 2021

Circular links are perfectly fine and there are cases where they cannot be avoided. Their existence is not an "error", the code should be resilient and avoid going into endless loops. This can be avoided by normalising (resolving) the paths.

jaraco added a commit that referenced this pull request Jul 4, 2021
@jaraco
Copy link
Member

jaraco commented Jul 4, 2021

In 88f5c72, I took a stab at creating a test to repro the error and to serve as a regression test for any fix. But interestingly, I couldn't get os.walk to go into infinite recursion. As I dug into it, I found that os.scandir(long).is_dir() after about 33 levels of indirection would raise an OSError:

(Pdb) entry, = os.scandir(long)       
(Pdb) entry
<DirEntry 'link-to-parent'>
(Pdb) entry.is_dir()
*** OSError: [Errno 62] Too many levels of symbolic links: '/private/var/folders/c6/v7hnmq453xb6p2dbz1gqc6rr0000gn/T/tmp1yzcz5nt/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent/link-to-parent'

So at least on macOS, the reported infinite recursion never occurs, and as long as the symlink isn't pointing to some location that's expensive to traverse (like '/'), everything behaves nicely. Similarly on Ubuntu, I ran the same test and it completes quickly.

So it's going to prove difficult to have a reliable repro for this issue, as the reported issue is more complex than simple infinite recursion.

@jaraco
Copy link
Member

jaraco commented Jul 4, 2021

In 43fa214, I found a way to assert that recursion and duplication isn't happening. This PR is obviated by #46.

@jaraco jaraco closed this Jul 4, 2021
@jaraco jaraco mentioned this pull request Jul 4, 2021
2 tasks
jaraco added a commit to pypa/setuptools that referenced this pull request Jul 4, 2021
clrpackages pushed a commit to clearlinux-pkgs/setuptools that referenced this pull request Jul 13, 2021
…on 57.1.0

Alexei Colin (1):
      distutils: pass -rpath to linker on macOS >=10.5

Brian Rutledge (1):
      Use shutil for rmtree

Jason R. Coombs (24):
      Test on Python 3.10
      Remove src_root from setup.py, seemingly unused.
      Remove consideration for Java as Jython is no longer supported.
      Unpin dependencies for certs and ssl extras and remove dependency links. Instead, rely on pip for security.
      Remove setup_requires, obviated by build-requires in pyproject.toml.
      Suppress deprecation warnings in flake8 and packaging.tags. Ref pypa/packaging#433.
      👹 Feed the hobgoblins (delint).
      Remove automerge.
      Add test capturing failure. Ref pypa/distutils#44.
      Ensure that the result contains only the one file, not all the different symlink variants to the same file.
      Wrap walk result to prevent infinite recursion. Fixes bpo-44497.
      Extract UniqueDirs for checking uniqueness.
      Rely on stat (inode and device) to deduplicate.
      Move _unique_dirs into classmethod on _UniqueDirs.
      Remove ssl_support. Fixes #2715.
      Update changelog.
      Restore the iterator
      Add test
      Update changelog.
      Add another deprecation warning bypass for flake8 on PyPy because the call depth is missed.
      Remove workaround for pypy/pypy#3503, now that the behavior is consistent with a workaround in importlib_metadata 4.6.1 (python/importlib_metadata#327.
      Remove wincertstore and certifi. Ref #2711 and #2716.
      Update changelog.
      Bump version: 57.0.0 → 57.1.0

Long Nguyen (2):
      Add clang mingw support
      Change get_gcc_versions back to get_versions

Nicolas CANIART (1):
      Entrypoints in declarative config are now supported

Richard Purdie (1):
      setuptools/dist: Fix reproducibility issue by sorting globbing

clint-lawrence (1):
      Fix a broken external link

da-woods (1):
      Fixed get_export_symbols for unicode filenames

messense (1):
      Prefer using `Distribution.has_ext_modules` method
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants