Junctions/symbolic links can cause directories to be missing from source distributions #1197

Open
@thegamecracks

Summary

When a directory that hatch should include is also the target of a directory junction or symbolic link, and that link is traversed before the directory itself, the directory is unexpectedly missing from newly built source distributions.

Steps to Reproduce

  1. Set up a project with the following structure:

    • foo/__init__.py (can be empty)

    • pyproject.toml

      [build-system]
      requires = ["hatchling"]
      build-backend = "hatchling.build"
      
      [project]
      name = "foo"
      version = "1.0.0"
      
      [tool.hatch.build.targets.sdist]
      include = ["foo"]
  2. Create a directory junction / symbolic link to the package with any name sorted before the package name itself:

    # Windows:
    mklink /J bar foo
    # Linux:
    ln -sT foo bar
  3. Attempt to build a source distribution:

    hatch build --target sdist
    # or:
    pip install build
    python -m build --sdist

The resulting dist/foo-1.0.0.tar.gz archive will be missing the foo/ package, which should have been included. When bar is not present, or it has a name that comes after foo, like foo2, the foo package will be included in the archive.
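To check the result without unpacking the archive by hand, something like the following sketch could be used. The helper name sdist_has_package is hypothetical, and it assumes the usual sdist layout where every member sits under a top-level "<name>-<version>/" directory.

```python
import tarfile


def sdist_has_package(sdist_path: str, package: str = "foo") -> bool:
    """Return True if the sdist contains a top-level entry named `package`.

    Assumes members are nested under one leading "<name>-<version>/"
    directory, e.g. "foo-1.0.0/foo/__init__.py".
    """
    with tarfile.open(sdist_path, "r:gz") as archive:
        names = archive.getnames()
    # Compare the second path component of each member with the package name.
    return any(name.split("/")[1:2] == [package] for name in names)
```

Calling sdist_has_package("dist/foo-1.0.0.tar.gz") should return False while bar is present and True once it is removed or renamed to foo2, if the behaviour described above holds.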

Additional context

After some debugging, the following call chain appears to be the cause of this issue:

    def recurse_project_files(self) -> Iterable[IncludedFile]:
        for root, dirs, files in safe_walk(self.root):
            ...  # loop body omitted

    def safe_walk(path: str) -> Iterable[tuple[str, list[str], list[str]]]:
        seen = set()
        for root, dirs, files in os.walk(path, followlinks=True):
            # os.stat() follows links, so a link and its target share an identifier
            stat = os.stat(root)
            identifier = stat.st_dev, stat.st_ino
            if identifier in seen:
                del dirs[:]
                continue
            seen.add(identifier)
            yield root, dirs, files

From my understanding, os.stat() follows the bar/ junction / symlink to foo/ by default, so bar/ is recorded in seen under foo/'s (st_dev, st_ino) identifier. Because os.walk() returns bar/ first, safe_walk() then treats the actual foo/ as already visited and skips it, so the builder interface never learns of its existence.
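The behaviour can be reproduced outside of hatch with a minimal sketch like the one below. It copies the safe_walk() logic quoted above and runs it over a temporary project. The helper visited_dirs is hypothetical, and sorting dirs in the consumer loop is an assumption made only so the traversal order is deterministic; it stands in for "bar is seen before foo". On Windows, os.symlink() may need elevated privileges, so this is easiest to try on Linux.

```python
import os
import tempfile
from collections.abc import Iterable


def safe_walk(path: str) -> Iterable[tuple[str, list[str], list[str]]]:
    # Same logic as the snippet quoted in this report.
    seen = set()
    for root, dirs, files in os.walk(path, followlinks=True):
        stat = os.stat(root)  # follows symlinks / junctions
        identifier = stat.st_dev, stat.st_ino
        if identifier in seen:
            del dirs[:]
            continue
        seen.add(identifier)
        yield root, dirs, files


def visited_dirs(link_name: str) -> list[str]:
    """Build a project with foo/ plus a link to it, and list what gets walked."""
    with tempfile.TemporaryDirectory() as project:
        os.mkdir(os.path.join(project, "foo"))
        open(os.path.join(project, "foo", "__init__.py"), "w").close()
        os.symlink("foo", os.path.join(project, link_name), target_is_directory=True)

        visited = []
        for root, dirs, _files in safe_walk(project):
            # Assumption: visit subdirectories in sorted order. The in-place
            # sort is seen by os.walk() because the same list is yielded.
            dirs.sort()
            visited.append(os.path.relpath(root, project))
        return visited


if __name__ == "__main__":
    print(visited_dirs("bar"))   # ['.', 'bar']  (foo is skipped)
    print(visited_dirs("foo2"))  # ['.', 'foo']  (the link is skipped instead)
```

With the link named bar, only the link path is yielded and the real foo/ never appears, which matches the missing package in the sdist.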

As for how I discovered this, I was experimenting with different project structures for reactpy, and one of those layouts was:

src/
  py/
    reactpy/
      js/
        node_modules/
          @reactpy/client/
          event-to-object/
        packages/
          @reactpy/client/
          event-to-object/
        package.json
      reactpy/    # python package
      .gitignore  # ignores node_modules
      pyproject.toml

The source code in js/packages/ needed to be included in the source distribution so it could be built by hatch-build-scripts, and node_modules/ was of course to be excluded from the sdist. However, it turns out that node_modules/ contained directory junctions to the package directories, which caused hatch to leave those directories out of the sdist. I was not aware that these junctions were interfering with the build, nor did I expect them to, because I was using hatch's .gitignore support to filter out node_modules/. Eventually I realized that removing node_modules/ made the problem go away, and I dug into hatchling's source code to figure out why.
