Resolve symlinks in go-to-definition within a repo #7070

Open
@Hnasar

Goal

Simplify our pylance/pyright config in our monorepo setup, and improve pylance startup performance.

Summary

With go-to-definition (editor.action.revealDefinition) and friends, if the definition file is found via a symlink, I would like pylance to resolve the symlink when opening the file.

(cc @rchiodo and @debonte for pylance and @cwebster-99 @luabud for monorepo support)

Motivation

In our monorepo, we have a mix of languages, and teams with varying preferences for file layout.
Python packages are found at various levels within the repo, and they may live alongside scripts and other files that ought not to be import targets.

To support static analysis tools in this kind of setup, we have a directory (e.g. pkgs) with symlinks to the individual python packages.
When linting and at runtime, we put this path on PYTHONPATH, and then it's as if we had a well laid-out python environment.
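
To illustrate why tools end up seeing the symlinked path, here is a minimal Python sketch of the runtime half of this setup. The helper names `add_pkgs_to_path` and `canonical_module_file`, and the `pkgs` default, are assumptions made for illustration; this is not our actual bootstrap code. Python records the path a module was found through, so `__file__` points inside `pkgs` unless it is explicitly resolved.

```python
import os
import sys


def add_pkgs_to_path(repo_root, pkgs_dirname="pkgs"):
    """Put the symlink directory at the front of sys.path (hypothetical helper)."""
    pkgs_dir = os.path.join(repo_root, pkgs_dirname)
    if pkgs_dir not in sys.path:
        sys.path.insert(0, pkgs_dir)
    return pkgs_dir


def canonical_module_file(module):
    """Return the symlink-free location of an imported module's source file."""
    return os.path.realpath(module.__file__)
```

After calling `add_pkgs_to_path` on the repo root and importing `pkg2`, `pkg2.__file__` is a path under `pkgs/pkg2/`, whereas `canonical_module_file(pkg2)` is the path under `team2/pkg2/`.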

Currently, Pylance opens the definition file with its symlinked path. This results in duplicate tabs opened to the same file, and means certain things such as git gutters aren't rendered in the symlinked file.

Therefore, to avoid this, we maintain a pyrightconfig.json with extraPaths that includes the parent of every python package. This is roughly 200 directories right now…

The sole benefit of this config is that Go to Definition opens the canonical path of the file, within the team's code.
This comes at a heavy price though:

  • the config is hard to maintain; if teams add a package but forget to add it to the list then pylance reports import errors (red squiggle)
  • if there are scripts adjacent to packages, Pylance sometimes accidentally picks them up as valid import targets
  • having a large number of extraPaths is extremely slow. I did some rough benchmarking and found that pylance took about 60s after a 'reload window' to start showing syntax highlighting. If I instead just added the pkgs symlink directory, then it started in 20s.

If only Pylance resolved symlinks, we would be very happy with our monorepo setup.

Partial Workaround

Install https://github.com/zaucy/vscode-symlink-follow/ and configure

    "symlink-follow.autoFollow": true,
    "symlink-follow.onlyFollowWithinWorkspace": true,

This is subpar, though: when resolving symlinks, the symlinked file briefly opens and then closes, causing a visual flash and breaking "reopen closed tab" (workbench.action.reopenClosedEditor, cmd/ctrl + shift + t): zaucy/vscode-symlink-follow#7

Reproduction

Shell script to reproduce
#!/bin/sh

# make a pretend repo
mkdir test && cd test
mkdir myrepo

# make two packages where one imports the other
mkdir -p myrepo/team1/pkg1/ myrepo/team2/pkg2/
echo "import pkg2" > myrepo/team1/pkg1/__init__.py
touch myrepo/team2/pkg2/__init__.py

# symlink the packages into a consistent import location
mkdir myrepo/pkgs
ln -sr myrepo/team1/pkg1/ myrepo/pkgs/pkg1
ln -sr myrepo/team2/pkg2/ myrepo/pkgs/pkg2

# create a pyrightconfig.json file that includes the pkgs directory
echo '{ "extraPaths": [ "pkgs" ] }' > myrepo/pyrightconfig.json

# symlink it
ln -s myrepo my-symlinked-repo

# open myrepo in VS Code
code myrepo

# expected:
# myrepo/team1/pkg1/__init__.py 'import pkg2' go to definition should go to `myrepo/team2/pkg2/__init__.py`

# actual:
# myrepo/team1/pkg1/__init__.py 'import pkg2' go to definition goes to `myrepo/pkgs/pkg2/__init__.py`


# then open the symlinked repo
code my-symlinked-repo
# expected & actual:
# my-symlinked-repo/team1/pkg1/__init__.py 'import pkg2' go to definition goes to my-symlinked-repo/pkg2/__init__.py

Resulting directory layout:

├── myrepo
│   ├── pkgs
│   │   ├── pkg1 -> ../team1/pkg1
│   │   └── pkg2 -> ../team2/pkg2
│   ├── team1
│   │   └── pkg1
│   │       └── __init__.py  # this imports pkg2
│   └── team2
│       └── pkg2
│           └── __init__.py
└── my-symlinked-repo -> myrepo

expected:

In myrepo/team1/pkg1/__init__.py 'import pkg2' go to definition should go to myrepo/team2/pkg2/__init__.py

actual:

In myrepo/team1/pkg1/__init__.py 'import pkg2' go to definition instead goes to myrepo/pkgs/pkg2/__init__.py

Background

Proposed Acceptance criteria

  1. Given a symlinked repo, when the user runs go to definition, it should open a file path within the symlinked repo. (This is the current behavior as of "[Linux] GoTo definition performs symlink resolution", #5136.)
  2. Given a repo with symlinks that resolve to files within the repo, when the user runs go to definition, Pylance should open the resolved path within the repo. (This would address this ticket's goal, and also the previous ask in "Paths to files are not resolved with realpath before opening", #4588.)
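
To make the intended semantics concrete, here is a minimal Python sketch of the two criteria taken together. It is an illustration only, not Pyright's implementation and not the linked diff; the function name `path_to_open` and its parameters are hypothetical. The idea is to resolve the found path fully, then express it relative to the workspace root *as the user opened it*, so in-repo symlinks are followed while a symlinked repo root is preserved.

```python
import os
from pathlib import Path


def path_to_open(workspace_root, found_path):
    """Pick the path an editor should open for a definition (hypothetical helper).

    workspace_root: the folder as the user opened it (may itself be a symlink).
    found_path: the definition file as located by import resolution.
    """
    opened_root = Path(workspace_root)
    real_root = Path(os.path.realpath(opened_root))
    real_target = Path(os.path.realpath(found_path))
    try:
        # Criterion 2: the symlink resolves to a file inside the repo.
        relative = real_target.relative_to(real_root)
    except ValueError:
        # The target lives outside the repo: leave the path untouched.
        return str(found_path)
    # Criterion 1: re-anchor on the root as opened, so a symlinked repo
    # still yields paths inside the symlinked repo.
    return str(opened_root / relative)
```

With the layout from the reproduction, opening `myrepo` and resolving `myrepo/pkgs/pkg2/__init__.py` would give `myrepo/team2/pkg2/__init__.py`, while opening `my-symlinked-repo` would give `my-symlinked-repo/team2/pkg2/__init__.py`.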

Sample fix

I verified that this small diff properly resolves symlinks.
Hnasar/pyright@45c609e

If needed, I can continue trying to implement the acceptance criteria in this, but I wanted to get buy-in and see what kind of semantics were amenable to the Pylance & Pyright devs.

(The best-case scenario would be to get something like this fixed upstream rather than in a downstream fork.)

Elaboration

In terms of the monorepo setup guide, our monorepo fits "Scenario 1":

Using one shared virtual environment

In microsoft/vscode-python#21204, @luabud writes

If a mono repo is set up in a way that one can use a shared virtual environment (i.e. there's no dependency conflicts between the projects in the mono repo), my understanding is that the current experience is good enough as one can simply open the root/base folder in VS Code, create a virtual environment on the project root and install all the dependencies on that same venv. The extension can automatically activate that venv and all actions can be performed inside it.

Similar to #4588, the difference with our monorepo is that we don't actually install the packages into a venv. We have a shared, globally installed venv to save space on all machines, and we dynamically add the pkgs symlink dir to sys.path. If we were to follow the open source way and introduce a lot of overhead to our build process (similar to this user), we could theoretically:

  • define package metadata for all our python packages
  • reorganize the entire monorepo to move all of our python packages into src folders to avoid accidentally importing scripts
  • require that users create a venv with editable pkg installs
  • figure out how to combine our shared venv with the local in-repo venv

I support hundreds of developers, so such a large rework of our entire repo and build setup would be quite arduous.

A little more cleverness in resolving symlinks would go a long way, and it is the single blocker to us being happy with our monorepo setup.
