Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mypy: support namespace packages when passing files #9742

Merged
merged 14 commits into from
Dec 12, 2020

Conversation

hauntsaninja
Copy link
Collaborator

@hauntsaninja hauntsaninja commented Nov 22, 2020

This is the successor to #9632. Things should basically be as discussed in that PR. Since #9616 is merged, this should now resolve #5759.

We leave the Bazel integration with --package-root almost entirely untouched, save for a) one change that's a bugfix / doesn't affect the core of what --package-root is doing, b) another drive by bugfix that's not related to this PR.
Change a) fixes the package root __init__.py hackery when passed absolute paths. Change b) fixes the validation logic for package roots above the current directory; it was broken if you passed .. as a package root

Since we're leaving --package-root alone for now, I named the new flag --explicit-package-base to try and avoid confusion. Doing so also matches the language used by BuildSource a little better.

The new logic is summarised in the docstring of SourceFinder.crawl_up.

Some commentary:

  • I change find_sources_in_dir to call crawl_up directly to construct the BuildSource. This helps codify the fact that files discovered will use the same module names as if you passed them directly.
  • Doing so keeps things DRY with the more complicated logic and means, for instance, that we now do more sensible things in some cases when we recursively explore directories that have invalid package names.
  • Speaking of invalid package names, if we encounter a directory name with an invalid package name, we stop crawling. This is necessary because with namespace packages there's no guarantee that what we're crawling was meant to be a Python package. I add back in a check in the presence of __init__.py to preserve current unit tests where we raise InvalidSourceList.
  • The changes to modulefinder are purely cosmetic and can be ignored (there's some similar logic between the two files and this just makes sure they mirror each other closely)
  • One notable change is we now always use absolute paths to crawl. This makes the behaviour more predictable and addresses a common complaint: fixes Running mypy on the current directory without some OS-specific stuff #9677, fixes Relative imports from within the same package lead to "no type hints or stubs" error #8726 and others.
  • I figured this could use more extensive testing than a couple slow cmdline tests. Hopefully this test setup also helps clarify the behaviour :-)

cc @JukkaL

Just so that the logic mirrors closely
@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

+ poetry/__init__.py:4: error: Cannot determine type of '__path__'
- poetry/__init__.py:4: error: Cannot determine type of '__path__'

+ zilencer/migrations/0001_initial.py: error: Duplicate module named '0001_initial' (also at 'zerver/migrations/0001_initial.py')
- zerver/lib/sqlalchemy_utils.py:20: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:21: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:22: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:23: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:25: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:26: error: unused 'type: ignore' comment
- zerver/views/message_fetch.py:111: error: unused 'type: ignore' comment
- zerver/views/message_fetch.py:704: error: unused 'type: ignore' comment

+ paasta_tools/kubernetes/__init__.py: error: Source file found twice under different module names: 'paasta_tools.kubernetes' and 'kubernetes'
- paasta_tools/iptables.py:23: error: First argument to namedtuple() should be '_RuleBase', not 'Rule'
- paasta_tools/metrics/metastatus_lib.py:650: error: Argument "key" to "sorted" has incompatible type "Callable[[Any], Sequence[Tuple[str, str]]]"; expected "Callable[[Any], SupportsLessThan]"
- paasta_tools/metrics/metastatus_lib.py:650: error: Argument "key" to "sorted" has incompatible type "Callable[[Any], Sequence[Tuple[str, str]]]"; expected "Callable[[_SlaveT], SupportsLessThan]"

- lib/streamlit/__init__.py:52:1: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
+ lib/streamlit/components/v1/components.py:30:1: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports

+ src/graphql/__init__.py:280: error: Name 'specified_rules' already defined (by an import)
+ src/graphql/__init__.py:280: error: Name 'validate' already defined (by an import)
+ tests/benchmarks/test_validate_invalid_gql.py:23: error: Module not callable
+ tests/benchmarks/test_validate_gql.py:10: error: Module not callable

+ isort/output.py:596: error: "_LineWithComments" has no attribute "comments"
- isort/output.py:596: error: "_LineWithComments" has no attribute "comments"

- torchvision/__init__.py:13: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
+ torchvision/datasets/samplers/clip_sampler.py:2: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports

+ sympy/__init__.py:69: error: Name 'ask' already defined (by an import)
+ sympy/__init__.py:69: error: Name 'refine' already defined (by an import)
+ sympy/__init__.py:105: error: Name 'approximants' already defined (by an import)
+ sympy/__init__.py:105: error: Name 'gruntz' already defined (by an import)
+ sympy/__init__.py:105: error: Name 'series' already defined (by an import)
+ sympy/__init__.py:133: error: Name 'continued_fraction' already defined (by an import)
+ sympy/__init__.py:133: error: Name 'egyptian_fraction' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'combsimp' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'fu' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'gammasimp' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'hyperexpand' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'powsimp' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'radsimp' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'ratsimp' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'simplify' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'sqrtdenest' already defined (by an import)
+ sympy/__init__.py:155: error: Name 'trigsimp' already defined (by an import)
+ sympy/__init__.py:168: error: Name 'decompogen' already defined (by an import)
+ sympy/__init__.py:168: error: Name 'diophantine' already defined (by an import)
+ sympy/__init__.py:168: error: Name 'solveset' already defined (by an import)
+ sympy/__init__.py:226: error: Name 'singularities' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'ccode' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'cxxcode' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'fcode' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'jscode' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'latex' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'mathml' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'pretty' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'preview' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'pycode' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'python' already defined (by an import)
+ sympy/__init__.py:234: error: Name 'rcode' already defined (by an import)
+ sympy/__init__.py:251: error: Name 'plot' already defined (by an import)
+ sympy/__init__.py:251: error: Name 'plot_implicit' already defined (by an import)
+ sympy/__init__.py:251: error: Name 'textplot' already defined (by an import)
+ sympy/solvers/__init__.py:17: error: Name 'diophantine' already defined (by an import)
+ sympy/printing/__init__.py:3: error: Name 'pretty' already defined (by an import)
+ sympy/matrices/__init__.py:24: error: Name 'trace' already defined (by an import)
+ sympy/physics/units/__init__.py:47: error: Cannot determine type of 'velocity'
+ sympy/simplify/hyperexpand_doc.py:14: error: Module not callable
+ sympy/integrals/meijerint_doc.py:19: error: Module not callable
+ sympy/integrals/meijerint_doc.py:30: error: Module not callable
+ sympy/integrals/meijerint_doc.py:31: error: Module not callable

+ src/prefect/__init__.py:16: error: Name 'case' already defined (by an import)

@python python deleted a comment from github-actions bot Nov 22, 2020
Apparently this is something mypy is sensitive to. Thanks mypy primer!
This means we could accidentally fallback to calling a FileSystemCache
method if stuff gets moved around.
@github-actions
Copy link
Contributor

Diff from mypy_primer, showing the effect of this PR on open source code:

+ paasta_tools/kubernetes/__init__.py: error: Source file found twice under different module names: 'paasta_tools.kubernetes' and 'kubernetes'
- paasta_tools/iptables.py:23: error: First argument to namedtuple() should be '_RuleBase', not 'Rule'
- paasta_tools/metrics/metastatus_lib.py:650: error: Argument "key" to "sorted" has incompatible type "Callable[[Any], Sequence[Tuple[str, str]]]"; expected "Callable[[Any], SupportsLessThan]"
- paasta_tools/metrics/metastatus_lib.py:650: error: Argument "key" to "sorted" has incompatible type "Callable[[Any], Sequence[Tuple[str, str]]]"; expected "Callable[[_SlaveT], SupportsLessThan]"

+ zilencer/migrations/0001_initial.py: error: Duplicate module named '0001_initial' (also at 'zerver/migrations/0001_initial.py')
- zerver/lib/sqlalchemy_utils.py:20: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:21: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:22: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:23: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:25: error: unused 'type: ignore' comment
- zerver/lib/sqlalchemy_utils.py:26: error: unused 'type: ignore' comment
- zerver/views/message_fetch.py:111: error: unused 'type: ignore' comment
- zerver/views/message_fetch.py:704: error: unused 'type: ignore' comment

mypy_primer points out two undesirable effects:
1) scripts causing search path confusion
2) scripts with the same names causing issues (e.g. migrations in zulip)
@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Nov 22, 2020

Summary of the (now fixed) mypy_primer issues:

  1. While discovering files recursively, it turned out it's important that we list __init__.pys first. I haven't quite been able to get a minimal repro though, so no regression test :-( The real world code it's triggering on is often doing stuff like from .x import x, though. Maybe that plus an import cycle somewhere would give the repro.
  2. I made a defensive change to prevent passing around invalid module names for cases like pkg.invalid-name if we have pkg/__init__.py and pkg/invalid-name.py. But this adds a BuildSource with pkg as a base_dir, causing search paths to get muddied. Secondarily, it causes duplicate module errors if you have similarly named scripts (like zulip's migration scripts)

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks awesome! I especially like the unit tests -- they both test the behavior in lot of detail and in a clean way, and also act as a nice specification. They are also written very clearly. I really enjoyed reviewing this.

Left some minor comments. The only thing that may be significant is the behavior with packages spanning multiple directories. Feel free to merge once you've addressed all the comments.

"""Determines sort order for directory listing.

The desirable property is foo < foo.pyi < foo.py.
The desirable propertes are:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: propertes -> properties

return root


def get_explicit_package_bases(options: Options) -> Optional[List[str]]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a short summary about the purpose of this.


def normalise_build_source_list(sources: List[BuildSource]) -> List[Tuple[str, Optional[str]]]:
return sorted(
(s.module, normalise_path(s.base_dir) if s.base_dir is not None else None)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add parentheses around the second tuple item to make precedence explicit?

@@ -88,6 +88,7 @@ def __init__(self) -> None:
self.follow_imports_for_stubs = False
# PEP 420 namespace packages
self.namespace_packages = False
self.explicit_package_bases = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comment that explains in some detail what this means.

@@ -88,6 +88,7 @@ def __init__(self) -> None:
self.follow_imports_for_stubs = False
# PEP 420 namespace packages
self.namespace_packages = False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add more detail to the comment. At least mention that __init__.py files are no longer necessary to define a package, and package can span multiple directories.


def test_crawl_namespace(self) -> None:
options = Options()
options.namespace_packages = True
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about testing namespace packages that span multiple directories (submodules in multiple MYPYPATH entries)?

@hauntsaninja
Copy link
Collaborator Author

Thanks for the review! Implemented all the suggestions. I still owe documentation updates, but I'll do that in a separate PR.

hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 18, 2021
This should cover the current state on master, as implemented
across python#9742, python#9683, python#9632, python#9616, python#9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (python#9636).
hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 18, 2021
This should cover the current state on master, as previously
discussed / implemented across python#9742, python#9683, python#9632, python#9616, python#9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (python#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)
hauntsaninja pushed a commit to hauntsaninja/mypy that referenced this pull request Jan 18, 2021
This should cover the current state on master, as previously
discussed / implemented across python#9742, python#9683, python#9632, python#9616, python#9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (python#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)
hauntsaninja added a commit that referenced this pull request Jan 19, 2021
This should cover the current state on master, as previously
discussed / implemented across #9742, #9683, #9632, #9616, #9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)

Co-authored-by: hauntsaninja <>
ilevkivskyi pushed a commit that referenced this pull request Jan 20, 2021
This should cover the current state on master, as previously
discussed / implemented across #9742, #9683, #9632, #9616, #9614, etc.

This will need to be changed if we can make `--namespace-packages` the
default (#9636).

I haven't documented some of the finer points of the changes, since it
felt like an inappropriate level of detail (e.g. using absolute paths
when crawling, how directories with invalid package names affect
crawling, etc)

Co-authored-by: hauntsaninja <>
@niander
Copy link

niander commented Mar 10, 2022

@hauntsaninja
I am trying to avoid having to specify explicit base folders for all the packages we have in our repo. According to the docstring in craw_up:

If namespace packages is on, we crawl upwards until the nearest explicit base directory.
Failing that, we return one past the highest directory containing an init.py

So that means that in a scenario where there is a namespace package at the root, to be able to correctly run mypy passing a file in the command line, it is required to explicitly specify all base directories in mypy_path and enable --explicit-package-bases. This seems odd to me because that means that --namespace-packages alone won't work in the common scenario of having a single top level namespace package unless you explicitly specify base directories.

Why was crawl_up implemented like that? While I trust there is a very good reason for it to be implemented the way it was. Maybe because it is impossible to know the namespace top folder because it could be other folders in the file system. I am particularly curious because I remember that before the introduction of --explicit-package-bases it was working for us with only --namespace-packages enabled. Or, maybe, it wasn't and I am totally wrong, but the curiosity is still there =)

@hauntsaninja
Copy link
Collaborator Author

You might be confusing the directory the package is in with the package directory? Also mypy automatically adds the current directory as an explicit package base, which might be another piece you're missing. Automatically treating cwd as a base is why the common single top level namespace package scenario works.

Please read https://mypy.readthedocs.io/en/stable/running_mypy.html#mapping-file-paths-to-modules which has examples.

The behaviour was previously completely broken when passing --namespace-packages and files. And the changes in behaviour here are backward compatible. You probably weren't checking anything (or are "totally wrong" in some other regard ;-) )

If you're still stuck, I recommend posting on gitter.im or making an issue.

@niander
Copy link

niander commented Mar 10, 2022

@hauntsaninja thanks for the response. I am not stuck. It just seems that there is no way out of needing to define the base folder for all projects in the repo, which are using the same mypy.ini.

I don't think I am confusing directory and package. Yes, there is a directory that includes the packages and other files like setup.py. This is the base directory. And, because the top level package is a namespace package, the first folder inside it doesn't have __init__.py but the other subdirectories have. With just --namespace-packages and passing a file to check, mypy will consider the base directory as the namespace package folder and not the folder containing the namespace package. Which is pretty much what also happens when --namespace-packages is disabled.

Yes, maybe it was working before because the cwd was always the directory containing the package. But now, that is not true, especially since we mostly run mypy from the Python extension in VSCode and the cloned repository has several other python projects.

Anyways, I wonder, for instance, how pylint discover the correct base folder and the fully qualified name for the packages when checking a single file and if mypy could do the same.

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Mar 10, 2022

From what you're describing, it sounds like using mypy --namespace-packages --explicit-package-bases without any listing should work. I recommend creating an issue that has a tree-like description of your project layout.

The situation in which you'd need to list things is if you had a layout with something like:

root
    - mypy.ini
    - projfolder1
        - setup.py
        - namespace_package1
    - projfolder2
        - setup.py
        - namespace_package2

@niander
Copy link

niander commented Mar 10, 2022

That is pretty much what we have. When I run mypy --namespace-packages --explicit-package-bases for a file, lets say, projfolder1/namespace_package1/package1/mymodule.py and cwd being root, with --verbose I can see that the base_dir inferred is root/projfolder1/namespace_package1. That doesn't seem to be right because mypy will finish with success without detecting any error in this case even though there is an issue in the file. Now, with explicit mypy_path defined, the base_dir would be root/projfolder1 and that works.

Do you think this is a bug worthy of an issue or just expected behavior?

@hauntsaninja
Copy link
Collaborator Author

I recommend creating an issue that has a tree-like description of your actual project layout (or a minimum repro) and the output with --verbose. Even better if your code is open source :-)

@jaklan
Copy link

jaklan commented Jul 6, 2022

@niander have you resolved your issue? I have exactly the same problem in monorepo setup when using pre-commit (so passing file paths to mypy and cwd in root):

root/
-- projects/
---- project_a/
------ src/
-------- namespace/
----------- subnamespace/
------------- utils/
--------------- __init__.py
--------------- main.py
---- project_b/
------ src/
-------- namespace/
----------- other_subnamespace/
------------- utils/
--------------- __init__.py
--------------- main.py

will fail due to error: Duplicate module named "utils.main".

@hauntsaninja --explicit-package-bases doesn't help here, because then we get:
module='projects.project_a.src.namespace.subnamespace.utils.main'
instead of:
module='namespace.subnamespace.utils.main'
and I don't see an easy way to modify MYPYPATH to include all needed paths automatically as well...

@hauntsaninja
Copy link
Collaborator Author

hauntsaninja commented Jul 7, 2022

As I said in my previous comment, this PR is not the place to discuss issues or for user support. I'll be locking this thread.

--explicit-package-bases plus setting MYPYPATH=projects/project_a/src:projects/project_b/src should do the trick. Feel free to write a wrapper script around mypy that computes the MYPYPATH appropriately, but I don't see a way for mypy to divine your project layout. If you have a recommendation, please open a feature request issue.

@python python locked as resolved and limited conversation to collaborators Jul 7, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
4 participants