Opened on Apr 29, 2021
I'm not sure whether this is considered a bug, but even if considered an "enhancement", it'd be very good to address the situation I'm going to describe.
1. Within a Python script, a given name may identify either a package or a module.
2. Despite item 1, a Python codebase/project may still `import`, in different scripts (or contexts), a package or a module under the same name.
3. But, in a scenario such as that described in item 2, astroid's `import_module(…)` function will always fail for at least one of the same-named package or module `import`s.
This is a project to illustrate the problem.
- script1.py
import sys
import os.path
base_dir = os.path.dirname(os.path.realpath(__file__))
lib_dir = os.path.join(base_dir, 'main-site-packages')
sys.path.append(lib_dir)
from nameofsomething.pack.lib import f
print('script 1')
- script2.py
import sys
import os.path
base_dir = os.path.dirname(os.path.realpath(__file__))
lib_dir = os.path.join(base_dir, 'extra-site-packages')
sys.path.append(lib_dir)
from nameofsomething import g
print('script 2')
Both scripts can be run independently, without any error, even though script1.py imports the package `nameofsomething` and script2.py imports the module `nameofsomething`.
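For anyone who wants to reproduce this, here is a rough sketch that rebuilds the project in a temporary directory and runs both scripts in separate interpreters. The directory layout is inferred from the imports above; the bodies of `f` and `g`, the `__init__.py` files, and the helper names `build_project` and `run_script` are assumptions made for illustration, not the contents of the original project.

```python
import os
import subprocess
import sys
import tempfile

# Same body as script1.py / script2.py above; only the library directory,
# the import line and the printed message differ.
SCRIPT_TEMPLATE = """\
import sys
import os.path
base_dir = os.path.dirname(os.path.realpath(__file__))
lib_dir = os.path.join(base_dir, {lib_dir!r})
sys.path.append(lib_dir)
{import_line}
print({message!r})
"""


def write_file(path, content=""):
    """Create parent directories as needed and write a text file."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        handle.write(content)


def build_project(root):
    """Create the assumed layout under root and return both script paths."""
    # Module 'nameofsomething' (imported by script2.py).
    write_file(
        os.path.join(root, "extra-site-packages", "nameofsomething.py"),
        "def g():\n    return 'g'\n",
    )
    # Package 'nameofsomething' (imported by script1.py).
    package_dir = os.path.join(root, "main-site-packages", "nameofsomething")
    write_file(os.path.join(package_dir, "__init__.py"))
    write_file(os.path.join(package_dir, "pack", "__init__.py"))
    write_file(
        os.path.join(package_dir, "pack", "lib.py"),
        "def f():\n    return 'f'\n",
    )
    script1 = os.path.join(root, "script1.py")
    write_file(script1, SCRIPT_TEMPLATE.format(
        lib_dir="main-site-packages",
        import_line="from nameofsomething.pack.lib import f",
        message="script 1",
    ))
    script2 = os.path.join(root, "script2.py")
    write_file(script2, SCRIPT_TEMPLATE.format(
        lib_dir="extra-site-packages",
        import_line="from nameofsomething import g",
        message="script 2",
    ))
    return script1, script2


def run_script(path):
    """Run a script in a fresh interpreter so each gets its own sys.path."""
    return subprocess.run(
        [sys.executable, path], capture_output=True, text=True
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for script in build_project(root):
            result = run_script(script)
            print(result.returncode, result.stdout.strip())
```

Each script only ever sees one of the two library directories, which is why neither import is ambiguous when the scripts are run on their own.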
Now, let's make the (reasonably fair, but not strictly correct) assumption that my astroid-based tool scans all files inside basedir for analysis: it traverses the AST of both script1.py and script2.py and, upon visiting an `import`, it attempts to load the package/module in question. Below is a simplified/emulated version of the tool's behaviour, with only the relevant parts:
- astroidfail.py
import sys
import os.path
from astroid.manager import AstroidManager

def fail():
    M = AstroidManager()
    base_dir = os.path.dirname(os.path.realpath(__file__))
    extra_lib_dir = os.path.join(base_dir, 'extra-site-packages')
    sys.path.append(extra_lib_dir)
    script2_path = os.path.join(base_dir, 'script2.py')
    node = M.ast_from_file(script2_path)
    node.import_module('nameofsomething')
    main_lib_dir = os.path.join(base_dir, 'main-site-packages')
    sys.path.append(main_lib_dir)
    script1_path = os.path.join(base_dir, 'script1.py')
    node = M.ast_from_file(script1_path)
    node.import_module('nameofsomething.pack.lib')

if __name__ == '__main__':
    fail()
Running this code, the following error is triggered:
This error makes sense, since `sys.path` contains both a package and a module under the same name, i.e., `nameofsomething`, violating item 1 from my list at the top. One could keep separate managers or (somehow) isolate the processing of each script, but this will typically lead to more expensive computation. So I believe this situation is common among tools that perform static analysis over a codebase/project as a whole, as in item 2 from my list.
Can we get this fixed?
From what I can tell from inspection/debugging, the problem is in the `_find_spec_with_path` function and its successive invocations. Note that different finders are used to locate a module, in particular the `ImportlibFinder` and the `PathSpecFinder`; however, the module name is searched part by part (`part1.part2…partN`), and whenever one of its parts has a match, that match is considered to be the right one. In the situation I describe, though, this reasoning is incorrect: for the `import` of the package `nameofsomething.pack.lib`, we may end up first finding the module `nameofsomething.py` and flagging it as the correct match, but it isn't, which leads to a subsequent failure when the part `pack` of the package is searched for. I suppose that iterating over the spec finders and stopping the search only once the entire full name is matched would fix the issue.
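To make the suggestion concrete, here is a minimal sketch of the two strategies. It uses only the standard library's `importlib.machinery.PathFinder`, not astroid's own finders, so it is an illustration of the idea rather than a patch. The function names `resolve_greedy`, `resolve_full_name` and `make_demo_layout` are hypothetical, and retrying per search-path entry (instead of per spec finder) is an assumption I made to keep the example short.

```python
import importlib.machinery
import os
import tempfile


def resolve_greedy(dotted_name, search_path):
    """Resolve part by part, committing to the first match of each part."""
    parts = dotted_name.split(".")
    path = list(search_path)
    spec = None
    for index in range(len(parts)):
        prefix = ".".join(parts[: index + 1])
        spec = importlib.machinery.PathFinder.find_spec(prefix, path)
        if spec is None:
            return None
        if index < len(parts) - 1:
            # Only packages expose locations in which to look for submodules;
            # a plain module matched here is a dead end for the remaining parts.
            if spec.submodule_search_locations is None:
                return None
            path = list(spec.submodule_search_locations)
    return spec


def resolve_full_name(dotted_name, search_path):
    """Try each search-path entry on its own until the whole name resolves."""
    for entry in search_path:
        spec = resolve_greedy(dotted_name, [entry])
        if spec is not None:
            return spec
    return None


def make_demo_layout(root):
    """Create a same-named module and package (layout assumed from the issue)."""
    extra = os.path.join(root, "extra-site-packages")
    main = os.path.join(root, "main-site-packages")
    pack = os.path.join(main, "nameofsomething", "pack")
    os.makedirs(extra)
    os.makedirs(pack)
    with open(os.path.join(extra, "nameofsomething.py"), "w") as handle:
        handle.write("def g():\n    return 'g'\n")
    for package_dir in (os.path.join(main, "nameofsomething"), pack):
        with open(os.path.join(package_dir, "__init__.py"), "w"):
            pass
    with open(os.path.join(pack, "lib.py"), "w") as handle:
        handle.write("def f():\n    return 'f'\n")
    return extra, main


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        extra, main = make_demo_layout(root)
        search_path = [extra, main]  # same order as in astroidfail.py
        name = "nameofsomething.pack.lib"
        print("greedy:   ", resolve_greedy(name, search_path))
        print("full name:", resolve_full_name(name, search_path))
```

With the module's directory listed first, the greedy lookup should stop at `nameofsomething.py` and give up on `pack`, while the full-name lookup moves on to the next entry and reaches `lib.py`. A lookup of plain `nameofsomething` still resolves to the module in both cases, so the import in script2.py is unaffected.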