error upon import_module in context where package and module exists with the same name #981

Open

Description

I'm not sure whether this is considered a bug, but even if considered an "enhancement", it'd be very good to address the situation I'm going to describe.

  1. Within a single Python script, a given name may identify either a package or a module, but not both.
  2. Despite item 1, a Python codebase/project may still import, in different scripts (or contexts), a package and a module under the same name.
  3. But in a scenario such as the one described in item 2, astroid's import_module(…) function will always fail for at least one of the same-named package or module imports.

This is a project to illustrate the problem.

[Screenshot: project structure]

  • script1.py
import sys
import os.path

base_dir = os.path.dirname(os.path.realpath(__file__))
lib_dir = os.path.join(base_dir, 'main-site-packages')
sys.path.append(lib_dir)

from nameofsomething.pack.lib import f

print('script 1')
  • script2.py
import sys
import os.path

base_dir = os.path.dirname(os.path.realpath(__file__))
lib_dir = os.path.join(base_dir, 'extra-site-packages')
sys.path.append(lib_dir)

from nameofsomething import g

print('script 2')

Both scripts can be run independently, without any error, even though script1.py imports from the package nameofsomething and script2.py imports from the module nameofsomething.
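For anyone who wants to reproduce this, here is a minimal sketch of a directory layout consistent with the two imports above. It is an assumption inferred from the import statements, not a copy of the actual project: a regular package (with `__init__.py` files) under main-site-packages and a plain module under extra-site-packages. The helper names `build_layout` and `kind_of` are hypothetical; `kind_of` uses the standard library's `PathFinder` to show what the name resolves to in each directory.

```python
from importlib.machinery import PathFinder
from pathlib import Path


def build_layout(base_dir: Path) -> None:
    """Create the assumed project layout under base_dir."""
    # Assumed package: main-site-packages/nameofsomething/pack/lib.py defining f
    pack_dir = base_dir / "main-site-packages" / "nameofsomething" / "pack"
    pack_dir.mkdir(parents=True)
    (pack_dir.parent / "__init__.py").write_text("")
    (pack_dir / "__init__.py").write_text("")
    (pack_dir / "lib.py").write_text("def f():\n    return 'f'\n")

    # Assumed module: extra-site-packages/nameofsomething.py defining g
    extra_dir = base_dir / "extra-site-packages"
    extra_dir.mkdir(parents=True)
    (extra_dir / "nameofsomething.py").write_text("def g():\n    return 'g'\n")


def kind_of(name: str, lib_dir: Path) -> str:
    """Report whether name resolves to a package or a module inside lib_dir."""
    spec = PathFinder.find_spec(name, [str(lib_dir)])
    if spec is None:
        return "missing"
    # Only packages carry submodule search locations.
    if spec.submodule_search_locations is not None:
        return "package"
    return "module"
```

Calling `build_layout` on an empty directory and then `kind_of("nameofsomething", ...)` on each of the two library directories should report a package for one and a module for the other, which is the ambiguity the rest of this report is about.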

Now, let's make the (reasonably fair, but not strictly correct) assumption that my astroid-based tool scans all files inside basedir for analysis. It traverses the AST of both script1.py and script2.py and, upon visiting an import, attempts to load the package/module in question. Below is a simplified, emulated version of the tool's behaviour, with only the relevant parts:

  • astroidfail.py
import sys
import os.path
from astroid.manager import AstroidManager


def fail():
    M = AstroidManager()
    base_dir = os.path.dirname(os.path.realpath(__file__))

    extra_lib_dir = os.path.join(base_dir, 'extra-site-packages')
    sys.path.append(extra_lib_dir)
    script2_path = os.path.join(base_dir, 'script2.py')
    node = M.ast_from_file(script2_path)
    node.import_module('nameofsomething')

    main_lib_dir = os.path.join(base_dir, 'main-site-packages')
    sys.path.append(main_lib_dir)
    script1_path = os.path.join(base_dir, 'script1.py')
    node = M.ast_from_file(script1_path)
    node.import_module('nameofsomething.pack.lib')

if __name__ == '__main__':
    fail()

Running this code, the following error is triggered:

[Screenshot: error output]

This error makes sense, since sys.path now contains both a package and a module under the same name, i.e., nameofsomething, violating item 1 from my list at the top. One could keep separate managers or (somehow) isolate the processing of each script, but this will typically lead to more expensive computation. I believe this situation is common among tools that perform static analysis over a codebase/project as a whole, as in item 2 from my list.

Can we get this fixed?

From what I can tell by inspection and debugging, the problem is in the _find_spec_with_path function and its successive invocations. Note that different finders are used to locate a module, in particular the ImportlibFinder and the PathSpecFinder; however, the module name is searched part by part (part1.part2…partN), and whenever one of its parts has a match, that match is considered to be the right one. In the situation I describe, though, this reasoning is incorrect: for the import of the package nameofsomething.pack.lib, we may first find the module nameofsomething.py and flag it as the correct match, but it isn't, leading to a subsequent failure when the part pack of the package is searched for. I suppose that iterating over the spec finders and stopping the search only once the entire full name is matched would fix the issue.
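To make the suggestion concrete, here is a minimal sketch of the "match the entire full name" idea. It is not astroid's code and does not touch _find_spec_with_path: it swaps in the standard library's `PathFinder` for astroid's finders, and the function name `find_full_spec` is hypothetical. A search path entry is accepted only if every part of the dotted name resolves under it; otherwise the next entry is tried.

```python
from importlib.machinery import ModuleSpec, PathFinder
from typing import Iterable, Optional


def find_full_spec(modname: str, search_paths: Iterable[str]) -> Optional[ModuleSpec]:
    """Return a spec for modname from the first search path matching ALL its parts."""
    parts = modname.split(".")
    for entry in search_paths:
        locations = [entry]
        spec = None
        for i in range(len(parts)):
            partial = ".".join(parts[: i + 1])
            spec = PathFinder.find_spec(partial, locations)
            if spec is None:
                break  # this part is missing under the current entry
            locations = spec.submodule_search_locations
            if locations is None and i < len(parts) - 1:
                # Matched a plain module (e.g. nameofsomething.py) while more
                # parts remain: a partial match, so reject this entry.
                spec = None
                break
        if spec is not None:
            return spec  # every part matched under this entry
    return None
```

With a sys.path-like list where the directory holding nameofsomething.py comes before the directory holding the nameofsomething package, looking up nameofsomething.pack.lib skips the first entry instead of failing on pack, while looking up nameofsomething alone still returns the module from the first entry.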
