
Feature: Persistent caching of inference results #1145

Open
@joshua-cannon-techlabs

Description


Steps to reproduce

For fairly large repositories, using inference can be incredibly slow. Running pylint with a rule which leverages inference can take multiple seconds, while running a rule without inference is almost immediate.

Profiling shows that most of the time is spent traversing the file's dependencies and parsing them. This information is cached (see astroid.manager.AstroidManager.brain); however, the cache is in-memory and therefore doesn't persist between runs. This is likely because astroid classes can't easily be serialized (pickling them is unfortunately not a drop-in solution).

Ideally, there would be an option to enable persistent caching that is file-timestamp aware. If a particular module hasn't changed, astroid could pull from the cache (exactly like Python's caching of bytecode).
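To make the request concrete, here is a minimal sketch of what a file-timestamp-aware persistent cache could look like. All names (`PersistentCache`, the cache directory, the payload) are hypothetical, not part of astroid's API, and it assumes the cached payload is picklable, which, as noted above, is not yet true for astroid nodes:

```python
# Hypothetical sketch of a timestamp-aware on-disk cache, analogous to
# Python's bytecode caching. Nothing here is astroid API; it only
# illustrates the invalidation scheme being requested.
import hashlib
import os
import pickle
from pathlib import Path


class PersistentCache:
    def __init__(self, cache_dir: str = ".astroid_cache"):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    def _entry(self, source_path: str) -> Path:
        # One cache file per source path, named by a hash of the path.
        digest = hashlib.sha256(source_path.encode()).hexdigest()
        return self.cache_dir / f"{digest}.pickle"

    def get(self, source_path: str):
        """Return the cached payload if the source is unchanged, else None."""
        entry = self._entry(source_path)
        if not entry.exists():
            return None
        cached_mtime, payload = pickle.loads(entry.read_bytes())
        # Invalidate when the source was modified after the cache was written.
        if os.path.getmtime(source_path) != cached_mtime:
            return None
        return payload

    def put(self, source_path: str, payload) -> None:
        """Store the payload alongside the source file's current mtime."""
        mtime = os.path.getmtime(source_path)
        self._entry(source_path).write_bytes(pickle.dumps((mtime, payload)))
```

A real implementation would need more care (hash-based invalidation for build systems that don't preserve mtimes, cache size limits, concurrent writers), but the lookup-by-(path, mtime) shape is the same one CPython uses for `.pyc` files.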

Of course, a lot of thought has to be put into caching, but the benefits would be on the order of dev-years if not dev-decades IMO.

Current behavior

  • Run pylint on a single file in a large repo with assert-on-tuple as the only rule: Maybe ~0.5s on a bad day
  • Run pylint on the same file with abstract-class-instantiated as the only rule: Roughly 7.4s
  • Run pylint twice in the same python process with abstract-class-instantiated as the only rule: Also roughly 7.4s

Expected behavior

Running pylint multiple times on the same file ideally doesn't take the same amount of time each time.

python -c "from astroid import __pkginfo__; print(__pkginfo__.version)" output

2.4.2

Metadata

Labels

Enhancement ✨ (Improvement to a component) · High effort 🏋 (Difficult solution or problem to solve)
