Description
Steps to reproduce
For fairly large repositories, using inference can be incredibly slow. Running `pylint` with a rule that leverages inference can take multiple seconds, while running a rule without inference is almost immediate.
From profiling, most of the time is spent traversing the file's dependencies and parsing them as well. This information is cached (see `astroid.manager.AstroidManager.brain`), but the cache is in-memory and therefore doesn't persist between runs. This is likely because `astroid` classes can't be easily serialized (pickling them is unfortunately not a drop-in solution).
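A rough way to see where the time goes, and to confirm that the parsed dependencies only live in the current process, is to profile a single run from Python. This is just a sketch: the target file name and the enabled message are placeholders, and it relies on `pylint.lint.Run` and the global `astroid.MANAGER` as they exist in pylint 2.x / astroid 2.x.

```python
# Sketch: profile one pylint run and peek at astroid's in-memory cache.
# "some_module.py" is a placeholder path; adjust the enabled message as needed.
import cProfile
import pstats

import astroid
from pylint.lint import Run


def run_pylint():
    try:
        Run(["--disable=all", "--enable=abstract-class-instantiated", "some_module.py"])
    except SystemExit:
        pass  # pylint finishes by calling sys.exit(); swallow it so we can inspect state


cProfile.run("run_pylint()", "pylint.prof")
pstats.Stats("pylint.prof").sort_stats("cumulative").print_stats(20)

# The parsed dependency modules live only in this process's memory
# (astroid.manager.AstroidManager.brain), so the next run repeats the work.
print(len(astroid.MANAGER.astroid_cache), "modules cached in this process")
```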
Ideally, there would be an option to enable persistent, file-timestamp-aware caching. If a particular module hasn't been changed, `astroid` could pull it from the cache, exactly like Python's caching of bytecode.
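To make the idea concrete, here is a purely hypothetical sketch of what a timestamp-aware on-disk lookup could look like, in the spirit of how CPython only reuses a `.pyc` file while the recorded source mtime still matches. The cache location, the helpers, and above all the assumption that a parsed module could be serialized are all invented for illustration; as noted above, pickling astroid nodes is not a drop-in solution today.

```python
# Illustrative only: a timestamp-aware persistent cache, analogous to bytecode caching.
# It pretends the parsed module can be pickled, which astroid does not support as-is.
import os
import pickle  # placeholder serializer

CACHE_DIR = os.path.expanduser("~/.cache/astroid")  # hypothetical location


def _cache_path(source_path):
    return os.path.join(CACHE_DIR, source_path.replace(os.sep, "_") + ".cache")


def load_cached_module(source_path):
    """Return the cached parse for source_path, or None if missing or stale."""
    try:
        with open(_cache_path(source_path), "rb") as fh:
            mtime, module = pickle.load(fh)
    except (OSError, pickle.PickleError):
        return None
    # Invalidate whenever the source file changed, like .pyc files do.
    if mtime != os.path.getmtime(source_path):
        return None
    return module


def store_cached_module(source_path, module):
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(_cache_path(source_path), "wb") as fh:
        pickle.dump((os.path.getmtime(source_path), module), fh)
```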
Of course, a lot of thought has to be put into caching, but the benefits would be on the order of dev-years if not dev-decades IMO.
Current behavior
- Run `pylint` on a single file in a large repo with `assert-on-tuple` as the only rule: maybe ~0.5s on a bad day
- Run `pylint` on the same file with `abstract-class-instantiated` as the only rule: roughly 7.4s
- Run `pylint` twice in the same Python process with `abstract-class-instantiated` as the only rule: also roughly 7.4s (a sketch of these runs follows the list)
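For reference, the three measurements above can be reproduced with something along these lines; the target file is a placeholder, and the exact timings will of course vary by machine and repository.

```python
# Rough reproduction of the measurements above; "some_module.py" is a placeholder.
import time

from pylint.lint import Run


def timed_run(args):
    start = time.perf_counter()
    try:
        Run(args)
    except SystemExit:
        pass  # pylint exits via sys.exit(); ignore it so we can keep timing
    return time.perf_counter() - start


target = "some_module.py"
# Rule without inference: fast.
print("assert-on-tuple:", timed_run(["--disable=all", "--enable=assert-on-tuple", target]))
# Rule that relies on inference: slow.
print("abstract-class-instantiated:", timed_run(["--disable=all", "--enable=abstract-class-instantiated", target]))
# Same rule again in the same process (third measurement above).
print("second run, same process:", timed_run(["--disable=all", "--enable=abstract-class-instantiated", target]))
```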
Expected behavior
Running `pylint` multiple times on the same, unchanged file ideally doesn't take the same amount of time on every run.
`python -c "from astroid import __pkginfo__; print(__pkginfo__.version)"` output
2.4.2