Over the years, build.py has become a dumping ground for everything having to do with module dependencies and caching, making it the fourth-largest file in mypy. Prompted by #4353, I think it's time to refactor build.py.
One idea I'd particularly like to focus on is the distinction between imports and dependencies. Imports are determined (almost) purely syntactically by pass one of the semantic analyzer, and they have priorities. Dependencies also include indirect dependencies, and each one is associated with the interface hash of the depended-upon module (once that hash is available). Dependencies are seeded from the imports, minus missing modules, in the load_graph() phase. After the SCC has been type-checked, they are extended with indirect dependencies (computed, as now, by TypeIndirectionVisitor). Both tables (imports and full dependencies) are then written to the cache metadata, together with a bit recording whether this particular module has errors.
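To make the two tables concrete, here is a minimal sketch of how they might be built and recorded. Everything in it is hypothetical: `CacheMeta`, `seed_dependencies`, `extend_with_indirect` and `make_meta` are illustrative names, not mypy's actual API, and the set of indirect dependencies is simply passed in rather than computed by TypeIndirectionVisitor.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class CacheMeta:
    """Hypothetical per-module cache record; not mypy's actual metadata format."""
    module: str
    source_hash: str
    imports: Dict[str, int]       # imported module -> import priority
    dependencies: Dict[str, str]  # dependency -> its interface hash when this module was checked
    has_errors: bool              # the "error bit"


def seed_dependencies(imports: Dict[str, int], missing: Set[str]) -> List[str]:
    """load_graph() phase: dependencies start as the imports, minus missing modules."""
    return [mod for mod in imports if mod not in missing]


def extend_with_indirect(module: str, deps: List[str], indirect: Set[str]) -> List[str]:
    """After the SCC is type-checked: append indirect dependencies not already present."""
    extra = sorted(m for m in indirect if m not in deps and m != module)
    return deps + extra


def make_meta(module: str, source_hash: str, imports: Dict[str, int],
              deps: List[str], interface_hashes: Dict[str, str],
              has_errors: bool) -> CacheMeta:
    """Record both tables plus the error bit, pinning each dependency's interface hash."""
    return CacheMeta(
        module=module,
        source_hash=source_hash,
        imports=dict(imports),
        dependencies={dep: interface_hashes[dep] for dep in deps},
        has_errors=has_errors,
    )


# Usage: module "a" imports "b" and a module that cannot be found.
imports = {"b": 10, "not_installed": 20}   # priority values are arbitrary here
deps = seed_dependencies(imports, missing={"not_installed"})
deps = extend_with_indirect("a", deps, indirect={"c", "b"})
meta = make_meta("a", "src-hash-a", imports, deps,
                 interface_hashes={"b": "ih-b", "c": "ih-c"}, has_errors=False)
print(meta.dependencies)  # {'b': 'ih-b', 'c': 'ih-c'}
```

Note that the imports table keeps the missing module (it is still a syntactic import), while the dependency table drops it and gains the indirect dependency "c".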
A module for which a cache file exists is then considered fresh (no need to process) if all of the following hold:
- The hash computed from the source matches the source hash in the cache (a matching mtime and size is an acceptable proxy)
- The error bit is off
- For every dependency, the computed interface hash matches the cached interface hash
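The three conditions above can be sketched as a single predicate. This is an illustration under stated assumptions, not mypy code: the record layout and function names are hypothetical, and SHA-256 is used as the source hash purely for the example. The source is only hashed when the cheap mtime+size comparison fails.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CacheMeta:
    """Hypothetical cache record; field names are illustrative, not mypy's."""
    source_hash: str
    mtime: float
    size: int
    has_errors: bool
    dependencies: Dict[str, str]  # dependency -> interface hash recorded in the cache


def source_unchanged(meta: CacheMeta, mtime: float, size: int,
                     compute_hash: Callable[[], str]) -> bool:
    # A matching mtime and size is accepted as a cheap proxy; only hash the
    # source when they differ.
    if meta.mtime == mtime and meta.size == size:
        return True
    return compute_hash() == meta.source_hash


def is_fresh(meta: CacheMeta, mtime: float, size: int,
             compute_hash: Callable[[], str],
             interface_hashes: Dict[str, str]) -> bool:
    if not source_unchanged(meta, mtime, size, compute_hash):
        return False
    if meta.has_errors:  # the error bit must be off
        return False
    # Every dependency's current interface hash must equal the cached one;
    # a dependency with no known hash counts as a mismatch.
    return all(interface_hashes.get(dep) == cached
               for dep, cached in meta.dependencies.items())


def sha(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


src = "import b\nx = b.f()\n"
meta = CacheMeta(source_hash=sha(src), mtime=100.0, size=len(src),
                 has_errors=False, dependencies={"b": "ih-b-1"})

# Touched file (new mtime), identical contents, unchanged dependency interface: fresh.
print(is_fresh(meta, 200.0, len(src), lambda: sha(src), {"b": "ih-b-1"}))  # True
# Same source, but b's interface hash changed: stale.
print(is_fresh(meta, 100.0, len(src), lambda: sha(src), {"b": "ih-b-2"}))  # False
```

Nothing here compares the mtimes of cache data files against each other; freshness depends only on the module's own source and the interface hashes of its dependencies.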
For SCCs this needs some tweaking: dependencies within the SCC don't count, and the conditions must hold for every module in the SCC. (Other tweaks are needed to account for changed options and changes to the "library path".)
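The SCC variant might look like the following minimal sketch. It assumes, hypothetically, that each module's source check has already been reduced to a boolean, and it does not model the option or library-path tweaks. `ModuleMeta` and `scc_is_fresh` are illustrative names only.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class ModuleMeta:
    """Hypothetical per-module inputs to the SCC check."""
    source_unchanged: bool        # outcome of the per-module source hash (or mtime+size) check
    has_errors: bool
    dependencies: Dict[str, str]  # dependency -> interface hash recorded in the cache


def scc_is_fresh(scc: Set[str], metas: Dict[str, ModuleMeta],
                 interface_hashes: Dict[str, str]) -> bool:
    for mod in scc:
        meta = metas.get(mod)
        # Every module in the SCC needs a cache entry, unchanged source, and no errors.
        if meta is None or not meta.source_unchanged or meta.has_errors:
            return False
        for dep, cached in meta.dependencies.items():
            if dep in scc:
                continue  # dependencies within the SCC don't count
            if interface_hashes.get(dep) != cached:
                return False
    return True


# "a" and "b" import each other; both depend on "c" outside the SCC.
metas = {
    "a": ModuleMeta(True, False, {"b": "ih-b", "c": "ih-c"}),
    "b": ModuleMeta(True, False, {"a": "ih-a", "c": "ih-c"}),
}
# Only "c" needs a current interface hash; the intra-SCC entries are skipped.
print(scc_is_fresh({"a", "b"}, metas, {"c": "ih-c"}))  # True
```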
One benefit of this algorithm is that we no longer have to depend on linear mtimes for cache data files to compute freshness. Another is that we may be able to skip processing modules even if there are errors upstream, as long as those errors don't affect the interface hash.
Other things to refactor include the "stat cache" that's used by find_module(), logging, and the fact that the constructor of the State class does way too much work.
This is a big refactoring and I expect it will take a few weeks at least. But I think it's time to start this operation. [UPDATE: I won't start until January 2018 at the earliest.]